General RL Utilities#

Multi-Agent Reinforcement Learning Utilities#

This module provides utility functions for multi-agent reinforcement learning (MARL) in influencer games. It includes functions to compute influence matrices, probability matrices, and influence kernels for agents interacting in a shared environment. The module supports various influence kernels, including Gaussian, Jones, Dirichlet, and Multi-variate Gaussian kernels.

Mathematical Definitions:#

  1. Probability Matrix: The probability matrix \(G\) is defined as:

    \[G_{i,k} = \frac{f_i(x_i, b_k)}{\sum_{j=1}^{N} f_j(x_j, b_k)}\]
    where:
    • \(f_i(x_i, b_k)\) is the influence of agent \(i\) on resource point \(b_k\)

    • \(N\) is the total number of agents

    • \(b_k\) is the \(k\)th resource point

  2. Influence Matrix: The influence matrix \(I\) is defined componentwise as:

    \[\iota_{i,k} = f_i(x_i, b_k)\]
    where:
    • \(f_i(x_i, b_k)\) is the influence of agent \(i\) on resource point \(b_k\)

  3. Influence Kernels: Various influence kernels are supported, including Gaussian, Jones, Dirichlet, and Multi-variate Gaussian kernels. A worked NumPy sketch of the two matrices above follows this list.
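
Both definitions translate directly into array operations. The following is a minimal NumPy sketch, independent of the library, in which the Gaussian kernel and all numerical values are chosen purely for illustration:

```python
import numpy as np

# Illustrative setup: 3 agents on a 1-D domain, 5 resource (bin) points.
agents_pos = np.array([0.2, 0.5, 0.8])        # positions x_i of the agents
bin_points = np.linspace(0.0, 1.0, 5)         # bin points b_k
sigma = 0.15                                  # Gaussian kernel width

# Influence matrix: iota[i, k] = f_i(x_i, b_k), here with a Gaussian kernel.
iota = np.exp(-(agents_pos[:, None] - bin_points[None, :]) ** 2 / (2 * sigma**2))

# Probability matrix: G[i, k] = f_i(x_i, b_k) / sum_j f_j(x_j, b_k).
G = iota / iota.sum(axis=0, keepdims=True)

assert np.allclose(G.sum(axis=0), 1.0)        # each column is a distribution over agents
```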

Dependencies:#

  • InflGame.utils

  • InflGame.kernels

Usage:#

The prob_matrix function computes the probability matrix for agents influencing resource points, while the influence_matrix function calculates the influence matrix for all agents. The influence function computes the influence of a specific agent’s kernel over resource points.
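
As a hedged illustration of the optimized entry points documented below, a call to prob_matrix_optimized might look as follows. The contents of infl_configs and the layout of parameters are assumptions made for this sketch (a Gaussian kernel with one width per agent); consult the demos for the configuration format actually expected by the library.

```python
import numpy as np
from InflGame.MARL.utils import MARL_utils

agents_pos = np.array([0.2, 0.5, 0.8])
bin_points = np.linspace(0.0, 1.0, 11)

# Assumed configuration: the keys of infl_configs and the meaning of
# `parameters` are illustrative, not confirmed by the source.
infl_configs = {"infl_type": "gaussian"}
parameters = np.array([0.15, 0.15, 0.15])

G = MARL_utils.prob_matrix_optimized(
    num_agents=3,
    agents_pos=agents_pos,
    bin_points=bin_points,
    infl_configs=infl_configs,
    parameters=parameters,
    fixed_pa=None,   # assumption: only relevant for the Dirichlet kernel
)
# G is expected to be a (num_agents x num_bin_points) probability matrix
# whose columns sum to 1, matching the definition of G above.
```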

Functions

InflGame.MARL.utils.MARL_utils.influence_matrix_optimized(num_agents, agents_pos, bin_points, infl_configs, parameters, fixed_pa, infl_cshift=False, infl_fshift=False, cshift=None, Q=0.0)#

Compute the influence of a specific agent's kernel over the bin points.

i.e.

\[f_{i}(x_i,b)\]

Where \(x_i\) is the position of the \(i\)th agent and \(b \in \mathbb{B}\) is a resource/bin point in the environment.

There are several types of preset influence kernels (a short numerical check follows this list), including:

  • Gaussian influence kernel

    \[f_i(x_i,b,\sigma) = e^{-\frac {(x_i-b)^2}{2\sigma^2}}\]
  • Jones influence kernel

    \[f_i(x_i,b,p) = |x_i-b|^p\]
  • Dirichlet influence kernel

    \[f_i(\mathbf{\alpha},b)=\frac{1}{\beta(\mathbf{\alpha})}\prod_{l=1}^{L} b_l^{(\alpha_l-1)}\]

where \(L\) is the number of dimensions and \(b_l\) is the \(l\)th component of the bin point \(b\).

Here \(\mathbf{\alpha}\) is the parameter vector for the Dirichlet influence kernel, and \(\alpha_\phi\) is the fixed parameter such that

\[\alpha_l=\frac{\alpha_\phi}{x_{(i,\phi)}} \cdot x_{(i,l)}\]

where \(x_{(i,\phi)}\) is the \(\phi\)th component of the position of the \(i\)th agent and \(x_{(i,l)}\) is the \(l\)th component of the position of the \(i\)th agent.

  • Multi-variate Gaussian influence kernel

    \[f_i(\mathbf{x}_i,\mathbf{b},\Sigma) = e^{-\frac{(\mathbf{x}_i-\mathbf{b})^T \Sigma^{-1} (\mathbf{x}_i-\mathbf{b})}{2}}\]

where \(\Sigma\) is the covariance matrix of the multi-variate Gaussian influence kernel.

  • Custom influence kernel (user-defined)

This influence kernel is defined by the user and can be any function that takes in the agent’s position, bin points, and parameters. Examples of custom influence kernels are provided in the demos.
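
As a quick numerical sanity check of two of the closed forms above (plain NumPy; all values, and the choice of the fixed component \(\phi\), are purely illustrative):

```python
import numpy as np

# Gaussian kernel at a single bin point.
x_i, b, sigma = 0.4, 0.6, 0.2
gaussian = np.exp(-(x_i - b) ** 2 / (2 * sigma**2))   # exp(-0.5) ≈ 0.6065

# Dirichlet alpha construction from the formula above, with the fixed
# component phi = 0 chosen for illustration.
x_vec = np.array([0.2, 0.3, 0.5])          # position of agent i (on the simplex)
alpha_phi, phi = 4.0, 0
alpha = alpha_phi / x_vec[phi] * x_vec     # alpha_l = (alpha_phi / x_{(i,phi)}) * x_{(i,l)}
# alpha == array([ 4.,  6., 10.]) and alpha[phi] == alpha_phi by construction
```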

Parameters:
  • num_agents (int) – The number of agents in the environment.

  • agents_pos (torch.Tensor | numpy.ndarray) – Positions of the agents.

  • bin_points (torch.Tensor | numpy.ndarray) – Positions of the resource points.

  • infl_configs (dict) – Configuration for the influence type.

  • parameters (list | numpy.ndarray | torch.Tensor) – Parameters for the influence function.

  • alpha_matrix (torch.Tensor, optional) – Alpha parameters for Dirichlet influence. Defaults to 0.

Returns:

A vector representing the influence of the agent over all resource points.

Return type:

torch.Tensor

InflGame.MARL.utils.MARL_utils.observation_to_position(observations, possible_positions)#

Convert observations to positions in the domain.

Parameters:
  • observations (dict[str, int]) – Current observations of all agents.

  • possible_positions (list) – A list of all possible positions for the agents, e.g. as returned by positions_list.

Returns:

List of positions corresponding to the observations.

Return type:

list[list[int]]
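
A hedged sketch of how this helper composes with positions_list (documented below). The agent names and observation indices are hypothetical, and the assumption that observations index into the positions_list output is inferred from the signatures rather than confirmed by the source:

```python
from InflGame.MARL.utils import MARL_utils

# Enumerate the candidate positions (positions_list is documented below).
possible_positions = MARL_utils.positions_list(num_observations=5, num_agents=2)

# Hypothetical observation dict mapping agent identifiers to observation indices.
observations = {"agent_0": 1, "agent_1": 3}

positions = MARL_utils.observation_to_position(observations, possible_positions)
# Expected: a list of positions (one per agent) drawn from the domain.
```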

InflGame.MARL.utils.MARL_utils.positions_list(num_observations, num_agents=2)#

Generates a list of all possible positions for players based on the number of observations.

Parameters:
  • num_observations (int) – The number of observations in the environment.

  • num_agents (int) – The number of players in the environment.

Returns:

A list of all possible positions for the players.

Return type:

list

InflGame.MARL.utils.MARL_utils.possible_observations(possible_agents, num_observations, num_agents)#

Generates all possible observations for the agents.

Parameters:
  • possible_agents (list[str]) – A list of agent identifiers.

  • num_observations (int) – The number of observations in the environment.

  • num_agents (int) – The number of players in the environment.

Returns:

A list of dictionaries representing all possible observations for the agents.

Return type:

list[dict[str, int]]
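
For intuition, a hedged usage sketch; the agent identifiers are illustrative and the exact layout of the returned dictionaries is an assumption based on the documented return type:

```python
from InflGame.MARL.utils import MARL_utils

joint_obs = MARL_utils.possible_observations(
    possible_agents=["agent_0", "agent_1"],   # illustrative identifiers
    num_observations=3,
    num_agents=2,
)
# Assumed form: one dict per joint observation, e.g.
#   [{"agent_0": 0, "agent_1": 0}, {"agent_0": 0, "agent_1": 1}, ...]
# suitable for passing to observation_to_position or reward_obs_optimized.
```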

InflGame.MARL.utils.MARL_utils.prob_matrix(*args, **kwargs)#

Backward compatibility wrapper.

InflGame.MARL.utils.MARL_utils.prob_matrix_optimized(num_agents, agents_pos, bin_points, infl_configs, parameters, fixed_pa, infl_cshift=False, infl_fshift=False, cshift=None, Q=0.0)#

Optimized probability matrix computation using vectorization and caching.

InflGame.MARL.utils.MARL_utils.remove_all_tuples(input_list, times=2)#

Recursively removes tuples from a list a specified number of times.

Parameters:
  • input_list (list) – A list that may contain tuples and integers.

  • times (int) – The number of times to recursively remove tuples.

Returns:

A new list with tuples flattened into integers.

Return type:

list
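
For clarity, a minimal sketch of equivalent flattening logic; this mirrors the documented behavior but is not the library's implementation:

```python
def flatten_once(items):
    """Replace each tuple in `items` with its elements (one level of flattening)."""
    out = []
    for item in items:
        if isinstance(item, tuple):
            out.extend(item)
        else:
            out.append(item)
    return out

def flatten_repeatedly(items, times=2):
    """Apply one level of tuple flattening `times` times, as remove_all_tuples does."""
    for _ in range(times):
        items = flatten_once(items)
    return items

# [(1, (2, 3)), 4] -> [1, (2, 3), 4] -> [1, 2, 3, 4] after two passes
print(flatten_repeatedly([(1, (2, 3)), 4], times=2))
```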

InflGame.MARL.utils.MARL_utils.remove_tuples(input_list)#

Removes tuples from a list, returning a new list with only integers.

Parameters:

input_list (list) – A list that may contain tuples and integers.

Returns:

A new list with tuples flattened into integers.

Return type:

list

InflGame.MARL.utils.MARL_utils.reward_dict(*args, **kwargs)#

Backward compatibility wrapper.

InflGame.MARL.utils.MARL_utils.reward_dict_optimized(possible_agents, possible_positions, num_observations, num_agents, bin_points, infl_configs, parameters, fixed_pa, infl_fshift, infl_cshift, cshift, Q, resource_distribution, normalize=True)#

Highly optimized reward dictionary computation using batch processing.

InflGame.MARL.utils.MARL_utils.reward_obs(*args, **kwargs)#

Backward compatibility wrapper.

InflGame.MARL.utils.MARL_utils.reward_obs_optimized(observations, possible_agents, possible_positions, num_agents, bin_points, infl_configs, parameters, fixed_pa, infl_fshift, infl_cshift, cshift, Q, resource_distribution)#

Optimized reward computation using vectorized operations.