Synchronous Influencer Game#

Synchronized Multi-Agent Environment Module#

This module implements a synchronized multi-agent environment for influencer games. The environment simulates a 1D domain where agents can move left, stay, or move right synchronously. Rewards are calculated based on a probability matrix and resource distribution.

Mathematical Definitions:#

The reward for an agent \(i\) is computed as:

\[u_i(x) = \sum_{b \in \mathbb{B}} G_i(x,b) \cdot B(b)\]
where:
  • \(G_i(x,b)\) is the probability of agent \(i\) influencing bin \(b\)

  • \(B(b)\) is the resource available at bin \(b\)

  • \(\mathbb{B}\) is the set of all bin points

The probability is calculated by the function InflGame.MARL.utils.MARL_utils.prob_matrix.
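
For intuition, the reward in this formula is a matrix-vector product between an influence-probability matrix and the resource vector. The sketch below is purely illustrative: the arrays G and B stand in for \(G_i(x,b)\) and \(B(b)\), and the random values are placeholders for the output of InflGame.MARL.utils.MARL_utils.prob_matrix, whose exact signature is not reproduced here.

import numpy as np

num_agents, num_bins = 3, 100
G = np.random.rand(num_agents, num_bins)   # placeholder for G_i(x, b)
G /= G.sum(axis=0, keepdims=True)          # probabilities over agents for each bin
B = np.random.rand(num_bins)               # placeholder for B(b)

# u_i(x) = sum over bins b of G_i(x, b) * B(b), computed for every agent i at once
rewards = G @ B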

Dependencies:#

  • InflGame.MARL

  • ray.rllib.env.multi_agent_env

Usage:#

The influencer_env_sync class provides a synchronized multi-agent environment for influencer games. It supports custom configurations for agents, resource distributions, and influence kernels.

Example:#

import numpy as np
from InflGame.MARL.sync_game import influencer_env_sync

# Define environment configuration
config = {
    "num_agents": 3,
    "initial_position": [0.2, 0.5, 0.8],
    "bin_points": np.linspace(0, 1, 100),
    "resource_distribution": np.random.rand(100),
    "step_size": 0.01,
    "domain_type": "1d",
    "domain_bounds": [0, 1],
    "infl_configs": {"infl_type": "gaussian"},
    "parameters": [0.1, 0.1, 0.1],
    "fixed_pa": 0,
    "NUM_ITERS": 100
}

# Initialize the environment
env = influencer_env_sync(config=config)

# Reset the environment
observations, _ = env.reset()

# Perform a step
actions = {"player0": env.LEFT, "player1": env.STAY, "player2": env.RIGHT}
observations, rewards, terminated, truncated, info = env.step(actions)

print("Observations:", observations)
print("Rewards:", rewards)
print("Terminated:", terminated)

Classes

class InflGame.MARL.sync_game.influencer_env_sync(config=None)#

A synchronized multi-agent environment for influencer games.

This environment simulates a 1D domain where agents can move left, stay, or move right synchronously. Rewards are calculated based on a probability matrix and resource distribution.

Attributes:
action_space
action_spaces
max_num_agents
np_random

Returns the environment’s internal _np_random; if not set, it will be initialised with a random seed.

np_random_seed

Returns the environment’s internal _np_random_seed; if not set, it will first be initialised with a random integer as the seed.

num_agents
observation_space
observation_spaces
render_mode
spec
unwrapped

Returns the base non-wrapped environment.

Methods

REWARD_MAP(observations)

Maps the observations to rewards for each agent via the reward dictionary.

close()

After the user has finished using the environment, close contains the code necessary to "clean up" the environment.

get_agent_ids(**kwargs)

get_wrapper_attr(name)

Gets the attribute name from the environment.

has_wrapper_attr(name)

Checks if the attribute name exists in the environment.

initial_position_to_observation()

Convert initial positions to observations.

observation_to_position(observations)

Convert observations to positions in the domain.

observation_update(actions, observations)

Update observations based on actions taken by agents.

render()

Tries to render the environment.

reset(*[, seed, options])

Reset the environment to its initial state.

set_wrapper_attr(name, value, *[, force])

Sets the attribute name on the environment with value; see Wrapper.set_wrapper_attr for more info.

step(action_dict)

Perform a single step in the environment.

to_base_env([make_env, num_envs, ...])

Converts an RLlib MultiAgentEnv into a BaseEnv object.

with_agent_groups(groups[, obs_space, act_space])

Convenience method for grouping together agents in this env.

get_action_space

get_observation_space

REWARD_MAP(observations)#

Maps the observations to rewards for each agent via the reward dictionary. The reward for an agent \(i\) is computed as:

\[u_i(x) = \sum_{b \in \mathbb{B}} G_i(x,b) \cdot B(b)\]
where:
  • \(G_i(x,b)\) is the probability of agent \(i\) influencing bin \(b\)

  • \(B(b)\) is the resource available at bin \(b\)

  • \(\mathbb{B}\) is the set of all bin points

The probability is calculated by the function InflGame.MARL.utils.MARL_utils.prob_matrix.

Parameters:

observations (dict) – Current observations of all agents.

Returns:

Rewards for each agent.

Return type:

dict
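
A minimal usage sketch, assuming env has been constructed as in the example above and that observations is the dictionary returned by reset (keyed by agent IDs such as "player0"):

observations, _ = env.reset()
rewards = env.REWARD_MAP(observations)
print(rewards)   # one reward per agent, e.g. {"player0": ..., "player1": ..., "player2": ...}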

close()#

After the user has finished using the environment, close contains the code necessary to “clean up” the environment.

This is critical for closing rendering windows, database or HTTP connections. Calling close on an already closed environment has no effect and won’t raise an error.

get_wrapper_attr(name)#

Gets the attribute name from the environment.

has_wrapper_attr(name)#

Checks if the attribute name exists in the environment.

initial_position_to_observation()#

Convert initial positions to observations.

Returns:

List of initial observations for all agents.

Return type:

list

property np_random#

Returns the environment’s internal _np_random; if not set, it will be initialised with a random seed.

Returns:

Instance of np.random.Generator

property np_random_seed#

Returns the environment’s internal _np_random_seed; if not set, it will first be initialised with a random integer as the seed.

If np_random_seed was set directly instead of through reset or set_np_random_through_seed, the seed will take the value -1.

Returns:

int – the seed of the current np_random, or -1 if the seed of the rng is unknown

observation_to_position(observations)#

Convert observations to positions in the domain.

Parameters:

observations (dict) – Current observations of all agents.

Returns:

List of positions corresponding to the observations.

Return type:

list
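
A short usage sketch, continuing the example above; how observations map to domain coordinates depends on the step_size and domain_bounds entries of the environment configuration:

observations, _ = env.reset()
positions = env.observation_to_position(observations)
print(positions)   # one coordinate inside domain_bounds per agent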

observation_update(actions, observations)#

Update observations based on actions taken by agents.

The observations are updated based on the actions taken by each agent via the following rules:

  • If action is LEFT, decrease the observation by 1.

  • If action is STAY, keep the observation unchanged.

  • If action is RIGHT, increase the observation by 1.

  • If the new observation is out of bounds, keep the observation unchanged.

Parameters:
  • actions (dict) – Actions taken by each agent.

  • observations (dict) – Current observations of all agents.

Returns:

Updated observations for all agents.

Return type:

dict
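
The rules above amount to a clipped increment on each agent's discrete observation. The standalone sketch below mirrors that logic; the action encoding and the bound max_obs are illustrative assumptions, not the environment's internals.

LEFT, STAY, RIGHT = 0, 1, 2              # assumed action encoding
max_obs = 99                             # assumed upper bound on the grid index

def update_one(obs, action):
    # Apply the move, then discard it if it would leave the domain.
    delta = {LEFT: -1, STAY: 0, RIGHT: 1}[action]
    new_obs = obs + delta
    return obs if new_obs < 0 or new_obs > max_obs else new_obs

observations = {"player0": 20, "player1": 50, "player2": 80}
actions = {"player0": LEFT, "player1": STAY, "player2": RIGHT}
observations = {a: update_one(observations[a], actions[a]) for a in observations}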

render()#

Tries to render the environment.

reset(*, seed=None, options=None)#

Reset the environment to its initial state.

Parameters:
  • seed (int, optional) – Random seed for reproducibility.

  • options (dict, optional) – Additional options for the reset.

Returns:

Initial observations and an empty info dictionary.

Return type:

tuple

set_wrapper_attr(name, value, *, force=True)#

Sets the attribute name on the environment with value; see Wrapper.set_wrapper_attr for more info.

step(action_dict)#

Perform a single step in the environment. This method updates the environment based on the actions taken by all agents via the observation_update method. It computes the rewards for each agent using the REWARD_MAP method and checks for termination conditions.

Parameters:

action_dict (dict) – Actions taken by each agent.

Returns:

Updated observations, rewards, termination status, truncation status, and info dictionary.

Return type:

tuple
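
A minimal rollout sketch built on the documented reset/step interface, continuing the example above. It assumes the usual RLlib convention of an "__all__" key in the terminated and truncated dictionaries; adapt the stopping condition if this environment reports termination differently.

import numpy as np

observations, _ = env.reset()
done = False
while not done:
    # Sample a random move (LEFT, STAY, or RIGHT) for every agent.
    actions = {agent: np.random.choice([env.LEFT, env.STAY, env.RIGHT])
               for agent in observations}
    observations, rewards, terminated, truncated, info = env.step(actions)
    done = terminated["__all__"] or truncated["__all__"]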

to_base_env(make_env=None, num_envs=1, remote_envs=False, remote_env_batch_wait_ms=0, restart_failed_sub_environments=False)#

Converts an RLlib MultiAgentEnv into a BaseEnv object.

The resulting BaseEnv is always vectorized (contains n sub-environments) to support batched forward passes, where n may also be 1. BaseEnv also supports async execution via the poll and send_actions methods and thus supports external simulators.

Parameters:
  • make_env – A callable taking an int as input (which indicates the number of individual sub-environments within the final vectorized BaseEnv) and returning one individual sub-environment.

  • num_envs – The number of sub-environments to create in the resulting (vectorized) BaseEnv. The already existing env will be one of the num_envs.

  • remote_envs – Whether each sub-env should be a @ray.remote actor. You can set this behavior in your config via the remote_worker_envs=True option.

  • remote_env_batch_wait_ms – The wait time (in ms) to poll remote sub-environments for, if applicable. Only used if remote_envs is True.

  • restart_failed_sub_environments – If True and any sub-environment (within a vectorized env) throws any error during env stepping, we will try to restart the faulty sub-environment. This is done without disturbing the other (still intact) sub-environments.

Returns:

The resulting BaseEnv object.
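
A brief usage sketch of this inherited RLlib helper; in typical training runs RLlib performs this conversion internally, so calling it by hand is mostly useful for debugging:

base_env = env.to_base_env(num_envs=2)   # vectorized BaseEnv with two sub-environments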

property unwrapped#

Returns the base non-wrapped environment.

Returns:

Env: The base non-wrapped gymnasium.Env instance

with_agent_groups(groups, obs_space=None, act_space=None)#

Convenience method for grouping together agents in this env.

An agent group is a list of agent IDs that are mapped to a single logical agent. All agents of the group must act at the same time in the environment. The grouped agent exposes Tuple action and observation spaces that are the concatenated action and obs spaces of the individual agents.

The rewards of all the agents in a group are summed. The individual agent rewards are available under the “individual_rewards” key of the group info return.

Agent grouping is required to leverage algorithms such as Q-Mix.

Parameters:
  • groups – Mapping from group id to a list of the agent ids of group members. If an agent id is not present in any group value, it will be left ungrouped. The group id becomes a new agent ID in the final environment.

  • obs_space – Optional observation space for the grouped env. Must be a tuple space. If not provided, will infer this to be a Tuple of n individual agents' spaces (n = num agents in a group).

  • act_space – Optional action space for the grouped env. Must be a tuple space. If not provided, will infer this to be a Tuple of n individual agents' spaces (n = num agents in a group).

from ray.rllib.env.multi_agent_env import MultiAgentEnv

class MyMultiAgentEnv(MultiAgentEnv):
    # define your env here
    ...

env = MyMultiAgentEnv(...)
# Per the signature above, the group mapping is the first argument.
grouped_env = env.with_agent_groups({
    "group1": ["agent1", "agent2", "agent3"],
    "group2": ["agent4", "agent5"],
})