Synchronous Influencer Game#
Synchronized Multi-Agent Environment Module#
This module implements a synchronized multi-agent environment for influencer games. The environment simulates a 1D domain where agents can move left, stay, or move right synchronously. Rewards are calculated based on a probability matrix and resource distribution.
Mathematical Definitions:#
The reward for an agent \(i\) is computed as:

\[u_i(x) = \sum_{b \in \mathbb{B}} G_i(x,b) \cdot B(b)\]

where:

- \(G_i(x,b)\) is the probability of agent \(i\) influencing bin \(b\)
- \(B(b)\) is the resource available at bin \(b\)
- \(\mathbb{B}\) is the set of all bin points
The probability is calculated by the function InflGame.MARL.utils.MARL_utils.prob_matrix.
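As a concrete illustration, the sum above is a matrix-vector product: stack the probabilities \(G_i(x,b)\) into a matrix with one row per agent and multiply by the resource vector. A minimal NumPy sketch (the array values here are made up purely for illustration):

import numpy as np

# Hypothetical values: prob[i, b] plays the role of G_i(x, b),
# resources[b] plays the role of B(b).
prob = np.array([
    [0.5, 0.3, 0.2],
    [0.2, 0.4, 0.4],
])
resources = np.array([1.0, 2.0, 0.5])

# u_i(x) = sum_b G_i(x, b) * B(b), computed for every agent at once
rewards = prob @ resources
print(rewards)  # [1.2 1.2]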
Dependencies:#
InflGame.MARL
ray.rllib.env.multi_agent_env
Usage:#
The influencer_env_sync class provides a synchronized multi-agent environment for influencer games. It supports custom configurations for agents, resource distributions, and influence kernels.
Example:#
import numpy as np
from InflGame.MARL.sync_game import influencer_env_sync

# Define environment configuration
config = {
    "num_agents": 3,
    "initial_position": [0.2, 0.5, 0.8],
    "bin_points": np.linspace(0, 1, 100),
    "resource_distribution": np.random.rand(100),
    "step_size": 0.01,
    "domain_type": "1d",
    "domain_bounds": [0, 1],
    "infl_configs": {"infl_type": "gaussian"},
    "parameters": [0.1, 0.1, 0.1],
    "fixed_pa": 0,
    "NUM_ITERS": 100,
}

# Initialize the environment
env = influencer_env_sync(config=config)

# Reset the environment
observations, _ = env.reset()

# Perform a step
actions = {"player0": env.LEFT, "player1": env.STAY, "player2": env.RIGHT}
observations, rewards, terminated, truncated, info = env.step(actions)

print("Observations:", observations)
print("Rewards:", rewards)
print("Terminated:", terminated)
Classes
- class InflGame.MARL.sync_game.influencer_env_sync(config=None)#
A synchronized multi-agent environment for influencer games.
This environment simulates a 1D domain where agents can move left, stay, or move right synchronously. Rewards are calculated based on a probability matrix and resource distribution.
- Attributes:
  - action_space
  - action_spaces
  - max_num_agents
  - np_random: Returns the environment's internal _np_random that, if not set, will initialise with a random seed.
  - np_random_seed: Returns the environment's internal _np_random_seed that, if not set, will first initialise with a random int as seed.
  - num_agents
  - observation_space
  - observation_spaces
  - render_mode
  - spec
  - unwrapped: Returns the base non-wrapped environment.
Methods
- REWARD_MAP(observations): Maps the observations to rewards for each agent via the reward dictionary.
- close(): After the user has finished using the environment, close contains the code necessary to "clean up" the environment.
- get_agent_ids(**kwargs)
- get_wrapper_attr(name): Gets the attribute name from the environment.
- has_wrapper_attr(name): Checks if the attribute name exists in the environment.
- initial_position_to_observation(): Convert initial positions to observations.
- observation_to_position(observations): Convert observations to positions in the domain.
- observation_update(actions, observations): Update observations based on actions taken by agents.
- render(): Tries to render the environment.
- reset(*[, seed, options]): Reset the environment to its initial state.
- set_wrapper_attr(name, value, *[, force]): Sets the attribute name on the environment with value; see Wrapper.set_wrapper_attr for more info.
- step(action_dict): Perform a single step in the environment.
- to_base_env([make_env, num_envs, ...]): Converts an RLlib MultiAgentEnv into a BaseEnv object.
- with_agent_groups(groups[, obs_space, act_space]): Convenience method for grouping together agents in this env.
- get_action_space
- get_observation_space
- REWARD_MAP(observations)#
Maps the observations to rewards for each agent via the reward dictionary. The reward for an agent \(i\) is computed as:
\[u_i(x) = \sum_{b \in \mathbb{B}} G_i(x,b) \cdot B(b)\]

where:

- \(G_i(x,b)\) is the probability of agent \(i\) influencing bin \(b\)
- \(B(b)\) is the resource available at bin \(b\)
- \(\mathbb{B}\) is the set of all bin points

The probability is calculated by the function InflGame.MARL.utils.MARL_utils.prob_matrix.

- Parameters:
observations (dict) – Current observations of all agents.
- Returns:
Rewards for each agent.
- Return type:
dict
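For intuition, here is a hedged sketch of what such a mapping could look like, assuming agent IDs of the form "player{i}" (as in the usage example above) and a precomputed probability matrix with one row per agent; the real method obtains the probabilities from InflGame.MARL.utils.MARL_utils.prob_matrix.

import numpy as np

def reward_map_sketch(prob, resources):
    # prob: (num_agents, num_bins) array standing in for G_i(x, b)
    # resources: (num_bins,) array standing in for B(b)
    utilities = prob @ resources
    return {f"player{i}": float(u) for i, u in enumerate(utilities)}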
- close()#
After the user has finished using the environment, close contains the code necessary to “clean up” the environment.
This is critical for closing rendering windows, database or HTTP connections. Calling close on an already closed environment has no effect and won't raise an error.
- get_wrapper_attr(name)#
Gets the attribute name from the environment.
- has_wrapper_attr(name)#
Checks if the attribute name exists in the environment.
- initial_position_to_observation()#
Convert initial positions to observations.
- Returns:
List of initial observations for all agents.
- Return type:
list
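One plausible encoding, shown purely for illustration (the package's exact observation encoding is not documented here), maps each initial position to the index of the nearest bin point:

import numpy as np

# Illustrative assumption: an observation is the index of the bin point
# closest to the agent's position on the discretized 1D domain.
bin_points = np.linspace(0, 1, 100)
initial_position = [0.2, 0.5, 0.8]

observations = [int(np.argmin(np.abs(bin_points - p))) for p in initial_position]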
- property np_random#
Returns the environment's internal _np_random that, if not set, will initialise with a random seed.

- Returns:
  Instances of np.random.Generator
- property np_random_seed#
Returns the environment's internal _np_random_seed that, if not set, will first initialise with a random int as seed.

If np_random_seed was set directly instead of through reset or set_np_random_through_seed, the seed will take the value -1.

- Returns:
  int: the seed of the current np_random or -1, if the seed of the rng is unknown
- observation_to_position(observations)#
Convert observations to positions in the domain.
- Parameters:
observations (dict) – Current observations of all agents.
- Returns:
List of positions corresponding to the observations.
- Return type:
list
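Continuing the illustrative encoding above (observations as bin indices, an assumption rather than the documented encoding), the inverse mapping is a simple lookup:

import numpy as np

bin_points = np.linspace(0, 1, 100)
observations = {"player0": 20, "player1": 50, "player2": 79}

# Look each observation up in the bin grid to recover a position in [0, 1].
positions = [float(bin_points[obs]) for obs in observations.values()]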
- observation_update(actions, observations)#
Update observations based on actions taken by agents.
The observations are updated based on the actions taken by each agent via the following rules (a minimal sketch follows this entry):

- If the action is LEFT, decrease the observation by 1.
- If the action is STAY, keep the observation unchanged.
- If the action is RIGHT, increase the observation by 1.
- If the new observation is out of bounds, keep the observation unchanged.
- Parameters:
actions (dict) – Actions taken by each agent.
observations (dict) – Current observations of all agents.
- Returns:
Updated observations for all agents.
- Return type:
dict
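A minimal sketch of the four rules listed above, assuming integer observations on a grid of num_bins points and illustrative action codes LEFT=0, STAY=1, RIGHT=2 (the real codes are whatever the environment's action space defines):

LEFT, STAY, RIGHT = 0, 1, 2           # illustrative action codes
MOVE = {LEFT: -1, STAY: 0, RIGHT: +1}

def observation_update_sketch(actions, observations, num_bins=100):
    updated = {}
    for agent, obs in observations.items():
        candidate = obs + MOVE[actions[agent]]
        # Out-of-bounds moves leave the observation unchanged.
        updated[agent] = candidate if 0 <= candidate < num_bins else obs
    return updated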
- render()#
Tries to render the environment.
- reset(*, seed=None, options=None)#
Reset the environment to its initial state.
- Parameters:
seed (int, optional) – Random seed for reproducibility.
options (dict, optional) – Additional options for the reset.
- Returns:
Initial observations and an empty info dictionary.
- Return type:
tuple
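Passing a seed makes resets reproducible. Assuming the environment forwards the seed to the gymnasium seeding machinery (as the np_random_seed notes above suggest), the seed is then reported back by that property:

# Reproducible reset (continuing the usage example above)
observations, info = env.reset(seed=42)
print(env.np_random_seed)  # expected to be 42 if the seed was forwarded to the RNG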
- set_wrapper_attr(name, value, *, force=True)#
Sets the attribute name on the environment with value; see Wrapper.set_wrapper_attr for more info.
- step(action_dict)#
Perform a single step in the environment. This method updates the environment based on the actions taken by all agents via the observation_update method. It computes the rewards for each agent using the REWARD_MAP method and checks for termination conditions.

- Parameters:
action_dict (dict) – Actions taken by each agent.
- Returns:
Updated observations, rewards, termination status, truncation status, and info dictionary.
- Return type:
tuple
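Continuing the usage example above, here is a hedged sketch of a full episode loop, assuming the termination and truncation dictionaries follow the usual RLlib multi-agent convention of an "__all__" key:

observations, _ = env.reset()
done = False
while not done:
    # Illustrative policy: every agent stays put.
    actions = {agent: env.STAY for agent in observations}
    observations, rewards, terminated, truncated, info = env.step(actions)
    done = terminated.get("__all__", False) or truncated.get("__all__", False)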
- to_base_env(make_env=None, num_envs=1, remote_envs=False, remote_env_batch_wait_ms=0, restart_failed_sub_environments=False)#
Converts an RLlib MultiAgentEnv into a BaseEnv object.
The resulting BaseEnv is always vectorized (contains n sub-environments) to support batched forward passes, where n may also be 1. BaseEnv also supports async execution via the poll and send_actions methods and thus supports external simulators.
- Args:
- make_env: A callable taking an int as input (which indicates the number of individual sub-environments within the final vectorized BaseEnv) and returning one individual sub-environment.
- num_envs: The number of sub-environments to create in the resulting (vectorized) BaseEnv. The already existing env will be one of the num_envs.
- remote_envs: Whether each sub-env should be a @ray.remote actor. You can set this behavior in your config via the remote_worker_envs=True option.
- remote_env_batch_wait_ms: The wait time (in ms) to poll remote sub-environments for, if applicable. Only used if remote_envs is True.
- restart_failed_sub_environments: If True and any sub-environment (within a vectorized env) throws any error during env stepping, we will try to restart the faulty sub-environment. This is done without disturbing the other (still intact) sub-environments.
- Returns:
The resulting BaseEnv object.
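For example, using the parameters documented above (a sketch, not a required setup):

# Wrap this environment into a vectorized BaseEnv with two sub-environments.
base_env = env.to_base_env(num_envs=2)
# base_env supports async execution through its poll() and send_actions() methods.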
- property unwrapped#
Returns the base non-wrapped environment.
- Returns:
Env: The base non-wrapped gymnasium.Env instance
- with_agent_groups(groups, obs_space=None, act_space=None)#
Convenience method for grouping together agents in this env.
An agent group is a list of agent IDs that are mapped to a single logical agent. All agents of the group must act at the same time in the environment. The grouped agent exposes Tuple action and observation spaces that are the concatenated action and obs spaces of the individual agents.
The rewards of all the agents in a group are summed. The individual agent rewards are available under the “individual_rewards” key of the group info return.
Agent grouping is required to leverage algorithms such as Q-Mix.
- Args:
- groups: Mapping from group id to a list of the agent ids of group members. If an agent id is not present in any group value, it will be left ungrouped. The group id becomes a new agent ID in the final environment.
- obs_space: Optional observation space for the grouped env. Must be a tuple space. If not provided, will infer this to be a Tuple of n individual agents spaces (n=num agents in a group).
- act_space: Optional action space for the grouped env. Must be a tuple space. If not provided, will infer this to be a Tuple of n individual agents spaces (n=num agents in a group).
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class MyMultiAgentEnv(MultiAgentEnv):
    # define your env here
    ...

env = MyMultiAgentEnv(...)
grouped_env = env.with_agent_groups({
    "group1": ["agent1", "agent2", "agent3"],
    "group2": ["agent4", "agent5"],
})