Asynchronous Influencer Game#
Asynchronous Multi-Agent Environment Module#
This module implements an asynchronous multi-agent environment for influencer games. The environment simulates a 1D domain where agents can move left, stay, or move right asynchronously. Rewards are calculated based on a probability matrix and resource distribution.
Mathematical Definitions:#
The reward for an agent \(i\) is computed as:
\[u_i(x) = \sum_{b \in \mathbb{B}} G_i(b) \cdot R(b)\]
- where:
\(G_i(b)\) is the probability of agent \(i\) influencing bin \(b\)
\(R(b)\) is the resource available at bin \(b\)
\(\mathbb{B}\) is the set of all bin points
The probability is calculated by the function InflGame.MARL.utils.MARL_utils.prob_matrix.
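As a rough illustration, the sum above is a dot product between agent \(i\)'s row of the probability matrix and the resource vector. The sketch below is illustrative only and assumes the probability matrix has shape (num_agents, num_bins); the actual interface lives in InflGame.MARL.utils.MARL_utils.prob_matrix.
import numpy as np
# Illustrative only: G stands in for the (num_agents, num_bins) probability matrix
# and R for the resource distribution over the bin points.
G = np.random.rand(3, 100)
G = G / G.sum(axis=0, keepdims=True)  # assumption: each bin's influence probabilities sum to 1
R = np.random.rand(100)
u = G @ R  # u[i] = sum_b G[i, b] * R[b], one reward per agent
print(u)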
Dependencies:#
InflGame.MARL
ray.rllib
Usage:#
The influencer_env_async class provides an asynchronous multi-agent environment for influencer games. It supports custom configurations for agents, resource distributions, and influence kernels.
Example:#
import numpy as np
from InflGame.MARL.async_game import influencer_env_async
# Define environment configuration
config = {
"num_agents": 3,
"initial_position": [0.2, 0.5, 0.8],
"bin_points": np.linspace(0, 1, 100),
"resource_distribution": np.random.rand(100),
"step_size": 0.01,
"domain_type": "1d",
"domain_bounds": [0, 1],
"infl_configs": {"infl_type": "gaussian"},
"parameters": [0.1, 0.1, 0.1],
"fixed_pa": 0,
"NUM_ITERS": 100
}
# Initialize the environment
env = influencer_env_async(config=config)
# Reset the environment
observations, _ = env.reset()
# Perform a step
actions = {"player0": env.LEFT, "player1": env.STAY, "player2": env.RIGHT}
observations, rewards, terminated, truncated, info = env.step(actions, agent="player0")
print("Observations:", observations)
print("Rewards:", rewards)
print("Terminated:", terminated)
Classes
- class InflGame.MARL.async_game.influencer_env_async(config=None)#
An asynchronous multi-agent environment for influencer games.
This environment simulates a 1D domain where agents can move left, stay, or move right asynchronously. Rewards are calculated based on a probability matrix and resource distribution.
- Attributes:
- action_space
- action_spaces
- max_num_agents
- np_random – Returns the environment’s internal _np_random that if not set will initialise with a random seed.
- np_random_seed – Returns the environment’s internal _np_random_seed that if not set will first initialise with a random int as seed.
- num_agents
- observation_space
- observation_spaces
- render_mode
- spec
- unwrapped – Returns the base non-wrapped environment.
Methods
REWARD_MAP(observations) – Compute the reward for each agent based on their positions.
close() – After the user has finished using the environment, close contains the code necessary to "clean up" the environment.
get_agent_ids(**kwargs)
get_wrapper_attr(name) – Gets the attribute name from the environment.
has_wrapper_attr(name) – Checks if the attribute name exists in the environment.
initial_position_to_observation() – Convert initial positions to observations.
observation_to_position(observations) – Convert observations to positions in the domain.
observation_update(actions, observations, key) – Update the observation of a specific agent based on its action.
render() – Tries to render the environment.
reset(*[, seed, options]) – Reset the environment to its initial state.
set_wrapper_attr(name, value, *[, force]) – Sets the attribute name on the environment with value, see Wrapper.set_wrapper_attr for more info.
step(action_dict, agent) – Perform a single step in the environment.
to_base_env([make_env, num_envs, ...]) – Converts an RLlib MultiAgentEnv into a BaseEnv object.
with_agent_groups(groups[, obs_space, act_space]) – Convenience method for grouping together agents in this env.
get_action_space
get_observation_space
- REWARD_MAP(observations)#
Compute the reward for each agent based on their positions.
\[u_i(x) = \sum_{b \in \mathbb{B}} G_i(b) \cdot R(b)\]
- where:
\(G_i(b)\) is the probability of agent \(i\) influencing bin \(b\)
\(R(b)\) is the resource available at bin \(b\)
\(\mathbb{B}\) is the set of all bin points
The probability is calculated by the function InflGame.MARL.utils.MARL_utils.prob_matrix.
- Parameters:
observations (dict) – Current observations of all agents.
- Returns:
Rewards for each agent.
- Return type:
dict
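A minimal usage sketch, assuming observations is the dict returned by reset and agent IDs follow the "player<index>" naming from the example above:
observations, _ = env.reset()
rewards = env.REWARD_MAP(observations)  # e.g. {"player0": 0.42, "player1": 0.37, ...} (illustrative values)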
- close()#
After the user has finished using the environment, close contains the code necessary to “clean up” the environment.
This is critical for closing rendering windows, database or HTTP connections. Calling close on an already closed environment has no effect and won’t raise an error.
- get_wrapper_attr(name)#
Gets the attribute name from the environment.
- has_wrapper_attr(name)#
Checks if the attribute name exists in the environment.
- initial_position_to_observation()#
Convert initial positions to observations.
- Returns:
List of observations corresponding to initial positions.
- Return type:
list
- property np_random#
Returns the environment’s internal _np_random that if not set will initialise with a random seed.
- Returns:
Instances of np.random.Generator
- property np_random_seed#
Returns the environment’s internal _np_random_seed that if not set will first initialise with a random int as seed.
If np_random_seed was set directly instead of through reset or set_np_random_through_seed, the seed will take the value -1.
- Returns:
int: the seed of the current np_random or -1, if the seed of the rng is unknown
- observation_to_position(observations)#
Convert observations to positions in the domain.
- Parameters:
observations (dict) – Observations of all agents.
- Returns:
List of positions corresponding to the observations.
- Return type:
list
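For example (a sketch; the exact mapping from observations to domain coordinates is internal to the environment):
observations, _ = env.reset()
positions = env.observation_to_position(observations)  # one position per agent
print(positions)  # e.g. [0.2, 0.5, 0.8] for the initial_position used in the configuration above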
- observation_update(actions, observations, key)#
Update the observation of a specific agent based on its action.
The observations are updated based on the actions taken by the agent via the following rules:
- If action is LEFT, decrease the observation by 1.
- If action is STAY, keep the observation unchanged.
- If action is RIGHT, increase the observation by 1.
- If the new observation is out of bounds, keep the observation unchanged.
- Parameters:
actions (dict) – Actions taken by all agents.
observations (dict) – Current observations of all agents.
key (str) – Key of the agent whose observation is being updated.
- Returns:
Updated observations.
- Return type:
dict
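The clamped update rule above can be restated as a standalone sketch. It assumes observations are integer indices on a grid of num_bins points and that LEFT, STAY, RIGHT are the environment’s three discrete actions; the actual encoding may differ.
LEFT, STAY, RIGHT = 0, 1, 2  # assumed action encoding

def update_one(obs: int, action: int, num_bins: int) -> int:
    # Move left, stay, or move right by one grid index.
    new_obs = obs + {LEFT: -1, STAY: 0, RIGHT: 1}[action]
    # Out-of-bounds moves leave the observation unchanged.
    return new_obs if 0 <= new_obs < num_bins else obs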
- render()#
Tries to render the environment.
- reset(*, seed=None, options=None)#
Reset the environment to its initial state.
- Parameters:
seed (int, optional) – Random seed for reproducibility, defaults to None.
options (dict, optional) – Additional options for reset, defaults to None.
- Returns:
Initial observations and an empty info dictionary.
- Return type:
tuple
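For reproducible runs a seed can be passed, for example:
observations, info = env.reset(seed=42)  # info is an empty dict, per the description above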
- set_wrapper_attr(name, value, *, force=True)#
Sets the attribute name on the environment with value, see Wrapper.set_wrapper_attr for more info.
- step(action_dict, agent)#
Perform a single step in the environment. This method updates the environment based on the actions taken by the agents via the observation_update method. It computes the rewards for each agent using the REWARD_MAP method and checks for termination conditions.
Unlike the step method in the synchronous environment InflGame.MARL.sync_game, where all agents take actions at the same time and the environment is updated synchronously, this method processes actions asynchronously. Additionally, this method requires the player whose action is being processed.
- Parameters:
action_dict (dict) – Actions taken by all agents.
agent (str) – The player whose action is being processed.
- Returns:
Updated observations, rewards, termination status, truncated status, and info dictionary.
- Return type:
tuple
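A minimal asynchronous rollout sketch, cycling through the agents until the environment reports termination; the "__all__" keys and the fixed action dictionary are assumptions for illustration.
import itertools

agent_ids = ["player0", "player1", "player2"]
observations, _ = env.reset()
for agent_id in itertools.cycle(agent_ids):
    action_dict = {aid: env.STAY for aid in agent_ids}  # placeholder policy: every agent stays
    observations, rewards, terminated, truncated, info = env.step(action_dict, agent=agent_id)
    if terminated.get("__all__", False) or truncated.get("__all__", False):
        break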
- to_base_env(make_env=None, num_envs=1, remote_envs=False, remote_env_batch_wait_ms=0, restart_failed_sub_environments=False)#
Converts an RLlib MultiAgentEnv into a BaseEnv object.
The resulting BaseEnv is always vectorized (contains n sub-environments) to support batched forward passes, where n may also be 1. BaseEnv also supports async execution via the poll and send_actions methods and thus supports external simulators.
- Args:
- make_env: A callable taking an int as input (which indicates the number of individual sub-environments within the final vectorized BaseEnv) and returning one individual sub-environment.
- num_envs: The number of sub-environments to create in the resulting (vectorized) BaseEnv. The already existing env will be one of the num_envs.
- remote_envs: Whether each sub-env should be a @ray.remote actor. You can set this behavior in your config via the remote_worker_envs=True option.
- remote_env_batch_wait_ms: The wait time (in ms) to poll remote sub-environments for, if applicable. Only used if remote_envs is True.
- restart_failed_sub_environments: If True and any sub-environment (within a vectorized env) throws any error during env stepping, we will try to restart the faulty sub-environment. This is done without disturbing the other (still intact) sub-environments.
- Returns:
The resulting BaseEnv object.
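For instance, to obtain a vectorized wrapper with two sub-environments (a sketch; the remaining arguments keep their defaults):
base_env = env.to_base_env(num_envs=2)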
- property unwrapped#
Returns the base non-wrapped environment.
- Returns:
Env: The base non-wrapped gymnasium.Env instance
- with_agent_groups(groups, obs_space=None, act_space=None)#
Convenience method for grouping together agents in this env.
An agent group is a list of agent IDs that are mapped to a single logical agent. All agents of the group must act at the same time in the environment. The grouped agent exposes Tuple action and observation spaces that are the concatenated action and obs spaces of the individual agents.
The rewards of all the agents in a group are summed. The individual agent rewards are available under the “individual_rewards” key of the group info return.
Agent grouping is required to leverage algorithms such as Q-Mix.
- Args:
- groups: Mapping from group id to a list of the agent ids of group members. If an agent id is not present in any group value, it will be left ungrouped. The group id becomes a new agent ID in the final environment.
- obs_space: Optional observation space for the grouped env. Must be a tuple space. If not provided, will infer this to be a Tuple of n individual agents spaces (n=num agents in a group).
- act_space: Optional action space for the grouped env. Must be a tuple space. If not provided, will infer this to be a Tuple of n individual agents spaces (n=num agents in a group).
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class MyMultiAgentEnv(MultiAgentEnv):
    # define your env here
    ...

env = MyMultiAgentEnv(...)
grouped_env = env.with_agent_groups({
    "group1": ["agent1", "agent2", "agent3"],
    "group2": ["agent4", "agent5"],
})