Asynchronous Influencer Game#

Asynchronous Multi-Agent Environment Module#

This module implements an asynchronous multi-agent environment for influencer games. The environment simulates a 1D domain where agents can move left, stay, or move right asynchronously. Rewards are calculated based on a probability matrix and resource distribution.

Mathematical Definitions:#

The reward for an agent \(i\) is computed as:

\[u_i(x) = \sum_{b \in \mathbb{B}} G_i(b) \cdot B(b)\]
where:
  • \(G_i(b)\) is the probability of agent \(i\) influencing bin \(b\)

  • \(B(b)\) is the resource available at bin \(b\)

  • \(\mathbb{B}\) is the set of all bin points

The influence probabilities \(G_i(b)\) are computed by the function InflGame.MARL.utils.MARL_utils.prob_matrix.
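
As a rough illustration, the reward formula reduces to a matrix-vector product between the influence probabilities and the resource distribution. The sketch below is illustrative only and assumes the probability matrix G has shape (num_agents, num_bins); in the environment these values come from InflGame.MARL.utils.MARL_utils.prob_matrix.

import numpy as np

# Hypothetical shapes for illustration: 3 agents, 100 bins
num_agents, num_bins = 3, 100
G = np.random.rand(num_agents, num_bins)      # assumed influence probabilities G_i(b)
G = G / G.sum(axis=0, keepdims=True)          # illustrative normalization: each bin's probabilities sum to 1
B = np.random.rand(num_bins)                  # resource distribution B(b)

# u_i = sum_b G_i(b) * B(b), i.e. a matrix-vector product
rewards = G @ B                               # shape (num_agents,)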

Dependencies:#

  • InflGame.MARL

  • ray.rllib

Usage:#

The influencer_env_async class provides an asynchronous multi-agent environment for influencer games. It supports custom configurations for agents, resource distributions, and influence kernels.

Example:#

import numpy as np
from InflGame.MARL.async_game import influencer_env_async

# Define environment configuration
config = {
    "num_agents": 3,
    "initial_position": [0.2, 0.5, 0.8],
    "bin_points": np.linspace(0, 1, 100),
    "resource_distribution": np.random.rand(100),
    "step_size": 0.01,
    "domain_type": "1d",
    "domain_bounds": [0, 1],
    "infl_configs": {"infl_type": "gaussian"},
    "parameters": [0.1, 0.1, 0.1],
    "fixed_pa": 0,
    "NUM_ITERS": 100
}

# Initialize the environment
env = influencer_env_async(config=config)

# Reset the environment
observations, _ = env.reset()

# Perform a step
actions = {"player0": env.LEFT, "player1": env.STAY, "player2": env.RIGHT}
observations, rewards, terminated, truncated, info = env.step(actions, agent="player0")

print("Observations:", observations)
print("Rewards:", rewards)
print("Terminated:", terminated)

Classes

class InflGame.MARL.async_game.influencer_env_async(config=None)#

An asynchronous multi-agent environment for influencer games.

This environment simulates a 1D domain where agents can move left, stay, or move right asynchronously. Rewards are calculated based on a probability matrix and resource distribution.

Attributes:
action_space
action_spaces
max_num_agents
np_random

Returns the environment’s internal _np_random, initialising it with a random seed if it has not been set.

np_random_seed

Returns the environment’s internal _np_random_seed, initialising it with a random integer seed if it has not been set.

num_agents
observation_space
observation_spaces
render_mode
spec
unwrapped

Returns the base non-wrapped environment.

Methods

REWARD_MAP(observations)

Compute the reward for each agent based on their positions.

close()

After the user has finished using the environment, close contains the code necessary to "clean up" the environment.

get_agent_ids(**kwargs)

get_wrapper_attr(name)

Gets the attribute name from the environment.

has_wrapper_attr(name)

Checks if the attribute name exists in the environment.

initial_position_to_observation()

Convert initial positions to observations.

observation_to_position(observations)

Convert observations to positions in the domain.

observation_update(actions, observations, key)

Update the observation of a specific agent based on its action.

render()

Tries to render the environment.

reset(*[, seed, options])

Reset the environment to its initial state.

set_wrapper_attr(name, value, *[, force])

Sets the attribute name on the environment with value, see Wrapper.set_wrapper_attr for more info.

step(action_dict, agent)

Perform a single step in the environment.

to_base_env([make_env, num_envs, ...])

Converts an RLlib MultiAgentEnv into a BaseEnv object.

with_agent_groups(groups[, obs_space, act_space])

Convenience method for grouping together agents in this env.

get_action_space

get_observation_space

REWARD_MAP(observations)#

Compute the reward for each agent based on their positions.

\[u_i(x) = \sum_{b \in \mathbb{B}} G_i(b) \cdot B(b)\]
where:
  • \(G_i(b)\) is the probability of agent \(i\) influencing bin \(b\)

  • \(B(b)\) is the resource available at bin \(b\)

  • \(\mathbb{B}\) is the set of all bin points

The influence probabilities \(G_i(b)\) are computed by the function InflGame.MARL.utils.MARL_utils.prob_matrix.

Parameters:

observations (dict) – Current observations of all agents.

Returns:

Rewards for each agent.

Return type:

dict
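
A brief usage sketch, assuming the env and observations from the module-level example above:

# Rewards keyed by agent ID, e.g. {"player0": ..., "player1": ..., "player2": ...}
rewards = env.REWARD_MAP(observations)
print(rewards)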

close()#

After the user has finished using the environment, close contains the code necessary to “clean up” the environment.

This is critical for closing rendering windows, database or HTTP connections. Calling close on an already closed environment has no effect and won’t raise an error.

get_wrapper_attr(name)#

Gets the attribute name from the environment.

has_wrapper_attr(name)#

Checks if the attribute name exists in the environment.

initial_position_to_observation()#

Convert initial positions to observations.

Returns:

List of observations corresponding to initial positions.

Return type:

list

property np_random#

Returns the environment’s internal _np_random, initialising it with a random seed if it has not been set.

Returns:

Instances of np.random.Generator

property np_random_seed#

Returns the environment’s internal _np_random_seed, initialising it with a random integer seed if it has not been set.

If np_random_seed was set directly instead of through reset or set_np_random_through_seed, the seed will take the value -1.

Returns:

int: the seed of the current np_random or -1, if the seed of the rng is unknown

observation_to_position(observations)#

Convert observations to positions in the domain.

Parameters:

observations (dict) – Observations of all agents.

Returns:

List of positions corresponding to the observations.

Return type:

list

observation_update(actions, observations, key)#

Update the observation of a specific agent based on its action.

The observations are updated based on the action taken by the agent via the following rules:
  • If the action is LEFT, decrease the observation by 1.

  • If the action is STAY, keep the observation unchanged.

  • If the action is RIGHT, increase the observation by 1.

  • If the new observation is out of bounds, keep the observation unchanged.

Parameters:
  • actions (dict) – Actions taken by all agents.

  • observations (dict) – Current observations of all agents.

  • key (str) – Key of the agent whose observation is being updated.

Returns:

Updated observations.

Return type:

dict
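
The rules above amount to a bounded increment or decrement of a single agent's observation. The following standalone sketch mirrors that logic under the assumption that observations are integer bin indices in [0, num_bins - 1]; the environment's own implementation may differ in detail.

# Hypothetical action codes and bounds, for illustration only
LEFT, STAY, RIGHT = 0, 1, 2
num_bins = 100

def update_one(observation: int, action: int) -> int:
    """Apply the LEFT/STAY/RIGHT rules with bounds checking."""
    delta = {LEFT: -1, STAY: 0, RIGHT: +1}[action]
    new_obs = observation + delta
    # Out-of-bounds moves leave the observation unchanged
    if new_obs < 0 or new_obs > num_bins - 1:
        return observation
    return new_obs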

render()#

Tries to render the environment.

reset(*, seed=None, options=None)#

Reset the environment to its initial state.

Parameters:
  • seed (int, optional) – Random seed for reproducibility, defaults to None.

  • options (dict, optional) – Additional options for reset, defaults to None.

Returns:

Initial observations and an empty info dictionary.

Return type:

tuple
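
A seeded reset, useful for reproducible experiments:

# Passing a seed makes the initial observations reproducible across runs
observations, info = env.reset(seed=42)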

set_wrapper_attr(name, value, *, force=True)#

Sets the attribute name on the environment with value, see Wrapper.set_wrapper_attr for more info.

step(action_dict, agent)#

Perform a single step in the environment. This method updates the environment based on the actions taken by all agents via the observation_update method. It computes the rewards for each agent using the REWARD_MAP method and checks for termination conditions.

Unlike the step method in the synchronous environment InflGame.MARL.sync_game, where all agents take actions at the same time and the environment is updated synchronously, this method processes actions asynchronously, one agent at a time.

Additionally, this method requires the agent whose action is being processed.

Parameters:
  • action_dict (dict) – Actions taken by all agents.

  • agent (str) – The player whose action is being processed.

Returns:

Updated observations, rewards, termination status, truncated status, and info dictionary.

Return type:

tuple
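
Because actions are processed one agent at a time, a typical interaction loop steps the environment once per agent. A minimal sketch, assuming the config and agent IDs ("player0", "player1", "player2") from the module-level example, random action sampling for illustration, and the RLlib multi-agent convention of an "__all__" key in the termination dictionary:

import numpy as np

observations, _ = env.reset()
terminated = {"__all__": False}
player_ids = ["player0", "player1", "player2"]

while not terminated.get("__all__", False):
    # Sample an action for every agent, then process them one agent at a time
    actions = {pid: np.random.choice([env.LEFT, env.STAY, env.RIGHT]) for pid in player_ids}
    for agent_id in player_ids:
        observations, rewards, terminated, truncated, info = env.step(actions, agent=agent_id)
        if terminated.get("__all__", False):
            break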

to_base_env(make_env=None, num_envs=1, remote_envs=False, remote_env_batch_wait_ms=0, restart_failed_sub_environments=False)#

Converts an RLlib MultiAgentEnv into a BaseEnv object.

The resulting BaseEnv is always vectorized (contains n sub-environments) to support batched forward passes, where n may also be 1. BaseEnv also supports async execution via the poll and send_actions methods and thus supports external simulators.

Parameters:
  • make_env – A callable taking an int as input (the number of individual sub-environments within the final vectorized BaseEnv) and returning one individual sub-environment.

  • num_envs – The number of sub-environments to create in the resulting (vectorized) BaseEnv. The already existing env will be one of the num_envs.

  • remote_envs – Whether each sub-env should be a @ray.remote actor. You can set this behavior in your config via the remote_worker_envs=True option.

  • remote_env_batch_wait_ms – The wait time (in ms) to poll remote sub-environments for, if applicable. Only used if remote_envs is True.

  • restart_failed_sub_environments – If True and any sub-environment (within a vectorized env) throws any error during env stepping, we will try to restart the faulty sub-environment. This is done without disturbing the other (still intact) sub-environments.

Returns:

The resulting BaseEnv object.
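
A brief usage sketch based on the signature above; with the default arguments the existing environment becomes the single sub-environment of the vectorized BaseEnv:

# Wrap the multi-agent env into an RLlib BaseEnv (one sub-environment by default)
base_env = env.to_base_env()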

property unwrapped#

Returns the base non-wrapped environment.

Returns:

Env: The base non-wrapped gymnasium.Env instance

with_agent_groups(groups, obs_space=None, act_space=None)#

Convenience method for grouping together agents in this env.

An agent group is a list of agent IDs that are mapped to a single logical agent. All agents of the group must act at the same time in the environment. The grouped agent exposes Tuple action and observation spaces that are the concatenated action and obs spaces of the individual agents.

The rewards of all the agents in a group are summed. The individual agent rewards are available under the “individual_rewards” key of the group info return.

Agent grouping is required to leverage algorithms such as Q-Mix.

Parameters:
  • groups – Mapping from group id to a list of the agent ids of group members. If an agent id is not present in any group value, it will be left ungrouped. The group id becomes a new agent ID in the final environment.

  • obs_space – Optional observation space for the grouped env. Must be a tuple space. If not provided, will infer this to be a Tuple of n individual agents spaces (n=num agents in a group).

  • act_space – Optional action space for the grouped env. Must be a tuple space. If not provided, will infer this to be a Tuple of n individual agents spaces (n=num agents in a group).

Example:

from ray.rllib.env.multi_agent_env import MultiAgentEnv

class MyMultiAgentEnv(MultiAgentEnv):
    # define your env here
    ...

env = MyMultiAgentEnv(...)
grouped_env = env.with_agent_groups({
  "group1": ["agent1", "agent2", "agent3"],
  "group2": ["agent4", "agent5"],
})