MARL Plots#

Multi-Agent Reinforcement Learning (MARL) Plotting Module#

This module provides visualization tools for analyzing the performance of multi-agent reinforcement learning (MARL) algorithms. It includes functions for plotting policies, rewards, and positions of agents over time.

Dependencies:#

  • InflGame.utils

  • InflGame.MARL

Usage:#

The policy_histogram function visualizes the Q-table as a policy heatmap, while the reward_plot and pos_plot functions plot the rewards and positions of agents over time, respectively. The policy_deterministically_to_actions function simulates deterministic actions for agents based on their policies.

Example:#

import numpy as np
import torch
from InflGame.MARL.async_game import influencer_env_async
from InflGame.MARL.MARL_plots import policy_histogram, reward_plot, pos_plot, policy_deterministically_to_actions

# Define environment configuration
env_config = {
    "num_agents": 3,
    "initial_position": [0.2, 0.5, 0.8],
    "bin_points": np.linspace(0, 1, 100),
    "resource_distribution": np.random.rand(100),
    "step_size": 0.01,
    "domain_type": "1d",
    "domain_bounds": [0, 1],
    "infl_configs": {"infl_type": "gaussian"},
    "parameters": [0.1, 0.1, 0.1],
    "fixed_pa": 0,
    "NUM_ITERS": 100
}

# Initialize the environment
env = influencer_env_async(config=env_config)

# Simulate deterministic actions
q_tensor = torch.rand((3, 100, 3))  # Example Q-tensor
pos_matrix, reward_matrix = policy_deterministically_to_actions(env=env, q_tensor=q_tensor, num_step=50)

# Plot policy heatmap for agent 0
policy_fig = policy_histogram(q_tensor=q_tensor, agent_id=0)
policy_fig.show()

# Plot rewards over time
reward_fig = reward_plot(reward_matrix=reward_matrix, possible_agents=env.possible_agents)
reward_fig.show()

# Plot positions over time
pos_fig = pos_plot(pos_matrix=pos_matrix, possible_agents=env.possible_agents, domain_bounds=env_config["domain_bounds"])
pos_fig.show()

Functions

InflGame.MARL.MARL_plots.policy_deterministically_to_actions(env, q_table=None, q_tensor=None, initial_position=array([0, 1]), num_step=10, temperature=1)#

Simulates deterministic actions for agents based on their policies. The procedure is as follows:

1. The Q-table is converted to a policy using a softmax function, i.e.

\[P(a|s) = \frac{e^{Q(s,a)/T}}{\sum_{a'} e^{Q(s,a')/T}}\]
where:
  • \(a\) is the action

  • \(s\) is the current state

  • \(a'\) ranges over the possible actions in state \(s\)

  • \(T\) is the temperature parameter

  • \(P(a|s)\) is the probability of taking action \(a\) in state \(s\)

  • \(Q(s,a)\) is the Q-value for action \(a\) in state \(s\)

2. The greedy (maximum-probability) action is selected for each state, as sketched below.

3. The environment is stepped through the selected actions for the specified number of steps.

4. The positions and rewards are recorded at each step.
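
A minimal sketch of steps 1–2 (for illustration only; not the module's internal implementation), assuming a Q-tensor shaped (num_agents, num_states, num_actions) as in the example above:

import torch

# Hypothetical Q-tensor: 3 agents, 100 discretized states, 3 actions per state.
q_tensor = torch.rand((3, 100, 3))
temperature = 1.0

# Step 1: softmax over the action dimension gives P(a|s) for every agent and state.
policy = torch.softmax(q_tensor / temperature, dim=-1)

# Step 2: pick the greedy (highest-probability) action in each state.
greedy_actions = policy.argmax(dim=-1)  # shape: (num_agents, num_states)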

Parameters:
  • env (influencer_env_async) – The environment object.

  • q_table (dict, optional) – Q-table in dictionary format. Defaults to None.

  • q_tensor (torch.Tensor, optional) – Q-table as a torch.Tensor. Defaults to None.

  • initial_position (np.ndarray) – Initial position of players. Defaults to np.array([0, 1]).

  • num_step (int) – Number of steps to simulate. Defaults to 10.

  • temperature (float) – A smoothness factor for the softmax function. Defaults to 1.

Returns:

Position matrix and reward matrix as torch.Tensors.

Return type:

tuple[torch.Tensor, torch.Tensor]

InflGame.MARL.MARL_plots.policy_histogram(q_table=None, q_tensor=None, agent_id=0, temperature=1)#

Visualizes the Q-table as a policy using a softmax function and plots it as a heatmap.

\[P(a|s) = \frac{e^{Q(s,a)/T}}{\sum_{a'} e^{Q(s,a')/T}}\]
where:
  • \(a\) is the action

  • \(s\) is the current state

  • \(a'\) ranges over the possible actions in state \(s\)

  • \(T\) is the temperature parameter

  • \(P(a|s)\) is the probability of taking action \(a\) in state \(s\)

  • \(Q(s,a)\) is the Q-value for action \(a\) in state \(s\)

Parameters:
  • q_table (dict, optional) – Q-table in dictionary format. Defaults to None.

  • q_tensor (torch.Tensor, optional) – Q-table as a torch.Tensor. Defaults to None.

  • agent_id (int) – Agent’s ID number. Defaults to 0.

  • temperature (float) – A smoothness factor for the softmax function. Defaults to 1.

Returns:

Figure representing the policy as a heatmap.

Return type:

matplotlib.figure.Figure
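
A short usage sketch (assuming the Q-tensor layout from the example above); the returned object is a standard matplotlib figure, so it can be shown or saved as usual:

import torch
from InflGame.MARL.MARL_plots import policy_histogram

# Hypothetical Q-tensor with 3 agents, 100 states, and 3 actions.
q_tensor = torch.rand((3, 100, 3))

# Heatmap of agent 1's policy; a lower temperature sharpens the distribution.
fig = policy_histogram(q_tensor=q_tensor, agent_id=1, temperature=0.5)
fig.savefig("policy_agent1.png", dpi=150)  # or fig.show()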

InflGame.MARL.MARL_plots.pos_plot(pos_matrix, possible_agents, domain_bounds)#

Plots the positions of all players over time.

Parameters:
  • pos_matrix (torch.Tensor) – Matrix containing positions for each player at each step.

  • possible_agents (dict) – Dictionary of possible agents in the environment.

  • domain_bounds (list) – List containing the lower and upper bounds of the domain.

Returns:

A figure of the agent positions over time under the greedy policy derived from the Q-values.

Return type:

matplotlib.figure.Figure

InflGame.MARL.MARL_plots.reward_plot(reward_matrix, possible_agents)#

Plots the rewards for all players over time.

Parameters:
  • reward_matrix (torch.Tensor) – Matrix containing rewards for each player at each step.

  • possible_agents (dict) – Dictionary of possible agents in the environment.

Returns:

A figure of the agent rewards over time under the greedy policy derived from the Q-values.

Return type:

matplotlib.figure.Figure
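
For intuition only, a hand-rolled sketch of the kind of figure pos_plot produces, assuming pos_matrix is shaped (num_steps, num_agents) and that possible_agents can be iterated as agent names; this mimics, but is not, the module's implementation:

import torch
import matplotlib.pyplot as plt

# Hypothetical trajectories: 50 steps, 3 agents, positions in [0, 1].
pos_matrix = torch.rand((50, 3))
possible_agents = ["agent_0", "agent_1", "agent_2"]
domain_bounds = [0, 1]

fig, ax = plt.subplots()
for idx, agent in enumerate(possible_agents):
    ax.plot(pos_matrix[:, idx].numpy(), label=agent)  # one trajectory per agent
ax.set_xlabel("Step")
ax.set_ylabel("Position")
ax.set_ylim(domain_bounds)
ax.legend()
fig.savefig("positions.png", dpi=150)

A reward_plot analogue is the same loop over a reward_matrix, with "Reward" on the y-axis and no fixed axis limits.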