MARL Plots#
Multi-Agent Reinforcement Learning (MARL) Plotting Module#
This module provides visualization tools for analyzing the performance of multi-agent reinforcement learning (MARL) algorithms. It includes functions for plotting policies, rewards, and positions of agents over time.
Dependencies:#
InflGame.utils
InflGame.MARL
Usage:#
The policy_histogram function visualizes the Q-table as a policy heatmap, while the reward_plot and pos_plot functions plot the rewards and positions of agents over time, respectively. The policy_deterministically_to_actions function simulates deterministic actions for agents based on their policies.
Example:#
import numpy as np
import torch
from InflGame.MARL.async_game import influencer_env_async
from InflGame.MARL.MARL_plots import policy_histogram, reward_plot, pos_plot, policy_deterministically_to_actions
# Define environment configuration
env_config = {
"num_agents": 3,
"initial_position": [0.2, 0.5, 0.8],
"bin_points": np.linspace(0, 1, 100),
"resource_distribution": np.random.rand(100),
"step_size": 0.01,
"domain_type": "1d",
"domain_bounds": [0, 1],
"infl_configs": {"infl_type": "gaussian"},
"parameters": [0.1, 0.1, 0.1],
"fixed_pa": 0,
"NUM_ITERS": 100
}
# Initialize the environment
env = influencer_env_async(config=env_config)
# Simulate deterministic actions
q_tensor = torch.rand((3, 100, 3))  # Example Q-tensor: (num_agents, num_states, num_actions)
pos_matrix, reward_matrix = policy_deterministically_to_actions(env=env, q_tensor=q_tensor, num_step=50)
# Plot policy heatmap for player 0
policy_fig = policy_histogram(q_tensor=q_tensor, agent_id=0)
policy_fig.show()
# Plot rewards over time
reward_fig = reward_plot(reward_matrix=reward_matrix, possible_agents=env.possible_agents)
reward_fig.show()
# Plot positions over time
pos_fig = pos_plot(pos_matrix=pos_matrix, possible_agents=env.possible_agents, domain_bounds=env_config["domain_bounds"])
pos_fig.show()
Functions
- InflGame.MARL.MARL_plots.policy_deterministically_to_actions(env, q_table=None, q_tensor=None, initial_position=array([0, 1]), num_step=10, temperature=1)#
Simulates deterministic actions for agents based on their policies. The procedure is as follows (a minimal standalone sketch of the first two steps follows this entry):
1. The Q-table is converted to a policy using a softmax function:
\[P(a|s) = \frac{e^{Q(s,a)/T}}{\sum_{a'} e^{Q(s,a')/T}}\]
where:
\(a\) is the action
\(s\) is the current state
\(a'\) ranges over all possible actions in the normalizing sum
\(T\) is the temperature parameter
\(P(a|s)\) is the probability of taking action \(a\) in state \(s\)
\(Q(s,a)\) is the Q-value for action \(a\) in state \(s\)
2. The most probable action is selected for each state.
3. The environment is stepped with the selected actions for the specified number of steps.
4. The positions and rewards are recorded at each step.
- Parameters:
env (influencer_env_async) – The environment object.
q_table (dict, optional) – Q-table in dictionary format. Defaults to None.
q_tensor (torch.Tensor, optional) – Q-table as a torch.Tensor. Defaults to None.
initial_position (np.ndarray) – Initial position of players. Defaults to np.array([0, 1]).
num_step (int) – Number of steps to simulate. Defaults to 10.
temperature (float) – Temperature parameter for the softmax function; higher values yield a smoother (more uniform) policy. Defaults to 1.
- Returns:
Position matrix and reward matrix as torch.Tensors.
- Return type:
tuple[torch.Tensor, torch.Tensor]
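The first two steps above can be illustrated with a short, self-contained sketch. The helper name greedy_actions_from_q is hypothetical (it is not part of InflGame), and the Q-tensor shape (num_agents, num_states, num_actions) is assumed from the example above; the actual function additionally steps the environment and records positions and rewards.
import torch
# Hypothetical helper, not part of InflGame: converts a Q-tensor to greedy actions.
# Assumes q_tensor has shape (num_agents, num_states, num_actions).
def greedy_actions_from_q(q_tensor: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    # Step 1: softmax over actions, P(a|s) = exp(Q(s,a)/T) / sum_a' exp(Q(s,a')/T)
    policy = torch.softmax(q_tensor / temperature, dim=-1)
    # Step 2: deterministic (greedy) choice of the most probable action per state
    return policy.argmax(dim=-1)  # shape: (num_agents, num_states)
q_tensor = torch.rand((3, 100, 3))
actions = greedy_actions_from_q(q_tensor)
print(actions.shape)  # torch.Size([3, 100])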
- InflGame.MARL.MARL_plots.policy_histogram(q_table=None, q_tensor=None, agent_id=0, temperature=1)#
Visualizes the Q-table as a policy using a softmax function and plots it as a heatmap.
\[P(a|s) = \frac{e^{Q(s,a)/T}}{\sum_{a'} e^{Q(s,a')/T}}\]
where:
\(a\) is the action
\(s\) is the current state
\(a'\) ranges over all possible actions in the normalizing sum
\(T\) is the temperature parameter
\(P(a|s)\) is the probability of taking action \(a\) in state \(s\)
\(Q(s,a)\) is the Q-value for action \(a\) in state \(s\)
A standalone sketch of this conversion and the resulting heatmap follows this entry.
- Parameters:
q_table (dict, optional) – Q-table in dictionary format. Defaults to None.
q_tensor (torch.Tensor, optional) – Q-table as a torch.Tensor. Defaults to None.
agent_id (int) – Agent’s ID number. Defaults to 0.
temperature (float) – A smoothness factor for the softmax function. Defaults to 1.
- Returns:
Figure representing the policy as a heatmap.
- Return type:
matplotlib.figure.Figure
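For intuition, the following standalone sketch produces the kind of heatmap this function returns; the variable names, assumed Q-tensor shape (num_agents, num_states, num_actions), and axis layout are illustrative assumptions, not the library's internals.
import torch
import matplotlib.pyplot as plt
# Assumed shape: (num_agents, num_states, num_actions)
q_tensor = torch.rand((3, 100, 3))
agent_id, temperature = 0, 1.0
# Softmax over actions gives the policy P(a|s) for the chosen agent
policy = torch.softmax(q_tensor[agent_id] / temperature, dim=-1)  # (num_states, num_actions)
fig, ax = plt.subplots()
im = ax.imshow(policy.T.numpy(), aspect="auto", origin="lower")  # actions on y, states on x
ax.set_xlabel("State")
ax.set_ylabel("Action")
fig.colorbar(im, ax=ax, label="P(a|s)")
fig.show()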
- InflGame.MARL.MARL_plots.pos_plot(pos_matrix, possible_agents, domain_bounds)#
Plots the positions of all players over time.
- Parameters:
pos_matrix (torch.Tensor) – Matrix containing positions for each player at each step.
possible_agents (dict) – Dictionary of possible agents in the environment.
domain_bounds (list) – List containing the lower and upper bounds of the domain.
- Returns:
A figure of the agent positions through time using the optimal policy.
- Return type:
matplotlib.figure.Figure
- InflGame.MARL.MARL_plots.reward_plot(reward_matrix, possible_agents)#
Plots the rewards for all players over time.
- Parameters:
reward_matrix (torch.Tensor) – Matrix containing rewards for each player at each step.
possible_agents (dict) – Dictionary of possible agents in the environment.
- Returns:
A figure of the reward through time using the optimal policy.
- Return type:
matplotlib.figure.Figure
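Both plotting functions return standard matplotlib figures of per-agent time series. As a rough, hedged approximation of the positions figure (the rewards figure is analogous, with rewards on the y-axis), assuming pos_matrix has shape (num_steps, num_agents) and possible_agents is an iterable of agent names:
import torch
import matplotlib.pyplot as plt
# Illustrative stand-ins for the outputs of policy_deterministically_to_actions
pos_matrix = torch.rand((50, 3))  # assumed shape: (num_steps, num_agents)
possible_agents = ["agent_0", "agent_1", "agent_2"]
domain_bounds = [0, 1]
fig, ax = plt.subplots()
for i, agent in enumerate(possible_agents):
    ax.plot(pos_matrix[:, i].numpy(), label=agent)  # one trajectory per agent
ax.set_ylim(domain_bounds)
ax.set_xlabel("Step")
ax.set_ylabel("Position")
ax.legend()
fig.show()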