IQL Utilities#

This module provides utility functions for Independent Q-Learning (IQL) in multi-agent reinforcement learning (MARL): converting Q-tables to tensors, generating policies from Q-tensors, and adjusting parameters such as epsilon, temperature, and the number of episodes according to various scheduling strategies.

The module supports:
  • Conversion of Q-tables to tensor representations for efficient computation.

  • Policy generation using softmax over Q-values.

  • Dynamic adjustment of epsilon, temperature, and episodes to balance the exploration-exploitation trade-off.

  • Scheduling strategies such as cosine annealing and reverse cosine annealing.

Dependencies:#

  • `InflGame.utils`

Usage:#

The functions in this module can be used to preprocess Q-tables, generate policies, and dynamically adjust parameters for MARL algorithms.

Example:#

from InflGame.MARL.utils.IQL_utils import Q_table_to_tensor, Q_tensor_to_policy, adjusted_epsilon

# Convert a Q-table to a tensor
Q_table = {
    0: {0: {0: 1.0, 1: 2.0}, 1: {0: 3.0, 1: 4.0}},
    1: {0: {0: 5.0, 1: 6.0}, 1: {0: 7.0, 1: 8.0}}
}
Q_tensor = Q_table_to_tensor(Q_table)

# Generate a policy from the Q-tensor
policy = Q_tensor_to_policy(Q_tensor, temperature=0.5, agent_id=0)

# Adjust epsilon using cosine annealing
configs = {
    'TYPE': 'cosine_annealing',
    'epsilon_min': 0.1,
    'epsilon_max': 1.0
}
epsilon = adjusted_epsilon(configs, num_agents=2, episode=10, episodes=100)
print(f"Adjusted epsilon: {epsilon}")

Functions#

InflGame.MARL.utils.IQL_utils.Q_table_to_tensor(Q_table)#

Converts a Q-table (nested dictionary) into a tensor representation.

\[Q_{\text{torch}}[i, j, k] = Q_{\text{dict}}[p_i][s_j][a_k]\]

where:

  • \(Q_{\text{dict}}[p_i][s_j][a_k]\) is the Q-value for the \(i\)-th player \(p_i\), the \(j\)-th state \(s_j\), and the \(k\)-th action \(a_k\).

Parameters:

Q_table (dict) – Nested dictionary representing the Q-table.

Returns:

Tensor representation of the Q-table.

Return type:

torch.Tensor
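Conceptually, the conversion fills a (players × states × actions) tensor from the nested dictionary. Below is a minimal sketch of the idea, assuming a rectangular table (every player has the same states and every state the same actions); it is an illustration, not the library's actual implementation:

import torch

def q_table_to_tensor_sketch(Q_table):
    # Index sets are read off the first player/state; assumed rectangular.
    players = sorted(Q_table)
    states = sorted(Q_table[players[0]])
    actions = sorted(Q_table[players[0]][states[0]])
    Q = torch.zeros(len(players), len(states), len(actions))
    for i, p in enumerate(players):
        for j, s in enumerate(states):
            for k, a in enumerate(actions):
                Q[i, j, k] = Q_table[p][s][a]
    return Q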

InflGame.MARL.utils.IQL_utils.Q_tensor_to_policy(q_tensor, temperature=0.5, agent_id=0)#

Converts a Q-tensor into a policy tensor using a softmax function.

\[\pi(a \mid s) = \frac{\exp(Q(s, a) / T)}{\sum_{a'} \exp(Q(s, a') / T)}\]

where:
  • \(a\) is the action.

  • \(a'\) ranges over the set of all possible actions.

  • \(s\) is the state.

  • \(Q(s, a)\) is the Q-value for state \(s\) and action \(a\).

  • \(T\) is the temperature parameter.

  • \(\pi(a|s)\) is the policy for action \(a\) given state \(s\).

Parameters:
  • q_tensor (torch.Tensor) – Q-tensor for all players.

  • temperature (float) – Temperature parameter for the softmax.

  • agent_id (int) – ID of the player for which to compute the policy.

Returns:

Policy tensor for the specified player.

Return type:

torch.Tensor
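In PyTorch terms, the policy is a row-wise softmax over the agent's Q-values. A minimal sketch, assuming the Q-tensor is indexed [player, state, action] as produced by Q_table_to_tensor:

import torch

def q_tensor_to_policy_sketch(q_tensor, temperature=0.5, agent_id=0):
    # Softmax over the action dimension, yielding one distribution per state.
    return torch.softmax(q_tensor[agent_id] / temperature, dim=-1)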

InflGame.MARL.utils.IQL_utils.adjusted_episodes(configs, epoch, epochs)#

Adjusts the number of episodes based on the specified configuration and epoch progress.

Parameters:
  • configs (dict) – Configuration dictionary containing episode adjustment parameters: TYPE (str) – schedule type; episodes_min (int) – minimum number of episodes; episodes_max (int, optional) – maximum number of episodes.

  • epoch (int) – Current epoch number.

  • epochs (int) – Total number of epochs.

Returns:

Adjusted number of episodes.

Return type:

int
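The exact schedule is not spelled out here; the sketch below shows one plausible reading, assuming TYPE selects between a fixed count and a reverse-cosine-annealed count (the TYPE values, key names, and dispatch logic are assumptions):

import math

def adjusted_episodes_sketch(configs, epoch, epochs):
    if configs['TYPE'] == 'fixed':  # assumed TYPE value
        return configs['episodes_min']
    # Assumed: grow the episode budget from episodes_min toward episodes_max
    # over training, mirroring reverse_cosine_annealing below.
    v_min, v_max = configs['episodes_min'], configs['episodes_max']
    annealed = v_min + 0.5 * (v_max - v_min) * (1 + math.cos(math.pi * epoch / epochs))
    return v_min + v_max - math.floor(annealed)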

InflGame.MARL.utils.IQL_utils.adjusted_epsilon(configs, num_agents, episode, episodes)#

Adjusts the epsilon value based on the specified configuration and episode progress.

Epsilon Adjustment Strategies#

  • Fixed: \(\epsilon = \epsilon_{\text{constant}}\)

  • Cosine Annealing: \(\epsilon = \epsilon_{\text{min}} + \frac{1}{2} (\epsilon_{\text{max}} - \epsilon_{\text{min}}) \left(1 + \cos\left(\frac{\pi \cdot \text{episode}}{\text{episodes}}\right)\right)\)

Parameters:
  • configs (dict) – Configuration dictionary containing epsilon adjustment parameters.

  • num_agents (int) – Number of players in the game.

  • episode (int) – Current episode number.

  • episodes (int) – Total number of episodes.

Returns:

Adjusted epsilon value.

Return type:

float
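Both schedules in the table translate directly into code. A minimal sketch (the 'epsilon_constant' key name is an assumption, and the docs do not specify how num_agents enters the schedule, so this sketch leaves it unused):

import math

def adjusted_epsilon_sketch(configs, num_agents, episode, episodes):
    if configs['TYPE'] == 'fixed':
        return configs['epsilon_constant']  # assumed key name for the fixed value
    # Cosine annealing from epsilon_max down to epsilon_min over training;
    # num_agents is part of the real signature but unused in this sketch.
    e_min, e_max = configs['epsilon_min'], configs['epsilon_max']
    return e_min + 0.5 * (e_max - e_min) * (1 + math.cos(math.pi * episode / episodes))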

InflGame.MARL.utils.IQL_utils.adjusted_temperature(configs, observation, observation_space_size)#

Adjusts the temperature value based on the specified configuration and observation.

Parameters:
  • configs (dict) – Configuration dictionary containing temperature adjustment parameters.

  • observation (int) – Current observation value.

  • observation_space_size (int) – Size of the observation space.

Returns:

Adjusted temperature value.

Return type:

float

InflGame.MARL.utils.IQL_utils.cosine_annealing_distance_dependent(value_max, value_min, time, time_crit, time_max, max_distance=None)#

Computes a value using cosine annealing based on the distance from a critical time.

\[v(t) = v_{\text{min}} + \frac{1}{2} (v_{\text{max}} - v_{\text{min}}) \left(1 + \cos\left(\frac{\pi \cdot |t - t_{\text{crit}}|}{d_{\text{max}}}\right)\right)\]
where:
  • \(v(t)\) is the computed value at time \(t\).

  • \(v_{\text{max}}\) is the maximum value.

  • \(v_{\text{min}}\) is the minimum value.

  • \(t_{\text{crit}}\) is the critical time step.

  • \(d_{\text{max}}\) is the maximum distance from the critical time, defaulting to half of \(t_{\text{max}}\).

This function adjusts the value smoothly using a cosine function, depending on the distance from a critical time step.

Parameters:
  • value_max (float) – Maximum value.

  • value_min (float) – Minimum value.

  • time (int) – Current time step.

  • time_crit (int) – Critical time step.

  • time_max (int) – Maximum time step.

  • max_distance (int, optional) – Maximum distance from the critical time. Defaults to half of time_max.

Returns:

Computed value based on cosine annealing.

Return type:

float
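The formula reads off directly; a minimal sketch:

import math

def cosine_annealing_distance_dependent_sketch(value_max, value_min, time,
                                               time_crit, time_max, max_distance=None):
    if max_distance is None:
        max_distance = time_max / 2  # documented default
    # The distance from the critical time drives the cosine phase.
    distance = abs(time - time_crit)
    return value_min + 0.5 * (value_max - value_min) * (1 + math.cos(math.pi * distance / max_distance))

The value peaks at value_max when time equals time_crit and decays toward value_min as the distance approaches max_distance.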

InflGame.MARL.utils.IQL_utils.q_tables_to_q_tensors(num_runs, q_tables)#

Converts multiple Q-tables into a stacked tensor representation.

Parameters:
  • num_runs (int) – Number of runs (Q-tables).

  • q_tables (dict) – Dictionary containing Q-tables for each run.

Returns:

Stacked tensor representation of all Q-tables.

Return type:

torch.Tensor
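This is a thin wrapper over the single-table conversion; a sketch assuming q_tables is keyed by run index:

import torch
from InflGame.MARL.utils.IQL_utils import Q_table_to_tensor

def q_tables_to_q_tensors_sketch(num_runs, q_tables):
    # Convert each run's Q-table and stack along a new leading run dimension.
    return torch.stack([Q_table_to_tensor(q_tables[run]) for run in range(num_runs)])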

InflGame.MARL.utils.IQL_utils.reverse_cosine_annealing(value_max, value_min, time, time_max)#

Reverse cosine annealing adjusts a value smoothly over time, starting from the minimum value and increasing toward the maximum value, following a cosine curve.

\[v(t) = v_{\text{min}} + v_{\text{max}} - \left\lfloor v_{\text{min}} + \frac{1}{2} (v_{\text{max}} - v_{\text{min}}) \left(1 + \cos\left(\frac{\pi \cdot t}{t_{\text{max}}}\right)\right) \right\rfloor\]

where:
  • \(v(t)\) is the computed value at time \(t\).

  • \(v_{\text{max}}\) is the maximum value.

  • \(v_{\text{min}}\) is the minimum value.

  • \(t_{\text{max}}\) is the maximum time step.

Parameters:
  • value_max (float) – Maximum value.

  • value_min (float) – Minimum value.

  • time (int) – Current time step.

  • time_max (int) – Maximum time step.

Returns:

Computed value based on reverse cosine annealing.

Return type:

float
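Reading the formula off directly (the floor suggests the function targets integer-valued quantities such as episode counts), a minimal sketch:

import math

def reverse_cosine_annealing_sketch(value_max, value_min, time, time_max):
    # Ordinary cosine annealing from value_max down to value_min...
    annealed = value_min + 0.5 * (value_max - value_min) * (1 + math.cos(math.pi * time / time_max))
    # ...reflected about the midpoint, so the result rises from value_min toward value_max.
    return value_min + value_max - math.floor(annealed)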