MARL experiments#

Reinforcement Learning Experiments Module#

This module contains functions and utilities for running reinforcement learning experiments in the influencer games framework. It supports both synchronous and asynchronous environments and provides functionality for training and saving Q-tables.

Dependencies:#

  • InflGame.MARL

  • InflGame.utils

Usage:#

The run_experiment function is the main entry point for running reinforcement learning experiments. It supports both synchronous and asynchronous environments and allows for customization of learning parameters, scheduling configurations, and saving results.

Example:#

import numpy as np
from InflGame.MARL.utils.experiments import run_experiment

# Define environment configurations
env_configs = {
    "num_agents": 3,
    "domain_type": "1d",
    "domain_bounds": [0, 1],
    "resource_distribution": "gaussian",
    "resource_parameters": [0.5, 0.1]
}

# Run a synchronous experiment
q_tensor, q_mean = run_experiment(
    action_type="sync",
    env_configs=env_configs,
    trials=10,
    gamma=0.9,
    alpha=0.01,
    epochs=1000,
    random_seed=42,
    smoothing=True,
    description="Synchronous RL experiment",
    name_ads=["sync_test"]
)

print("Experiment completed. Q-tensor and Q-mean saved.")

Functions

InflGame.MARL.utils.experiments.run_experiment(action_type='sync', env_configs=None, trials=100, gamma=0.3, alpha=0.005, epochs=5000, random_seed=0, random_initialization=False, smoothing=True, temperature_configs=None, epsilon_configs=None, episode_configs=None, resource_name='gauss_mix_2m', description='Test trials', algo_epoch=True, checkpoints=False, save_positions=False, name_ads=[])#

Runs a reinforcement learning experiment using the Influencer Games framework and an independent Q-learning algorithm.

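Conceptually, each agent maintains its own Q-table and applies the standard tabular Q-learning update using the gamma and alpha parameters below; the other agents enter only through the rewards produced by the environment. A minimal sketch of that per-agent update (a hypothetical helper for illustration, not the module's actual implementation):

import numpy as np

def iql_update(q_table: np.ndarray, state: int, action: int, reward: float,
               next_state: int, alpha: float = 0.005, gamma: float = 0.3) -> None:
    # One tabular independent Q-learning update for a single agent.
    # "Independent" means the agent bootstraps only from its own Q-table
    # and treats the other agents as part of the environment.
    td_target = reward + gamma * np.max(q_table[next_state])  # bootstrapped target
    td_error = td_target - q_table[state, action]             # temporal-difference error
    q_table[state, action] += alpha * td_error                # step toward the target
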
Parameters:
  • action_type (str) – Type of environment to use (“sync” or “async”).

  • env_configs (dict) – Configuration dictionary for the environment.

  • trials (int) – Number of trials to run.

  • gamma (float) – Discount factor for the Q-learning algorithm.

  • alpha (float) – Learning rate for the Q-learning algorithm.

  • epochs (int) – Number of epochs for training.

  • random_seed (int) – Seed for random number generation.

  • random_initialization (bool) – Whether to use random initialization for Q-tables.

  • smoothing (bool) – Whether to apply temperature-based softmax smoothing during training.

  • temperature_configs (dict, optional) – Configuration for temperature scheduling (illustrative example dictionaries are sketched after this parameter list).
      - TYPE (str): Type of schedule, e.g., 'fixed', 'cosine_annealing_distance', 'cosine_annealing_distance_segmented'.
      - temperature (float, optional): If TYPE == 'fixed', the temperature used for smoothing.
      - temperature_max (float, optional): If TYPE != 'fixed', maximum global temperature.
      - temperature_min (float, optional): If TYPE != 'fixed', minimum global temperature.
      - temperature_local_max (float, optional): If TYPE == 'cosine_annealing_distance_segmented', maximum for the second segment of the schedule.
      - temperature_local_min (float, optional): If TYPE == 'cosine_annealing_distance_segmented', minimum for the first segment of the schedule.

  • epsilon_configs (dict, optional) – Configuration for epsilon annealing.
      - TYPE (str): Type of schedule, e.g., 'fixed', 'cosine_annealing'.
      - epsilon (float, optional): If TYPE == 'fixed', the epsilon value.
      - epsilon_max (float, optional): If TYPE != 'fixed', maximum epsilon value.
      - epsilon_min (float, optional): If TYPE != 'fixed', minimum epsilon value.

  • episode_configs (dict, optional) – Configuration for episode scheduling.
      - TYPE (str): Type of schedule, e.g., 'fixed', 'reverse_cosine_annealing'.
      - episode_max (float): If TYPE == 'fixed', maximum number of episodes in an epoch.
      - episode_min (float, optional): If TYPE == 'reverse_cosine_annealing', global minimum number of episodes in an epoch.

  • description (str, optional) – Description of the experiment.

  • name_ads (list[str], optional) – Additional identifiers for naming saved files.

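For illustration, the scheduling dictionaries described above might look like the following sketch. The key names are taken directly from the parameter descriptions; the exact accepted values should be checked against InflGame.MARL.

# Illustrative scheduling configurations (a sketch based on the parameter
# descriptions above, not a definitive specification).
temperature_configs = {
    "TYPE": "cosine_annealing_distance",
    "temperature_max": 5.0,
    "temperature_min": 0.1,
}

epsilon_configs = {
    "TYPE": "cosine_annealing",
    "epsilon_max": 1.0,
    "epsilon_min": 0.05,
}

episode_configs = {
    "TYPE": "fixed",
    "episode_max": 10,
}
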
Returns:

Saves the Q-tables and experiment configurations to disk. If trials >= 2, also returns the Q-tensor and the Q-mean tensor; otherwise returns None.

Return type:

None, or a tuple of the Q-tensor and Q-mean tensor when trials >= 2.
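
Putting the pieces together, a hypothetical asynchronous run that passes the scheduling dictionaries sketched above might look like this (env_configs as defined in the example at the top of the page; treat this as a sketch rather than a verified configuration):

from InflGame.MARL.utils.experiments import run_experiment

q_tensor, q_mean = run_experiment(
    action_type="async",
    env_configs=env_configs,                  # as defined in the earlier example
    trials=5,                                 # >= 2, so the tensors are returned
    gamma=0.3,
    alpha=0.005,
    epochs=5000,
    smoothing=True,
    temperature_configs=temperature_configs,  # scheduling dicts sketched above
    epsilon_configs=epsilon_configs,
    episode_configs=episode_configs,
    description="Asynchronous RL experiment with scheduling",
    name_ads=["async_test"],
)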