Data Management

Data Management Utilities Module

This module provides utility functions for managing and organizing data in influencer games research. It handles Q-table loading, data parameter extraction, hierarchical directory creation, and standardized file naming for saving and retrieving experimental results.

The module supports multiple data types including Q-tables, configuration files, reward matrices, position data, and mean absolute deviation (MAD) metrics. It automatically creates organized directory hierarchies based on experiment parameters such as number of agents, influence reach, resource types, and state discretization.

Dependencies:

  • numpy: Array operations

  • hickle: HDF5-based serialization for Python objects

  • pathlib: Object-oriented filesystem paths

  • typing: Type hints support

Key Functions:

  • q_table_data_load: Load Q-tables and configurations from standardized paths

  • data_parameters: Extract and format data parameters from configuration dictionaries

  • data_directory: Create hierarchical directory structures for data organization

  • data_name: Generate standardized file names based on experiment parameters

  • data_final_name: Combine directory paths and file names for complete file paths

Usage:

The typical workflow involves defining experiment options, loading existing data, extracting parameters, and generating standardized paths for saving new results. The module enforces consistent naming conventions across all influencer games experiments.

Example:

from InflGame.utils.data_management import q_table_data_load, data_final_name
import hickle as hkl

# Load existing Q-tables and configurations
options = {
    "agents": 3,
    "reach": "small",
    "modes": 2,
    "density": True
}
q_table, configs = q_table_data_load(options=options)
print(f"Q-table type: {type(q_table)}")  # q_table_data_load returns a dict, which has no .shape

# Generate standardized file paths for saving new data
data_params = {
    "num_agents": "3_agents",
    "data_type": "q_tables",
    "reach": "sig_50",
    "resource_type": "gaussian",
    "steps": "100_states"
}
file_paths = data_final_name(
    data_parameters=data_params,
    name_ads=["experiment1", "trial1"],
    save_types=[".hkl", ".npy"]
)

# Save data using generated paths
hkl.dump(q_table, file_paths[0])

Functions

InflGame.utils.data_management.data_directory(data_parameters, alt_name, paper_figure=False)

Create hierarchical directory structure for organized data storage.

This function builds a nested directory hierarchy based on experiment parameters, automatically creating all necessary parent directories. It supports different organizational schemes for research data, plots, and publication-ready figures.

Parameters:
data_parameters : Dict[str, str]

Dictionary containing organizational parameters. Required keys vary by data_type:

  • For plots: 'data_type', 'section', 'figure_id' (if paper_figure=True)

  • For data: 'data_type', 'num_agents', 'reach', 'resource_type', 'steps'

alt_name : bool

Whether to use an alternative naming scheme (currently unused; reserved for future use).

paper_figure : bool, optional

If True, creates a directory structure for publication figures organized by section and figure ID. Default is False.

Returns:
str

Absolute path to the created directory with Windows path separators (\)

Notes

Directory structure patterns:

For paper figures (paper_figure=True):

{module_path}/paper_plots/{section}/{figure_id}/

For regular plots (data_type='plot'):

{module_path}/plots/{domain_type}/{param1}/{param2}/...

For data files:

{module_path}/data/{num_agents}/{param1}/{param2}/...

All intermediate directories are created automatically using pathlib.Path.mkdir(exist_ok=True).
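The layouts above can be illustrated with pathlib. The following is a minimal sketch, not the library's implementation: the helper name sketch_data_directory, its base argument, and the order in which data keys become folders (num_agents, reach, resource_type, steps) are assumptions inferred from the documented examples, and the regular-plot branch is omitted because its parameter order is not spelled out. Creating one level at a time means each parent already exists when mkdir runs, so exist_ok=True alone is enough.

```python
import tempfile
from pathlib import Path, PureWindowsPath
from typing import Dict

# Hypothetical folder order for data files, inferred from the documented examples.
DATA_KEYS = ["num_agents", "reach", "resource_type", "steps"]


def sketch_data_directory(base: Path, data_parameters: Dict[str, str],
                          paper_figure: bool = False) -> Path:
    """Hypothetical stand-in for data_directory (not the library code)."""
    if paper_figure:
        parts = ["paper_plots", data_parameters["section"], data_parameters["figure_id"]]
    elif data_parameters["data_type"] == "plot":
        # The regular-plot layout is only partly documented, so it is not sketched.
        raise NotImplementedError("regular plot layout not sketched")
    else:
        # Keys that are absent (e.g. no 'steps') simply add no folder level.
        parts = ["data"] + [data_parameters[k] for k in DATA_KEYS if k in data_parameters]

    path = base
    for part in parts:
        path = path / part
        # The parent was created on the previous iteration, so parents=True is not needed.
        path.mkdir(exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    q_params = {"data_type": "q_tables", "num_agents": "3_agents",
                "reach": "sig_50", "resource_type": "gaussian"}
    q_dir = sketch_data_directory(Path(tmp), q_params)
    q_rel = q_dir.relative_to(tmp).parts
    q_created = q_dir.is_dir()

    fig_params = {"data_type": "plot", "section": "results", "figure_id": "fig_1"}
    fig_dir = sketch_data_directory(Path(tmp), fig_params, paper_figure=True)
    fig_rel = fig_dir.relative_to(tmp).parts

# Render the relative parts with Windows separators, as the documented return value uses.
print(PureWindowsPath(*q_rel))
print(PureWindowsPath(*fig_rel))
```

PureWindowsPath is used only to display Windows-style separators on any platform; on a real Windows machine, str(Path(...)) would already produce them.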

Examples

Create directory for paper figure:

>>> params = {
...     'data_type': 'plot',
...     'section': 'results',
...     'figure_id': 'fig_1'
... }
>>> path = data_directory(params, alt_name=False, paper_figure=True)
>>> print(path)
C:\...\paper_plots\results\fig_1

Create directory for Q-table data:

>>> params = {
...     'data_type': 'q_tables',
...     'num_agents': '3_agents',
...     'reach': 'sig_50',
...     'resource_type': 'gaussian'
... }
>>> path = data_directory(params, alt_name=False)
>>> print(path)
C:\...\data\3_agents\sig_50\gaussian
InflGame.utils.data_management.data_final_name(data_parameters, name_ads, save_types=['.hkl'], paper_figure=False)

Generate complete file paths combining directory structure and file names.

This function is the primary interface for generating standardized file paths in the influencer games framework. It combines directory creation (via data_directory) and file naming (via data_name) into complete absolute paths ready for saving or loading data.

Parameters:
data_parameters : Dict[str, str]

Dictionary containing all necessary parameters for path construction. Required keys depend on data_type (see data_directory and data_name for specific requirements).

name_ads : List[str]

List of additional descriptive components to append to file names. Useful for experiment versioning, trial IDs, or custom identifiers.

save_types : List[str], optional

List of file extensions including dots. Default is ['.hkl'] (hickle format). Common options: ['.hkl', '.npy', '.pkl', '.png', '.svg']

paper_figure : bool, optional

If True, generates paths for publication-ready figures with special directory organization. Default is False.

Returns:
List[str]

List of complete absolute file paths, one for each save type. Paths use Windows separators (\) and include all directory components.

Notes

This function ensures all necessary directories exist before returning paths. The directory creation is handled internally by data_directory.

Path structure follows:

{base_dir}/{param1}/{param2}/.../{base_name}_{ad1}_{ad2}{ext}
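The path structure above amounts to joining the parameter folders with a file stem built from the base name and the name_ads components. The sketch below shows only that string composition; it touches no filesystem. The helper sketch_final_names, the placeholder root r"C:\root", the explicit base_name argument (the real function derives it from data_type), and the folder order are all hypothetical.

```python
from pathlib import PureWindowsPath
from typing import Dict, List

# Assumed folder order, inferred from the documented example paths.
DIR_KEYS = ["num_agents", "reach", "resource_type", "steps"]


def sketch_final_names(root: str, params: Dict[str, str], base_name: str,
                       name_ads: List[str], save_types: List[str]) -> List[str]:
    """Hypothetical composition of {base_dir}/{params...}/{base_name}_{ads}{ext}."""
    directory = PureWindowsPath(root, "data",
                                *(params[k] for k in DIR_KEYS if k in params))
    stem = "_".join([base_name, *name_ads])  # components joined with underscores
    # One complete path per requested extension, in the same order as save_types.
    return [str(directory / f"{stem}{ext}") for ext in save_types]


params = {"num_agents": "3_agents", "data_type": "q_tables", "reach": "sig_50",
          "resource_type": "gaussian", "steps": "100_states"}
paths = sketch_final_names(r"C:\root", params, "q_table", ["exp1"], [".hkl", ".npy"])
for p in paths:
    print(p)
```

Returning one path per extension keeps the list index aligned with save_types, which is why the module-level example can pick the .hkl path with file_paths[0].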

Examples

Generate paths for Q-table storage:

>>> params = {
...     'num_agents': '3_agents',
...     'data_type': 'q_tables',
...     'reach': 'sig_50',
...     'resource_type': 'gaussian',
...     'steps': '100_states'
... }
>>> paths = data_final_name(params, name_ads=['exp1'], save_types=['.hkl'])
>>> print(paths[0])
C:\...\data\3_agents\sig_50\gaussian\100_states\q_table_exp1.hkl

Generate multiple format paths for plots:

>>> params = {
...     'data_type': 'plot',
...     'plot_type': 'bifurcation',
...     'domain_type': '1d',
...     'num_agents': '3'
... }
>>> paths = data_final_name(params, name_ads=['trial1'], 
...                         save_types=['.png', '.svg'])
>>> len(paths)
2
InflGame.utils.data_management.data_name(data_parameters, name_ads, save_types, paper_figure=False)

Generate standardized file names based on data type and parameters.

This function creates descriptive file names following consistent naming conventions for different data types. It supports multiple file formats and allows appending custom suffixes for experiment versioning and identification.

Parameters:
data_parameters : Dict[str, str]

Dictionary containing data parameters. Required keys:

  • 'data_type' : str

    Type of data ('q_tables', 'configs', 'plot', etc.)

For plots, additional keys:

  • 'plot_type' : str

    Type of plot visualization

  • 'domain_type' : str

    Domain type ('1d', '2d', 'simplex')

  • 'num_agents' : str

    Number of agents (if paper_figure=True)

name_ads : List[str]

List of additional name components to append (e.g., experiment IDs, trial numbers). Components are joined with underscores.

save_types : List[str]

List of file extensions including the dot (e.g., ['.hkl', '.npy', '.png'])

paper_figure : bool, optional

If True, uses publication naming format for plots. Default is False.

Returns:
List[str]

List of complete file names, one for each save type. Each name combines the base name, additional components, and file extension.

Raises:
ValueError

If data_type is not recognized

Notes

Base name mapping by data type:

  • 'q_tables' → 'q_table'

  • 'configs' → 'configs'

  • 'reward_matrix' → 'reward_matrix'

  • 'mean_positions' → 'mean_positions'

  • 'MAD' → 'MAD'

  • 'final_positions' → 'final_positions'

  • 'final_mad' → 'final_mad'

  • 'plot' → custom format based on plot parameters

For paper figures, plot names follow: {domain_type}_{plot_type}_{num_agents}_agents
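The naming rules above can be expressed as a lookup table plus an underscore join. This is a hypothetical re-creation for illustration (the name sketch_data_name is invented here); it covers the non-plot data types and the paper-figure plot format, and deliberately leaves out regular plot names, whose format is not documented. It assumes an empty name_ads adds no trailing underscore, which is consistent with the documented examples.

```python
from typing import Dict, List

# Base-name table transcribed from the Notes above.
BASE_NAMES: Dict[str, str] = {
    "q_tables": "q_table",
    "configs": "configs",
    "reward_matrix": "reward_matrix",
    "mean_positions": "mean_positions",
    "MAD": "MAD",
    "final_positions": "final_positions",
    "final_mad": "final_mad",
}


def sketch_data_name(params: Dict[str, str], name_ads: List[str],
                     save_types: List[str], paper_figure: bool = False) -> List[str]:
    """Hypothetical re-creation of the documented naming rules."""
    data_type = params["data_type"]
    if data_type == "plot":
        if not paper_figure:
            # The non-paper plot format is not spelled out in the docs.
            raise NotImplementedError("regular plot names are not sketched")
        base = f"{params['domain_type']}_{params['plot_type']}_{params['num_agents']}_agents"
    elif data_type in BASE_NAMES:
        base = BASE_NAMES[data_type]
    else:
        raise ValueError(f"Unrecognized data_type: {data_type!r}")
    stem = "_".join([base, *name_ads])  # an empty name_ads leaves the base unchanged
    return [stem + ext for ext in save_types]


q_names = sketch_data_name({"data_type": "q_tables"}, ["exp1", "v2"], [".hkl", ".npy"])
fig_names = sketch_data_name(
    {"data_type": "plot", "domain_type": "2d", "plot_type": "bifurcation", "num_agents": "3"},
    [], [".png"], paper_figure=True,
)
print(q_names)
print(fig_names)
```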

Examples

Generate Q-table file names with multiple formats:

>>> params = {'data_type': 'q_tables'}
>>> names = data_name(params, name_ads=['exp1', 'v2'], save_types=['.hkl', '.npy'])
>>> print(names)
['q_table_exp1_v2.hkl', 'q_table_exp1_v2.npy']

Generate paper figure name:

>>> params = {
...     'data_type': 'plot',
...     'domain_type': '2d',
...     'plot_type': 'bifurcation',
...     'num_agents': '3'
... }
>>> names = data_name(params, name_ads=[], save_types=['.png'], paper_figure=True)
>>> print(names)
['2d_bifurcation_3_agents.png']
InflGame.utils.data_management.data_parameters(configs, data_type, resource_type)

Extract and format data parameters from configuration dictionary.

This function parses experiment configurations and extracts key parameters including agent count, influence reach, resource type, and state discretization. It formats these parameters into a standardized dictionary suitable for file naming and directory structure generation.

Parameters:
configs : Dict[str, dict]

Configuration dictionary containing experiment parameters. Must have an 'env_config_main' key with nested parameters including:

  • 'num_agents' : int

    Number of agents in the system

  • 'parameters' : list or array

    Influence parameters (first element used for reach)

  • 'step_size' : float

    State discretization step size

data_type : str

Type of data being processed. Supported values:

  • 'q_tables' : Q-learning tables

  • 'configs' : Configuration files

  • 'final_mad' : Final mean absolute deviation

  • 'final_positions' : Final agent positions

resource_type : str

Type of resource distribution (e.g., 'gaussian', 'uniform', 'beta')

Returns:
Optional[Dict[str, str]]

Dictionary containing formatted parameters with keys:

  • 'num_agents' : str

    Formatted as '{N}_agents'

  • 'data_type' : str

    The input data type

  • 'reach' : str

    Formatted as 'sig_{value}' where value is \(\lfloor 100 \times \sigma \rfloor\)

  • 'resource_type' : str

    The input resource type

  • 'steps' : str

    Number of discrete states, formatted as '{N}_states'

Returns None if data_type is not in supported types.

Notes

The reach parameter is computed as:

\[\text{reach} = \lfloor 100 \times \sigma \rfloor\]

where \(\sigma\) is the first element of configs['env_config_main']['parameters'].

The number of states is computed as:

\[\text{states} = \lfloor 1 / \text{step\_size} \rfloor\]
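The two formulas can be checked with a short sketch that follows them literally, using math.floor. The helper sketch_data_parameters is hypothetical; the source gives the formulas but not whether the library floors, truncates with int(), or rounds first. One consequence of applying the floor formula to binary floating-point values is shown at the end: 100 × 0.29 evaluates to slightly less than 29, so a literal floor yields 28.

```python
import math
from typing import Dict, Optional

# Supported data types, as listed in the Parameters section.
SUPPORTED = {"q_tables", "configs", "final_mad", "final_positions"}


def sketch_data_parameters(configs: dict, data_type: str,
                           resource_type: str) -> Optional[Dict[str, str]]:
    """Hypothetical re-creation of the documented formulas."""
    if data_type not in SUPPORTED:
        return None  # mirrors the documented None return
    env = configs["env_config_main"]
    sigma = env["parameters"][0]                # first influence parameter
    reach = math.floor(100 * sigma)             # floor(100 * sigma)
    states = math.floor(1 / env["step_size"])   # floor(1 / step_size)
    return {
        "num_agents": f"{env['num_agents']}_agents",
        "data_type": data_type,
        "reach": f"sig_{reach}",
        "resource_type": resource_type,
        "steps": f"{states}_states",
    }


configs = {"env_config_main": {"num_agents": 3, "parameters": [0.5, 0.3], "step_size": 0.01}}
params = sketch_data_parameters(configs, "q_tables", "gaussian")
print(params)

# Floating-point products can land just below an integer before flooring.
print(100 * 0.29, math.floor(100 * 0.29))
```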

Examples

Extract parameters from a standard configuration:

>>> configs = {
...     'env_config_main': {
...         'num_agents': 3,
...         'parameters': [0.5, 0.3],
...         'step_size': 0.01
...     }
... }
>>> params = data_parameters(configs, 'q_tables', 'gaussian')
>>> print(params)
{'num_agents': '3_agents', 'data_type': 'q_tables', 'reach': 'sig_50', 
 'resource_type': 'gaussian', 'steps': '100_states'}
InflGame.utils.data_management.q_table_data_load(options)

Load Q-table and configuration data from standardized file paths.

This function constructs file paths from experiment options and loads pre-computed Q-tables and configuration dictionaries from HDF5-based hickle files. Paths follow the convention data/{agents}/{folder}/q_tables.hkl, where folder is built from the agent count, reach parameter, number of modes, and density flag.

Parameters:
options : Dict[str, Union[str, int, bool]]

Dictionary containing experiment configuration with required keys:

  • agents : int

    Number of agents in the multi-agent system

  • reach : str

    Influence reach parameter, either 'small' or 'large'

  • modes : int

    Number of operational modes in the environment

  • density : bool

    Whether the resource distribution is dense

Returns:
Tuple[dict, dict]

A tuple containing:

  • q_table : dict

    Loaded Q-table data structure mapping states to action values

  • configs : dict

    Configuration dictionary containing environment parameters

Raises:
FileNotFoundError

If Q-table or configuration files do not exist at constructed paths

ValueError

If reach parameter is not 'small' or 'large'

Notes

The function maps reach values to sigma parameters:

  • 'small' → 'small_sigma'

  • 'large' → 'large_sigma'

File naming convention follows the pattern: {agents}_agents_{sigma}_{modes}m_{density}/q_tables.hkl
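The convention above can be sketched as a pure path builder. This is an illustration under stated assumptions, not the library's code: the helper sketch_q_table_path is hypothetical, the density flag is assumed to be rendered with str() (giving 'True' or 'False'), and the {agents} parent folder is assumed to be the bare agent count; the source does not confirm either formatting detail. The resulting path is what would then be passed to hickle's load function.

```python
from pathlib import PurePosixPath
from typing import Dict, Union

# Reach-to-sigma mapping transcribed from the Notes above.
SIGMA_LABELS = {"small": "small_sigma", "large": "large_sigma"}


def sketch_q_table_path(options: Dict[str, Union[str, int, bool]]) -> PurePosixPath:
    """Hypothetical builder for data/{agents}/{folder}/q_tables.hkl."""
    reach = options["reach"]
    if reach not in SIGMA_LABELS:
        # Mirrors the documented ValueError for an unknown reach.
        raise ValueError(f"reach must be 'small' or 'large', got {reach!r}")
    agents = options["agents"]
    # Assumption: density is rendered with str(), i.e. 'True' or 'False'.
    folder = f"{agents}_agents_{SIGMA_LABELS[reach]}_{options['modes']}m_{options['density']}"
    # Assumption: the parent folder is the bare agent count.
    return PurePosixPath("data", str(agents), folder, "q_tables.hkl")


small = sketch_q_table_path({"agents": 3, "reach": "small", "modes": 2, "density": True})
large = sketch_q_table_path({"agents": 5, "reach": "large", "modes": 3, "density": False})
print(small)
print(large)
```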

Examples

Load Q-tables for a 3-agent system with small reach:

>>> options = {
...     "agents": 3,
...     "reach": "small",
...     "modes": 2,
...     "density": True
... }
>>> q_table, configs = q_table_data_load(options=options)
>>> print(f"Loaded Q-table type: {type(q_table)}")
Loaded Q-table type: <class 'dict'>

Load data for large reach parameter:

>>> options = {"agents": 5, "reach": "large", "modes": 3, "density": False}
>>> q_table, configs = q_table_data_load(options)