Applicability of Algorithmic Information Ratio (AIR) in competitive multi-agent reinforcement learning

Investigate whether the Algorithmic Information Ratio (AIR) framework can be applied to competitive multi-agent reinforcement learning settings, specifically two-player zero-sum and multi-player general-sum Markov games, and determine corresponding regret guarantees and algorithmic formulations under AIR in these environments.

Background

Zhang et al. (2024) develop Information-Directed Sampling (IDS) algorithms for multi-agent reinforcement learning (MARL) and discuss their connection to the Algorithmic Information Ratio (AIR). AIR is a recent framework for analyzing frequentist regret using Bayesian-type algorithms; it extends the information ratio by incorporating alignment to a reference distribution.
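
For orientation, the classical information ratio that AIR builds on can be written as follows. This is a sketch in the standard single-agent Bayesian bandit notation, which is an assumption here and is not taken from the cited paper: $\Delta_t$ is the instantaneous regret of the action $A_t$, $A^\star$ is the optimal action, $Y_t$ is the observation, $I_t$ is mutual information under the posterior at round $t$, and $H(A^\star)$ is the prior entropy of the optimal action. The AIR definition itself, with its reference distribution, is not reproduced here.

```latex
\[
  \Gamma_t \;=\; \frac{\bigl(\mathbb{E}_t[\Delta_t]\bigr)^2}{I_t\bigl(A^\star;\,(A_t, Y_t)\bigr)},
  \qquad
  \Gamma_t \le \bar{\Gamma} \ \text{for all } t
  \;\Longrightarrow\;
  \mathbb{E}\bigl[\mathrm{Regret}(T)\bigr] \;\le\; \sqrt{\bar{\Gamma}\, H(A^\star)\, T}.
\]
```

The bound above is a Bayesian regret guarantee; the open question concerns whether an AIR-style quantity can deliver frequentist guarantees of comparable form in Markov games.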

While AIR has been studied for bandit and single-agent reinforcement learning problems, its use in competitive MARL (e.g., Markov games) has not been established. Clarifying whether AIR can be effectively formulated and analyzed in multi-agent settings would bridge methodological gaps between IDS and AIR and potentially yield new regret analyses for MARL.

References

However, to the best of our knowledge, investigations into AIR are restricted to the simpler bandit and RL settings, while their applicability in the competitive MARL environment remains unknown.

— Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning (2404.19292 - Zhang et al., 30 Apr 2024) in Section 1: Introduction — Related works, Connections to AIR paragraph
