
Infinite-horizon extensions of reward-free warm-up and MAIL-WARM

Establish whether analogous sample complexity guarantees for interactive Multi-Agent Imitation Learning can be obtained in the infinite-horizon discounted setting by developing reward-free exploration and analysis tools that do not rely on finite-horizon-specific algorithms such as EULER.


Background

The proposed MAIL-WARM framework and its analysis rely on the finite-horizon regret guarantees of the EULER algorithm, which drives the reward-free warm-up phase. Extending the approach to infinite horizons is non-trivial because the underlying exploration and regret tools differ significantly between the two settings.
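To make the gap concrete, the two regimes optimize different objectives (standard textbook definitions, stated here for orientation rather than taken from the paper). In the episodic finite-horizon setting, an algorithm such as EULER controls regret over $H$-step episodes with value

$$V_1^{\pi}(s) \;=\; \mathbb{E}^{\pi}\!\Big[\textstyle\sum_{h=1}^{H} r_h(s_h, a_h) \,\Big|\, s_1 = s\Big],$$

whereas the discounted infinite-horizon setting targets

$$V_\gamma^{\pi}(s) \;=\; \mathbb{E}^{\pi}\!\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \,\Big|\, s_0 = s\Big], \qquad \gamma \in (0,1),$$

where the effective horizon $1/(1-\gamma)$ replaces $H$, and there is no episodic reset or stage-wise backward induction for the analysis to exploit.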

Resolving this would broaden the applicability of interactive MAIL to discounted infinite-horizon Markov games and may yield advances of independent interest in reward-free reinforcement learning.

References

Whether analogous results can be obtained in the infinite-horizon regime remains an open challenge, and progress in this direction could be of interest independent of MAIL.

Rate optimal learning of equilibria from data (arXiv:2510.09325, Freihaut et al., 10 Oct 2025), in "Conclusion and future directions".