Rate-optimal interactive MAIL with O(ε^{-2}) expert queries

Develop an interactive Multi-Agent Imitation Learning algorithm for finite-horizon two-player zero-sum Markov games that outputs an ε-approximate Nash equilibrium using the optimal number of expert queries of order O(ε^{-2}).

Background

Existing interactive MAIL algorithms, such as MURMAIL, require a number of expert queries scaling as O(ε^{-8}), which is suboptimal compared to information-theoretic lower bounds that suggest an O(ε^{-2}) dependence.
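
To give a sense of the size of this gap, the sketch below compares the two scalings for a few target accuracies. It is purely illustrative arithmetic: the helper name `query_scaling` is hypothetical, and constants as well as any dependence on the horizon or on the sizes of the state and action spaces are assumed away, so the numbers are orders of magnitude rather than actual query counts for any algorithm.

```python
def query_scaling(eps: float, exponent: int) -> float:
    """Return eps**(-exponent), the eps-dependence of a query bound.

    Assumption: constants and all other problem parameters are ignored.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return eps ** (-exponent)


if __name__ == "__main__":
    for eps in (0.5, 0.1, 0.01):
        known_upper = query_scaling(eps, 8)   # eps^{-8} scaling stated for MURMAIL
        target_rate = query_scaling(eps, 2)   # eps^{-2} scaling suggested by lower bounds
        gap = known_upper / target_rate       # equals eps^{-6}
        print(f"eps={eps}: eps^-8 ~ {known_upper:.3g}, "
              f"eps^-2 ~ {target_rate:.3g}, ratio ~ {gap:.3g}")
```

The ratio between the two scalings is ε^{-6}, so the gap widens quickly as the target accuracy ε shrinks.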

This question asks for an algorithmic design that matches the optimal ε-dependence in query complexity while learning an ε-approximate Nash equilibrium, thereby closing the gap between known upper and lower bounds in the interactive setting.

References

Open Question 2: Can we design an algorithm which outputs an $\varepsilon$-approximate Nash equilibrium with the optimal order of expert queries, which is $\mathcal{O}(\varepsilon^{-2})$?

Rate optimal learning of equilibria from data (2510.09325 - Freihaut et al., 10 Oct 2025) in Introduction (Open Question 2)