Optimal dependence on S, A, and H in MAIL sample complexity
Characterize the optimal dependence of the sample complexity on the state-space size S, the action-space size(s) A (or A_max), and the horizon H for Multi-Agent Imitation Learning (MAIL) algorithms that learn ε-approximate Nash equilibria in finite-horizon two-player zero-sum Markov games.
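To make the target concrete, the ε-approximate Nash equilibrium condition can be written in terms of the Nash gap. This is a sketch in standard notation, not taken from the source. Here $V^{\mu,\nu}$ is assumed to denote the expected cumulative reward over $H$ steps when the first player follows policy $\mu$ and the second follows $\nu$. Treating the first player as the maximizer is an assumed convention.

$$
\mathrm{NashGap}(\mu,\nu) \;=\; \max_{\mu'} V^{\mu',\nu} \;-\; \min_{\nu'} V^{\mu,\nu'} \;\le\; \varepsilon
$$

The gap is always nonnegative, since $\max_{\mu'} V^{\mu',\nu} \ge V^{\mu,\nu} \ge \min_{\nu'} V^{\mu,\nu'}$, and it is zero exactly at a Nash equilibrium. Under this reading, the problem asks how the number of samples needed to output a policy pair satisfying the condition must scale with S, A, and H, with upper and lower bounds that match.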
References
First, while we have closed the gap in $\varepsilon$-dependence, optimal guarantees with respect to other problem parameters $S$, $A$, and $H$ remain unknown.
Source: Freihaut et al., "Rate optimal learning of equilibria from data" (arXiv:2510.09325, 10 Oct 2025), section "Conclusion and future directions".