Optimal dependence on S, A, and H in MAIL sample complexity

Characterize the optimal dependence on the state-space size S, action-space size(s) A (or A_max), and horizon H in the sample complexity guarantees of Multi-Agent Imitation Learning algorithms that learn ε-approximate Nash equilibria in finite-horizon two-player zero-sum Markov games.
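
To make the question concrete, here is one common way it could be formalized. The notation is an assumption for illustration and is not taken from the paper: $\mu$ and $\nu$ are the policies of the max- and min-player, $V_1^{\mu,\nu}(s_1)$ is the expected return from the initial state $s_1$, and $N(\varepsilon, S, A, H)$ is the smallest number of expert samples (or queries) that suffices. A policy pair $(\hat\mu, \hat\nu)$ is an $\varepsilon$-approximate Nash equilibrium if its Nash gap is at most $\varepsilon$:

$$
\mathrm{NashGap}(\hat\mu, \hat\nu) \;=\; \max_{\mu'} V_1^{\mu',\hat\nu}(s_1) \;-\; \min_{\nu'} V_1^{\hat\mu,\nu'}(s_1) \;\le\; \varepsilon .
$$

Under the simplifying assumption that the bounds factor into a polynomial in the problem parameters times an $\varepsilon$-rate $g(\varepsilon)$, the open question amounts to finding exponents $a, b, c$ for which upper and lower bounds match up to logarithmic factors:

$$
N(\varepsilon, S, A, H) \;=\; \tilde{\Theta}\!\left( S^{a} A^{b} H^{c} \, g(\varepsilon) \right).
$$

The paper settles $g(\varepsilon)$. The exponents $a$, $b$, $c$, and whether the bound factors this way at all, are what remain open.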

Background

The paper establishes rate-optimal ε-dependence for both non-interactive and interactive MAIL, but leaves open how the other fundamental problem parameters (S, A, and H) affect the optimal sample complexity.

Determining the tight dependence on S, A, and H would refine the theoretical guarantees and inform practical algorithm design by showing whether the current bounds are optimal or can be improved along these dimensions.

References

First, while we have closed the gap in $\varepsilon$-dependence, optimal guarantees with respect to other problem parameters $S$, $A$, and $H$ remain unknown.

— Rate optimal learning of equilibria from data (2510.09325 - Freihaut et al., 10 Oct 2025) in Conclusion and future directions
