Non-interactive MAIL guarantees depending only on single-policy concentrability
Determine whether there exists a non-interactive Multi-Agent Imitation Learning (MAIL) algorithm for finite-horizon two-player zero-sum Markov games that learns an $\varepsilon$-approximate Nash equilibrium with sample complexity depending solely on the single-policy deviation concentrability coefficient $\mathcal{C}(\mu,\nu)$ (defined via occupancy ratios of best responses to the demonstrated policies against the dataset distribution), and not on the all-policy deviation concentrability coefficient $\mathcal{C}_{\max}$.
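For concreteness, the gap between the two coefficients can be sketched in display math. The occupancy notation $d$, the best-response operator $\dagger(\cdot)$, and the exact placement of the suprema below are assumptions in the style of the deviation-concentrability literature, not the paper's verbatim definitions:
\[
\mathcal{C}(\mu,\nu) \;=\; \max\left\{ \sup_{s,a} \frac{d^{\dagger(\nu),\,\nu}(s,a)}{d^{D}(s,a)},\; \sup_{s,b} \frac{d^{\mu,\,\dagger(\mu)}(s,b)}{d^{D}(s,b)} \right\},
\qquad
\mathcal{C}_{\max} \;=\; \max_{\pi',\,\nu'} \max\left\{ \sup_{s,a} \frac{d^{\pi',\,\nu}(s,a)}{d^{D}(s,a)},\; \sup_{s,b} \frac{d^{\mu,\,\nu'}(s,b)}{d^{D}(s,b)} \right\},
\]
where $d^{\pi,\sigma}$ denotes the state-action occupancy measure induced when the players follow $(\pi,\sigma)$, $d^{D}$ is the occupancy of the dataset distribution, and $\dagger(\sigma)$ is a best response to $\sigma$. Under $\mathcal{C}(\mu,\nu)$ the data need only cover the two best responses to the demonstrated pair; under $\mathcal{C}_{\max}$ it must cover every unilateral deviation, a far stronger coverage requirement, which is what makes removing $\mathcal{C}_{\max}$ from the guarantee nontrivial.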
References
Open Question 1 Does there exist a non-interactive MAIL algorithm with guarantees featuring only $\mathcal{C}(\mu,\nu)$ and not $\mathcal{C}_{\max}$?
— Freihaut et al., "Rate optimal learning of equilibria from data" (arXiv:2510.09325, 10 Oct 2025), Introduction, Open Question 1