
Non-interactive MAIL guarantees depending only on single-policy concentrability

Determine whether there exists a non-interactive Multi-Agent Imitation Learning algorithm for finite-horizon two-player zero-sum Markov games that achieves an ε-approximate Nash equilibrium with sample complexity guarantees depending solely on the single-policy deviation concentrability coefficient C(μ,ν) (defined via occupancy ratios against best responses under the dataset distribution), and not on the all-policy deviation concentrability coefficient C_max.


Background

The paper studies fundamental limits of Multi-Agent Imitation Learning (MAIL) and distinguishes between two concentrability measures. The quantity C(μ,ν) captures the coverage needed to evaluate deviations against a fixed opponent’s policy, whereas C_max measures coverage against all possible policy deviations. Prior work established a lower bound in the non-interactive setting that depends on C(μ,ν) and an upper bound for Behavior Cloning that depends on C_max, leaving a gap between the two.
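For intuition only, coefficients of this kind are typically written as occupancy-measure ratios. The following is a hedged sketch, not the paper's exact definitions: it assumes dataset occupancies $d^{\mu,\nu}_h$ induced by the Nash pair $(\mu,\nu)$, best responses $\mu^{\dagger}(\nu)$ and $\nu^{\dagger}(\mu)$, and a supremum over state-action tuples $(s,a,b)$ at each stage $h$.

\[
\mathcal{C}(\mu,\nu) \;\approx\; \max_{h}\,\sup_{s,a,b}\,
\frac{\max\bigl\{ d^{\mu^{\dagger}(\nu),\,\nu}_{h}(s,a,b),\; d^{\mu,\,\nu^{\dagger}(\mu)}_{h}(s,a,b)\bigr\}}{d^{\mu,\nu}_{h}(s,a,b)},
\qquad
\mathcal{C}_{\max} \;\approx\; \max_{h}\,\sup_{s,a,b}\,\sup_{\hat\mu,\hat\nu}\,
\frac{\max\bigl\{ d^{\hat\mu,\,\nu}_{h}(s,a,b),\; d^{\mu,\,\hat\nu}_{h}(s,a,b)\bigr\}}{d^{\mu,\nu}_{h}(s,a,b)}.
\]

Under this illustrative form, $\mathcal{C}(\mu,\nu)$ only requires the dataset to cover the occupancies of best responses to the fixed expert policies, while $\mathcal{C}_{\max}$ requires coverage of every unilateral deviation, which is why the latter is the more stringent requirement.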

This question probes whether non-interactive algorithms can avoid the more stringent C_max dependence by relying only on C(μ,ν), clarifying when non-interactive MAIL is feasible from fixed expert datasets generated by Nash equilibrium policies.

References

Open Question 1. Does there exist a non-interactive MAIL algorithm with guarantees featuring only $\mathcal{C}(\mu,\nu)$ and not $\mathcal{C}_{\max}$?

Rate optimal learning of equilibria from data (2510.09325 - Freihaut et al., 10 Oct 2025) in Introduction (Open Question 1)