Fast bandit last-iterate convergence without the A2L reduction (AOG-based dynamics)
Ascertain whether uncoupled learning dynamics constructed from existing algorithms such as Accelerated Optimistic Gradient (AOG), without relying on the A2L reduction, can be designed and analyzed to achieve fast last-iterate convergence rates under bandit (payoff-based) feedback in multi-player zero-sum polymatrix games.
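To make the feedback model concrete, the sketch below contrasts bandit (payoff-based) feedback with full-information feedback in a two-player zero-sum matrix game, the simplest polymatrix instance. It is an illustration only: it is neither AOG nor the A2L reduction. The function names, the example matrix, and the importance-weighted estimator (a standard construction from the bandit literature) are all assumptions chosen for this sketch.

```python
import random

def sample(probs, rng):
    """Draw an action index from a mixed strategy (probability vector)."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def bandit_round(A, x, y, rng):
    """One round of play. The row player observes only its own sampled
    action and the realized scalar payoff, never the full vector A @ y."""
    i = sample(x, rng)
    j = sample(y, rng)
    return i, A[i][j]

def importance_weighted_estimate(x, action, payoff):
    """Standard importance-weighted estimator (an assumption of this sketch):
    unbiased for the payoff vector A @ y when every x[i] > 0."""
    est = [0.0] * len(x)
    est[action] = payoff / x[action]
    return est

def average_estimate(A, x, y, rounds, seed=0):
    """Average the bandit estimates over many rounds with fixed strategies."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(rounds):
        i, u = bandit_round(A, x, y, rng)
        est = importance_weighted_estimate(x, i, u)
        total = [t + e for t, e in zip(total, est)]
    return [t / rounds for t in total]

def full_information_vector(A, y):
    """What the row player would see under full-information feedback."""
    return [sum(a_ij * y_j for a_ij, y_j in zip(row, y)) for row in A]

# Hypothetical example: matching pennies with a biased column player.
A = [[1.0, -1.0], [-1.0, 1.0]]
x = [0.5, 0.5]
y = [0.7, 0.3]
estimate = average_estimate(A, x, y, rounds=50_000)
exact = full_information_vector(A, y)
```

With fixed strategies the averaged estimate approaches the full-information vector, but each single-round estimate is noisy, with variance that grows as the sampling probabilities shrink. That noise is one reason fast last-iterate rates are harder to obtain under bandit feedback.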
References
Nevertheless, it remains an interesting open question whether one can design uncoupled learning dynamics with fast convergence rates in the bandit feedback setting using existing algorithms like AOG without relying on the A2L reduction.
— From Average-Iterate to Last-Iterate Convergence in Games: A Reduction and Its Applications
(2506.03464 - Cai et al., 4 Jun 2025) in Section 6 (Learning in Zero-Sum Games with Bandit Feedback) — Discussion