Comprehensive theory for UDRL, GCSL, and ODT across finite and asymptotic iterations
Establish a comprehensive, rigorous theoretical treatment of the behavior of Upside-Down Reinforcement Learning (UDRL), Goal-Conditioned Supervised Learning (GCSL), and Online Decision Transformers (ODT), both at a finite number of training iterations and in the asymptotic limit, including convergence and stability characterizations in general Markov decision process settings.
References
Although [...] provide an in-depth analysis of the first eUDRL iteration in the context of offline RL, a comprehensive treatment of the behavior of UDRL, GCSL and ODT at a finite number of iterations and in the asymptotic limit remains an open problem.
Source: On the Convergence and Stability of Upside-Down Reinforcement Learning, Goal-Conditioned Supervised Learning, and Online Decision Transformers (2502.05672, Štrupl et al., 8 Feb 2025), Introduction