A Note on Loss Functions and Error Compounding in Model-based Reinforcement Learning
Abstract: This note clarifies some confusions (and perhaps throws out more) around model-based reinforcement learning and its theoretical understanding in the context of deep RL. The main topics of discussion are (1) how to reconcile model-based RL's bad empirical reputation on error compounding with its superior theoretical properties, and (2) the limitations of empirically popular losses. For the latter, concrete counterexamples for the "MuZero loss" are constructed to show that it not only fails in stochastic environments, but also suffers exponential sample complexity in deterministic environments, even when the data provides sufficient coverage.
- Lipschitz continuity in model-based reinforcement learning. In International Conference on Machine Learning, pages 264–273. PMLR, 2018.
- Leemon Baird. Residual algorithms: Reinforcement learning with function approximation. In Machine Learning Proceedings 1995, pages 30–37. Elsevier, 1995.
- Model selection in reinforcement learning. Machine Learning, 85(3):299–332, 2011.
- Value-aware loss function for model-based reinforcement learning. In Artificial Intelligence and Statistics, 2017.
- Metrics for finite Markov decision processes. In Proceedings of Uncertainty in Artificial Intelligence, pages 162–169, 2004.
- Nan Jiang. Notes on State Abstractions. University of Illinois at Urbana-Champaign, 2018. http://nanjiang.cs.illinois.edu/files/cs598/note4.pdf.
- Representation learning with multi-step inverse kinematics: An efficient and optimal approach to rich-observation RL. In International Conference on Machine Learning, pages 24659–24700. PMLR, 2023.
- Kinematic state abstraction and provably efficient rich-observation reinforcement learning. In International Conference on Machine Learning, pages 6961–6971. PMLR, 2020.
- Rémi Munos. Performance bounds in l_p-norm for approximate value iteration. SIAM Journal on Control and Optimization, 46(2):541–561, 2007.
- Curiosity-driven exploration by self-supervised prediction. In International Conference on Machine Learning, pages 2778–2787. PMLR, 2017.
- Eligibility traces for off-policy policy evaluation. In Proceedings of the Seventeenth International Conference on Machine Learning, pages 759–766, 2000.
- On the use of non-stationary policies for stationary infinite-horizon Markov decision processes. Advances in Neural Information Processing Systems, 25, 2012.
- Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, 588(7839):604–609, 2020.
- Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches. In Conference on Learning Theory, 2019.
- Erik Talvitie. Self-correcting models for model-based reinforcement learning. In AAAI Conference on Artificial Intelligence, 2017.
- Toward understanding state representation learning in MuZero: A case study in linear quadratic Gaussian control. In 2023 62nd IEEE Conference on Decision and Control (CDC), pages 6166–6171. IEEE, 2023.
- λ-AC: Effective decision-aware reinforcement learning with latent models. 2023.