$λ$-models: Effective Decision-Aware Reinforcement Learning with Latent Models (2306.17366v3)
Abstract: The idea of decision-aware model learning, that models should be accurate where it matters for decision-making, has gained prominence in model-based reinforcement learning. While promising theoretical results have been established, the empirical performance of algorithms leveraging a decision-aware loss has been lacking, especially in continuous control problems. In this paper, we present a study of the components necessary for decision-aware reinforcement learning models, and we showcase design choices that enable well-performing algorithms. To this end, we provide a theoretical and empirical investigation into algorithmic ideas in the field. We highlight that empirical design decisions established in the MuZero line of work, most importantly the use of a latent model, are vital to achieving good performance in related algorithms. Furthermore, we show that the MuZero loss function is biased in stochastic environments and establish that this bias has practical consequences. Building on these findings, we present an overview of which decision-aware loss functions are best suited to which empirical scenarios, providing actionable insights to practitioners in the field.
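To give a flavor of the kind of argument behind the bias claim (an illustrative sketch only, not the paper's proof), the snippet below uses the standard decomposition $\mathbb{E}[(c - X)^2] = (c - \mathbb{E}[X])^2 + \mathrm{Var}[X]$: when a deterministic latent prediction is regressed against stochastic sampled targets, as in a MuZero-style value head, the sampled loss differs from the loss against the expected target by a variance term. The toy environment and numbers here are hypothetical, chosen only to make the decomposition visible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stochastic transition: from one fixed (state, action) pair, the value
# of the sampled next state V(s') is 0 or 10 with equal probability.
targets = rng.choice([0.0, 10.0], size=100_000)

# A deterministic latent model emits a single prediction c for this pair.
# Minimizing the sampled squared loss, as a MuZero-style value head would,
# drives c to the sample mean of the targets.
c = targets.mean()

# At this minimizer the sampled loss equals Var[V(s')], the extra term in
#   E[(c - V(s'))^2] = (c - E[V(s')])^2 + Var[V(s')],
# whereas the loss measured against the expected target is zero.
sampled_loss = np.mean((c - targets) ** 2)
loss_vs_expected_target = (c - targets.mean()) ** 2

print(round(c, 2))                        # ≈ 5.0
print(round(sampled_loss, 2))             # ≈ 25.0, i.e. Var[V(s')]
print(round(loss_vs_expected_target, 2))  # 0.0
```

In a stochastic environment the sampled loss is therefore offset from the loss against the expected target; the paper's analysis concerns when such offsets have practical consequences for which model is learned.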