On Task-Relevant Loss Functions in Meta-Reinforcement Learning and Online LQR (2312.05465v1)

Published 9 Dec 2023 in cs.LG, cs.SY, and eess.SY

Abstract: Designing a meta-reinforcement learning (meta-RL) algorithm that uses data efficiently remains a central challenge for its successful real-world application. In this paper, we propose a sample-efficient meta-RL algorithm that learns a model of the system or environment at hand in a task-directed manner. Unlike standard model-based approaches to meta-RL, our method exploits value information to rapidly capture the decision-critical part of the environment. The key component of our method is a loss function, used to learn the task inference module and the system model, that systematically couples the model discrepancy with the value estimate. This coupling allows the policy and the task inference module to be learned with significantly less data than existing meta-RL algorithms require. The idea also extends to a non-meta-RL setting, namely the online linear quadratic regulator (LQR) problem, where our method simplifies in a way that reveals the essence of the strategy. The proposed method is evaluated on high-dimensional robotic control and online LQR problems, empirically verifying that it extracts the information indispensable for solving the tasks from observations in a sample-efficient manner.
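To make the abstract's central idea concrete, the sketch below contrasts a value-aware model objective with a standard mean-squared prediction loss. This is not the loss function proposed in the paper, only a minimal NumPy illustration of coupling model discrepancy with a value estimate; the critic `value_fn`, the synthetic transition batch, and the function names are hypothetical stand-ins.

```python
import numpy as np

# Minimal sketch of a value-aware model objective: the model's prediction
# error is measured through the critic, so errors that do not change the
# value estimate are not penalized.

def value_aware_model_loss(pred_next_states, true_next_states, value_fn):
    """Penalize model discrepancy only where it changes the value estimate."""
    v_pred = value_fn(pred_next_states)   # V(f_theta(s, a))
    v_true = value_fn(true_next_states)   # V(s')
    return np.mean((v_pred - v_true) ** 2)

def mse_model_loss(pred_next_states, true_next_states):
    """Standard task-agnostic model loss, shown for comparison."""
    return np.mean(np.sum((pred_next_states - true_next_states) ** 2, axis=-1))

# Toy example with a hypothetical quadratic critic and a synthetic batch.
rng = np.random.default_rng(0)
value_fn = lambda s: -np.sum(s ** 2, axis=-1)            # hypothetical critic
true_next = rng.normal(size=(128, 4))                    # observed next states s'
pred_next = true_next + 0.1 * rng.normal(size=(128, 4))  # model predictions f_theta(s, a)

print("value-aware loss:", value_aware_model_loss(pred_next, true_next, value_fn))
print("plain MSE loss:  ", mse_model_loss(pred_next, true_next))
```

Measuring model error through its effect on the objective, rather than through raw prediction error, is the general idea the abstract describes as being simplified in the online LQR setting; the paper's exact formulation is not reproduced here.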

