
Physics-Informed Model and Hybrid Planning for Efficient Dyna-Style Reinforcement Learning (2407.02217v1)

Published 2 Jul 2024 in cs.LG and cs.AI

Abstract: Applying reinforcement learning (RL) to real-world applications requires addressing a trade-off between asymptotic performance, sample efficiency, and inference time. In this work, we demonstrate how to address this triple challenge by leveraging partial physical knowledge about the system dynamics. Our approach involves learning a physics-informed model to boost sample efficiency and generating imaginary trajectories from this model to learn a model-free policy and Q-function. Furthermore, we propose a hybrid planning strategy, combining the learned policy and Q-function with the learned model to enhance time efficiency in planning. Through practical demonstrations, we illustrate that our method improves the compromise between sample efficiency, time efficiency, and performance over state-of-the-art methods.
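
The sketch below is not from the paper; it is a minimal, hedged Python illustration of the loop the abstract describes: a physics-informed model built from a known analytical prior plus a learned residual, Dyna-style imagined rollouts used to train a model-free policy and Q-function, and a hybrid planner that scores short model rollouts and bootstraps beyond the planning horizon with the learned Q-function. All names, shapes, and the reward proxy (f_physics, Residual, policy, q_function, hybrid_plan) are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the Dyna-style loop described in the abstract.
# Assumptions: toy dynamics, stand-in policy/Q, and an illustrative reward proxy.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, HORIZON, N_CANDIDATES = 4, 1, 5, 64

def f_physics(state, action):
    """Known (partial) physics prior, e.g. a coarse analytical integrator (placeholder)."""
    return state + 0.05 * np.tanh(np.sum(action))

class Residual:
    """Learned correction on top of the physics prior (stand-in for a neural network)."""
    def __init__(self):
        self.w = np.zeros(STATE_DIM)
    def __call__(self, state, action):
        return self.w * 0.0  # untrained residual predicts no correction

def model_step(state, action, residual):
    """Physics-informed model: analytical prior plus learned residual."""
    return f_physics(state, action) + residual(state, action)

def policy(state):
    """Stand-in for the learned model-free policy."""
    return rng.normal(0.0, 0.3, ACTION_DIM)

def q_function(state, action):
    """Stand-in for the learned Q-function used to bootstrap plan values."""
    return -float(np.sum(state ** 2))

def imagine_rollout(state, residual, horizon=HORIZON):
    """Dyna-style imagination: roll the learned model to generate policy/Q training data."""
    transitions = []
    for _ in range(horizon):
        action = policy(state)
        next_state = model_step(state, action, residual)
        transitions.append((state, action, next_state))
        state = next_state
    return transitions

def hybrid_plan(state, residual):
    """Hybrid planning: sample candidate action sequences around the policy, score short
    model rollouts with a reward proxy, and terminate each rollout with the learned Q."""
    best_action, best_value = None, -np.inf
    for _ in range(N_CANDIDATES):
        s, value, first_action = state, 0.0, None
        for t in range(HORIZON):
            a = policy(s) + rng.normal(0.0, 0.1, ACTION_DIM)
            first_action = a if t == 0 else first_action
            s = model_step(s, a, residual)
            value += -float(np.sum(s ** 2))        # illustrative reward proxy
        value += q_function(s, policy(s))          # bootstrap beyond the horizon
        if value > best_value:
            best_value, best_action = value, first_action
    return best_action

# One illustrative iteration: imagine data for the actor-critic, then plan an action.
residual = Residual()
state = rng.normal(size=STATE_DIM)
imagined = imagine_rollout(state, residual)
action = hybrid_plan(state, residual)
print(len(imagined), action)
```

In this reading, the hybrid planner reduces inference time because the learned policy proposes actions and the Q-function truncates rollouts, so only short model horizons need to be evaluated online.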

