Deep Dive into Model-free Reinforcement Learning for Biological and Robotic Systems: Theory and Practice (2405.11457v1)

Published 19 May 2024 in cs.RO, cs.AI, and cs.LG

Abstract: Animals and robots exist in a physical world and must coordinate their bodies to achieve behavioral objectives. With recent developments in deep reinforcement learning, it is now possible for scientists and engineers to obtain sensorimotor strategies (policies) for specific tasks using physically simulated bodies and environments. However, the utility of these methods goes beyond the constraints of a specific task; they offer an exciting framework for understanding the organization of an animal sensorimotor system in connection to its morphology and physical interaction with the environment, as well as for deriving general design rules for sensing and actuation in robotic systems. Algorithms and code implementing both learning agents and environments are increasingly available, but the basic assumptions and choices that go into the formulation of an embodied feedback control problem using deep reinforcement learning may not be immediately apparent. Here, we present a concise exposition of the mathematical and algorithmic aspects of model-free reinforcement learning, specifically through the use of actor-critic methods, as a tool for investigating the feedback control underlying animal and robotic behavior.
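To make the actor-critic framing concrete, the following is a minimal illustrative sketch of a one-step advantage actor-critic update on a hypothetical 1-D point-mass task. The environment, network sizes, and hyperparameters are assumptions chosen for illustration (they are not from the paper), and PyTorch is used as the numerics library; the paper itself addresses richer physically simulated bodies and modern actor-critic algorithms.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class PointMass:
    """Toy 1-D point mass (illustrative stand-in for a physics simulator):
    apply a force each step and get rewarded for staying near the origin."""

    def __init__(self, dt: float = 0.05):
        self.dt = dt
        self.pos = 0.0
        self.vel = 0.0

    def reset(self) -> torch.Tensor:
        self.pos = float(torch.empty(1).uniform_(-1.0, 1.0))
        self.vel = 0.0
        return self._obs()

    def _obs(self) -> torch.Tensor:
        return torch.tensor([self.pos, self.vel], dtype=torch.float32)

    def step(self, force: torch.Tensor):
        self.vel += float(force) * self.dt
        self.pos += self.vel * self.dt
        reward = -(self.pos ** 2) - 0.01 * float(force) ** 2  # quadratic state/effort cost
        return self._obs(), reward


# Actor outputs the mean and log-std of a Gaussian policy; critic estimates V(s).
actor = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))
critic = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=3e-3)
gamma = 0.99

env = PointMass()
obs = env.reset()
for step in range(5000):
    mean, log_std = actor(obs)
    dist = torch.distributions.Normal(mean, log_std.clamp(-2.0, 1.0).exp())
    action = dist.sample()
    next_obs, reward = env.step(action)

    # One-step temporal-difference error doubles as the advantage estimate.
    with torch.no_grad():
        td_target = reward + gamma * critic(next_obs)
    td_error = td_target - critic(obs)

    actor_loss = -dist.log_prob(action) * td_error.detach()  # policy-gradient term
    critic_loss = td_error.pow(2)                            # value-function regression
    opt.zero_grad()
    (actor_loss + critic_loss).sum().backward()
    opt.step()

    obs = next_obs
    if (step + 1) % 200 == 0:
        obs = env.reset()  # periodic resets keep the visited-state distribution broad
```

In practice, this single-sample update is replaced by batched rollouts, more careful advantage estimation, and entropy- or trust-region-regularized objectives (e.g., PPO or SAC), with the toy dynamics swapped for a full physics simulation of the body and environment.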

Authors (6)
  1. Yusheng Jiao (9 papers)
  2. Feng Ling (10 papers)
  3. Sina Heydari (7 papers)
  4. Nicolas Heess (139 papers)
  5. Josh Merel (31 papers)
  6. Eva Kanso (48 papers)