An Open-Loop Baseline for Reinforcement Learning Locomotion Tasks (2310.05808v3)

Published 9 Oct 2023 in cs.RO

Abstract: In search of a simple baseline for Deep Reinforcement Learning in locomotion tasks, we propose a model-free open-loop strategy. By leveraging prior knowledge and the elegance of simple oscillators to generate periodic joint motions, it achieves respectable performance in five different locomotion environments, with a number of tunable parameters that is a tiny fraction of the thousands typically required by DRL algorithms. We conduct two additional experiments using open-loop oscillators to identify current shortcomings of these algorithms. Our results show that, compared to the baseline, DRL is more prone to performance degradation when exposed to sensor noise or failure. Furthermore, we demonstrate a successful transfer from simulation to reality using an elastic quadruped, where RL fails without randomization or reward engineering. Overall, the proposed baseline and associated experiments highlight the existing limitations of DRL for robotic applications, provide insights on how to address them, and encourage reflection on the costs of complexity and generality.
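
To make the abstract's core idea concrete, the sketch below implements a purely open-loop policy: one sine oscillator per joint with a tunable amplitude, phase, and offset, producing desired joint positions from time alone, with no feedback from observations. This is a minimal illustrative sketch, not the paper's exact controller; the class name, parameter names, and the single shared gait frequency are assumptions made here for clarity.

```python
import numpy as np

class OpenLoopOscillatorPolicy:
    """Minimal open-loop baseline sketch: one sine wave per joint.

    All names and the shared-frequency assumption are illustrative,
    not taken from the paper.
    """

    def __init__(self, amplitudes, omega, phases, offsets):
        self.amplitudes = np.asarray(amplitudes)  # per-joint amplitude a_i
        self.omega = float(omega)                 # shared angular frequency (rad/s)
        self.phases = np.asarray(phases)          # per-joint phase shift phi_i
        self.offsets = np.asarray(offsets)        # per-joint offset b_i

    def act(self, t):
        # Desired joint positions depend only on time t, never on sensor
        # readings, which is what makes the controller open-loop.
        return self.amplitudes * np.sin(self.omega * t + self.phases) + self.offsets


# Usage example: a 6-joint gait at 1.5 Hz, stepped at 60 Hz.
policy = OpenLoopOscillatorPolicy(
    amplitudes=[0.5] * 6,
    omega=2 * np.pi * 1.5,
    phases=[0, np.pi, 0, np.pi, 0, np.pi],  # alternate legs half a cycle apart
    offsets=[0.0] * 6,
)
dt = 1.0 / 60.0
for step in range(120):
    action = policy.act(step * dt)  # would be passed to env.step(action)
```

Note that the entire controller above has on the order of tens of scalar parameters (one amplitude, phase, and offset per joint plus a frequency), which is the "tiny fraction" the abstract contrasts with the thousands of weights in a typical DRL policy network.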
