
Efficient Learning of Locomotion Skills through the Discovery of Diverse Environmental Trajectory Generator Priors (2210.04819v2)

Published 10 Oct 2022 in cs.NE, cs.AI, cs.LG, and cs.RO

Abstract: Data-driven, learning-based methods have recently been particularly successful at learning robust locomotion controllers for a variety of unstructured terrains. Prior work has shown that incorporating good locomotion priors in the form of trajectory generators (TGs) is effective for efficiently learning complex locomotion skills. However, defining a single good TG remains challenging as tasks and environments become increasingly complex: it requires extensive tuning and risks reducing the effectiveness of the prior. In this paper, we present Evolved Environmental Trajectory Generators (EETG), a method that learns a diverse set of specialized locomotion priors using Quality-Diversity algorithms while maintaining a single policy within the Policies Modulating TG (PMTG) architecture. The results demonstrate that EETG enables a quadruped robot to successfully traverse a wide range of environments, such as slopes, stairs, rough terrain, and balance beams. Our experiments show that learning a diverse set of specialized TG priors is significantly (5 times) more efficient than using a single, fixed prior when dealing with a wide range of environments.
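
To make the idea of keeping one specialized prior per environment more concrete, the following is a minimal sketch of a generic Quality-Diversity-style elitist archive, not the paper's implementation. Everything in it is an assumption made for illustration: the two TG parameters (`amplitude`, `frequency`), the single `slope` value used to describe an environment, the `toy_fitness` function standing in for a simulated rollout, and the function names themselves. The PMTG policy is omitted entirely.

```python
import random


def toy_fitness(params, slope):
    """Hypothetical stand-in for a simulated rollout; higher is better."""
    amplitude, frequency = params
    # Assumed toy optimum: steeper slopes favour larger amplitude, lower frequency.
    target_amplitude = 0.2 + 0.5 * slope
    target_frequency = 2.0 - slope
    return -((amplitude - target_amplitude) ** 2 + (frequency - target_frequency) ** 2)


def mutate(params, rng, sigma=0.1):
    """Gaussian perturbation of every TG parameter."""
    return tuple(p + rng.gauss(0.0, sigma) for p in params)


def evolve_priors(slopes, iterations=2000, seed=0):
    """Keep the best TG parameters found so far for each environment (one cell per slope)."""
    rng = random.Random(seed)
    archive = {}
    # Initialise every cell with random TG parameters.
    for slope in slopes:
        params = (rng.uniform(0.0, 1.0), rng.uniform(0.0, 3.0))
        archive[slope] = (params, toy_fitness(params, slope))
    for _ in range(iterations):
        # Pick a parent from any cell, mutate it, and evaluate the child
        # on a (possibly different) environment.
        parent_params, _ = archive[rng.choice(slopes)]
        child = mutate(parent_params, rng)
        target = rng.choice(slopes)
        fitness = toy_fitness(child, target)
        # Elitist replacement: a cell only changes when the child is better.
        if fitness > archive[target][1]:
            archive[target] = (child, fitness)
    return archive


if __name__ == "__main__":
    for slope, (params, fitness) in evolve_priors([0.0, 0.5, 1.0]).items():
        print(slope, params, fitness)
```

Letting a child compete in a cell other than its parent's is one simple way for a prior that works on one environment to seed another; whether EETG shares solutions in this way is not stated in the abstract.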
