Curriculum Reinforcement Learning via Morphology-Environment Co-Evolution (2309.12529v1)

Published 21 Sep 2023 in cs.AI

Abstract: Throughout their long history, natural species have survived by evolving physical structures that adapt to environmental changes. In contrast, current reinforcement learning (RL) studies mainly focus on training an agent with a fixed morphology (e.g., skeletal structure and joint attributes) in a fixed environment, which can hardly generalize to changing environments or new tasks. In this paper, we optimize an RL agent and its morphology through "morphology-environment co-evolution (MECE)", in which the morphology is continually updated to adapt to the changing environment, while the environment is modified progressively to introduce new challenges and stimulate the improvement of the morphology. This yields a curriculum for training a generalizable RL agent whose morphology and policy are optimized for different environments. Instead of hand-crafting the curriculum, we train two policies to automatically change the morphology and the environment. To this end, (1) we develop two novel and effective rewards for the two policies, based solely on the learning dynamics of the RL agent; and (2) we design a scheduler to automatically determine when to change the environment and the morphology. In experiments on two classes of tasks, the morphology and RL policies trained via MECE exhibit significantly better generalization in unseen test environments than SOTA morphology optimization methods. Our ablation studies on the two MECE policies further show that the co-evolution between morphology and environment is the key to this success.
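
The abstract describes a co-evolution loop in which an agent trains while two outer policies mutate its morphology and its environment, with a scheduler deciding when to switch based on the agent's learning dynamics. The sketch below is a minimal, hypothetical illustration of that loop: the class names, the scalar stand-ins for morphology and environment, and the plateau-based switching rule are all assumptions for illustration, not the paper's actual algorithm or reward definitions.

```python
# Hypothetical sketch of a MECE-style co-evolution loop.
# AgentPolicy, Scheduler, and mece_loop are illustrative names, not the authors' code.
import random

class AgentPolicy:
    """Placeholder control policy; one train_step stands in for a batch of RL rollouts."""
    def __init__(self):
        self.skill = 0.0

    def train_step(self, morphology, environment):
        # Pretend training: skill improves faster when the morphology fits the environment.
        fit = 1.0 / (1.0 + abs(morphology - environment))
        gain = 0.1 * fit * random.random()
        self.skill += gain
        return gain  # improvement in return, a proxy for learning progress

class Scheduler:
    """Decides when to change morphology or environment from the agent's learning dynamics."""
    def __init__(self, plateau=0.02, window=5):
        self.plateau, self.window, self.history = plateau, window, []

    def should_switch(self, gain):
        self.history.append(gain)
        recent = self.history[-self.window:]
        # Switch once learning progress has plateaued over the recent window.
        return len(recent) == self.window and sum(recent) / self.window < self.plateau

def mece_loop(iterations=100):
    agent = AgentPolicy()
    morphology, environment = 0.0, 1.0  # scalar stand-ins for real parameterizations
    morph_sched, env_sched = Scheduler(), Scheduler()
    for _ in range(iterations):
        gain = agent.train_step(morphology, environment)
        # The morphology policy would be rewarded by the agent's learning progress;
        # here a random perturbation stands in for its action.
        if morph_sched.should_switch(gain):
            morphology += random.uniform(-0.5, 0.5)
        # The environment policy would be rewarded for posing productive new challenges;
        # here the environment simply becomes harder.
        if env_sched.should_switch(gain):
            environment += random.uniform(0.0, 0.5)
    return agent, morphology, environment

if __name__ == "__main__":
    agent, m, e = mece_loop()
    print(f"final skill={agent.skill:.2f}, morphology={m:.2f}, environment={e:.2f}")
```

In the paper, the two outer policies are themselves trained with rewards derived from the RL agent's learning dynamics; the random mutations above merely mark where those learned actions would apply.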
