
Learning Diverse Skills for Local Navigation under Multi-constraint Optimality (2310.02440v1)

Published 3 Oct 2023 in cs.RO and cs.AI

Abstract: Despite many successful applications of data-driven control in robotics, extracting meaningful diverse behaviors remains a challenge. Typically, task performance needs to be compromised in order to achieve diversity. In many scenarios, task requirements are specified as a multitude of reward terms, each requiring a different trade-off. In this work, we take a constrained optimization viewpoint on the quality-diversity trade-off and show that we can obtain diverse policies while imposing constraints on their value functions which are defined through distinct rewards. In line with previous work, further control of the diversity level can be achieved through an attract-repel reward term motivated by the Van der Waals force. We demonstrate the effectiveness of our method on a local navigation task where a quadruped robot needs to reach the target within a finite horizon. Finally, our trained policies transfer well to the real 12-DoF quadruped robot, Solo12, and exhibit diverse agile behaviors with successful obstacle traversal.
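
For orientation, the two mechanisms the abstract names can be made concrete. The constrained view asks each policy to stay near-optimal under several distinct rewards, i.e. to satisfy value constraints of the form V_k(pi) >= alpha_k * V_k^opt, which is typically handled with Lagrange multipliers; the diversity level is then steered with an attract-repel term shaped like a Van der Waals (Lennard-Jones-style) potential. The sketch below is a minimal illustration under those assumptions, not the paper's implementation: the exponents of the potential, the dual-ascent update, and all names (attract_repel_bonus, update_multipliers, combined_reward, d0, eps) are hypothetical.

```python
import numpy as np

def attract_repel_bonus(d, d0=1.0, eps=1.0):
    """Hypothetical Van der Waals-style diversity term over a pairwise
    distance d between two policies (e.g. between learned policy
    embeddings). The Lennard-Jones-shaped potential has its minimum at
    d == d0, so the negated potential used as a bonus repels policies
    closer than d0 and weakly attracts those farther away; d0 tunes
    the diversity level."""
    r = d0 / (d + 1e-8)                  # dimensionless inverse distance
    return -eps * (r**12 - 2.0 * r**6)   # peaks at d == d0, -> 0 as d -> inf

def update_multipliers(lmbda, values, thresholds, lr=0.01):
    """Projected dual-ascent step for per-reward value constraints
    V_k(pi) >= threshold_k: each multiplier grows while its constraint
    is violated and decays toward 0 once it is satisfied."""
    violation = thresholds - values      # > 0 where a constraint is violated
    return np.maximum(0.0, lmbda + lr * violation)

def combined_reward(task_rewards, lmbda, diversity_bonus, beta=1.0):
    """Scalarized per-step reward: constraint rewards re-weighted by
    their multipliers, plus the attract-repel diversity bonus."""
    return float(np.dot(lmbda, task_rewards) + beta * diversity_bonus)
```

In the paper's setting the distance d would be computed between policy representations and the thresholds set relative to per-reward optimal values; both are left as placeholders above.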
