
Tracking Control for a Spherical Pendulum via Curriculum Reinforcement Learning (2309.14096v1)

Published 25 Sep 2023 in cs.LG and cs.RO

Abstract: Reinforcement Learning (RL) allows learning non-trivial robot control laws purely from data. However, many successful applications of RL have relied on ad-hoc regularizations, such as hand-crafted curricula, to stabilize learning. In this paper, we pair a recent algorithm for automatically building curricula with RL on massively parallelized simulations to learn a tracking controller for a spherical pendulum on a robotic arm via RL. Through an improved optimization scheme that better respects the non-Euclidean task structure, we allow the method to reliably generate curricula of trajectories to be tracked, resulting in faster and more robust learning compared to an RL baseline that does not exploit this form of structured learning. The learned policy matches the performance of an optimal control baseline on the real system, demonstrating the potential of curriculum RL to jointly learn state estimation and control for non-linear tracking tasks.
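To make the curriculum idea in the abstract concrete, the sketch below shows a generic training loop that starts on easy tracking contexts (slow, small-amplitude reference trajectories) and shifts toward the full task distribution as performance improves. This is a minimal sketch, not the paper's algorithm: all names (sample_easy_context, train_policy_on, evaluate_return) and the linear interpolation rule are hypothetical placeholders standing in for the curriculum-generation method and the massively parallel RL updates described above.

# Hedged sketch of a curriculum-RL loop for trajectory tracking.
# All functions below are placeholders, not APIs from the paper or any library.
import numpy as np

rng = np.random.default_rng(0)

def sample_easy_context():
    # Easy tracking targets: slow, small-amplitude reference trajectories.
    return {"amplitude": rng.uniform(0.0, 0.05), "frequency": rng.uniform(0.0, 0.2)}

def sample_target_context():
    # Full task distribution: fast, large-amplitude reference trajectories.
    return {"amplitude": rng.uniform(0.0, 0.3), "frequency": rng.uniform(0.0, 1.5)}

def interpolate(easy, hard, alpha):
    # Move the sampled context from the easy distribution toward the target one.
    return {k: (1.0 - alpha) * easy[k] + alpha * hard[k] for k in easy}

def train_policy_on(context, policy):
    # Placeholder for one RL update over many parallel simulated episodes.
    return policy

def evaluate_return(context, policy):
    # Placeholder tracking-performance estimate; here just a dummy score
    # that decreases with context difficulty.
    return 1.0 - context["amplitude"] - 0.1 * context["frequency"]

policy = None
alpha = 0.0                      # curriculum progress in [0, 1]
performance_threshold = 0.5      # advance only once the agent tracks well enough

for iteration in range(100):
    context = interpolate(sample_easy_context(), sample_target_context(), alpha)
    policy = train_policy_on(context, policy)
    if evaluate_return(context, policy) > performance_threshold:
        alpha = min(1.0, alpha + 0.05)   # shift the curriculum toward the full task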

