Gradient-Based Trajectory Optimization With Learned Dynamics (2204.04558v3)
Abstract: Trajectory optimization methods have achieved an exceptional level of performance on real-world robots in recent years. These methods rely heavily on accurate analytical models of the dynamics, yet some aspects of the physical world can only be captured to a limited extent. An alternative approach is to leverage machine learning techniques to learn a differentiable dynamics model of the system from data. In this work, we use trajectory optimization and model learning to perform highly dynamic and complex tasks with robotic systems in the absence of accurate analytical models of the dynamics. We show that a neural network can model highly nonlinear behaviors accurately over large time horizons, from data collected in only 25 minutes of interaction on two distinct robots: (i) the Boston Dynamics Spot and (ii) a radio-controlled (RC) car. Furthermore, we use the gradients of the neural network to perform gradient-based trajectory optimization. In our hardware experiments, we demonstrate that our learned model can represent complex dynamics for both the Spot and the RC car, and gives good performance in combination with trajectory optimization methods.
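The core idea in the abstract — rolling out a learned differentiable dynamics model and optimizing an action sequence by gradient descent on a trajectory cost — can be sketched as follows. This is a minimal illustration, not the authors' implementation: a hand-written linear model stands in for the learned neural network (only its differentiability matters), and all names (`rollout`, `cost_and_grad`), the cost terms, and the step sizes are hypothetical.

```python
import numpy as np

# "Learned" dynamics x_{t+1} = f(x_t, u_t). A linear double-integrator-like
# model stands in here for the neural network in the paper; the optimization
# only requires that f be differentiable.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])

def rollout(x0, U):
    """Simulate the model forward under the action sequence U."""
    xs = [x0]
    for u in U:
        xs.append(A @ xs[-1] + B @ u)
    return xs

def cost_and_grad(x0, U, goal, lam=1e-3):
    """Trajectory cost and its gradient w.r.t. the actions.

    The backward pass is written out by hand (backprop through time);
    with a neural network model one would use autodiff instead.
    """
    xs = rollout(x0, U)
    J = np.sum((xs[-1] - goal) ** 2) + lam * np.sum(U ** 2)
    adj = 2.0 * (xs[-1] - goal)        # adjoint of the terminal cost
    grad = np.zeros_like(U)
    for t in reversed(range(len(U))):
        grad[t] = B.T @ adj + 2.0 * lam * U[t]
        adj = A.T @ adj                # propagate adjoint through the model
    return J, grad

# Gradient-based trajectory optimization: steer the state to a goal.
x0 = np.zeros(2)
goal = np.array([1.0, 0.0])
U = np.zeros((30, 1))                  # initial action sequence
for _ in range(200):
    J, g = cost_and_grad(x0, U, goal)
    U -= 0.5 * g                       # plain gradient descent step
```

In the paper's setting, `rollout` would call the trained network and the gradient would come from automatic differentiation through the unrolled model.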