Transformer-based Model Predictive Control: Trajectory Optimization via Sequence Modeling (2410.23916v1)
Abstract: Model predictive control (MPC) has established itself as the primary methodology for constrained control, enabling general-purpose robot autonomy in diverse real-world scenarios. However, for most problems of interest, MPC relies on the recursive solution of highly non-convex trajectory optimization problems, leading to high computational complexity and a strong dependence on initialization. In this work, we present a unified framework that combines the main strengths of optimization-based and learning-based methods for MPC. Our approach embeds high-capacity, transformer-based neural network models within the optimization process for trajectory generation, whereby the transformer provides a near-optimal initial guess, or target plan, to a non-convex optimization problem. Our experiments, performed in simulation and in the real world onboard a free-flyer platform, demonstrate the capability of our framework to improve MPC convergence and runtime. Compared to purely optimization-based approaches, our method improves trajectory generation performance by up to 75%, reduces the number of solver iterations by up to 45%, and improves overall MPC runtime by 7x without loss in performance.
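To make the pipeline described in the abstract concrete, here is a minimal sketch of the warm-starting pattern: a sequence model maps the current state to a candidate control plan, which then seeds a local non-convex solver. Everything in this sketch is an assumption for illustration, not the authors' implementation: the `TrajectoryTransformer` architecture, the double-integrator dynamics, the quadratic cost, the horizon and dimensions, and the use of SciPy's SLSQP as a stand-in solver.

```python
# Hedged sketch of transformer warm-started trajectory optimization.
# All models, dynamics, and dimensions below are illustrative placeholders.
import numpy as np
import torch
import torch.nn as nn
from scipy.optimize import minimize

H, NX, NU = 20, 4, 2  # horizon, state dim, control dim (assumed)

class TrajectoryTransformer(nn.Module):
    """Maps an initial state to a candidate control sequence (warm start)."""
    def __init__(self, d_model=64, nhead=4, nlayers=2):
        super().__init__()
        self.embed = nn.Linear(NX, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.head = nn.Linear(d_model, H * NU)

    def forward(self, x0):
        z = self.encoder(self.embed(x0).unsqueeze(1))   # (B, 1, d_model)
        return self.head(z.squeeze(1)).view(-1, H, NU)  # (B, H, NU)

def rollout(x0, u_flat):
    # Placeholder double-integrator dynamics; the paper's free-flyer model differs.
    dt, x, traj = 0.1, x0.copy(), []
    for u in u_flat.reshape(H, NU):
        x = x + dt * np.concatenate([x[2:], u])  # pos' = vel, vel' = u
        traj.append(x)
    return np.array(traj)

def cost(u_flat, x0, x_goal):
    # Terminal-error cost plus a small control-effort penalty (assumed).
    traj = rollout(x0, u_flat)
    return np.sum((traj[-1] - x_goal) ** 2) + 1e-2 * np.sum(u_flat ** 2)

# Transformer warm start -> local non-convex solve (SLSQP as a stand-in).
model = TrajectoryTransformer()  # assume pretrained on expert trajectories
model.eval()
x0, x_goal = np.array([1.0, 1.0, 0.0, 0.0]), np.zeros(NX)
with torch.no_grad():
    u_init = model(torch.tensor(x0, dtype=torch.float32).unsqueeze(0))
res = minimize(cost, u_init.numpy().ravel(), args=(x0, x_goal), method="SLSQP")
print(res.fun, res.nit)  # final cost and solver iteration count
```

In this pattern the learned model only supplies the initial iterate; the downstream solver still drives the solution to (local) optimality, which is how a good warm start can cut iteration counts without sacrificing solution quality.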