Actor-Critic Model Predictive Control: Differentiable Optimization meets Reinforcement Learning (2306.09852v6)
Abstract: An open research question in robotics is how to combine the benefits of model-free reinforcement learning (RL) -- known for its strong task performance and flexibility in optimizing general reward formulations -- with the robustness and online replanning capabilities of model predictive control (MPC). This paper provides an answer by introducing a new framework called Actor-Critic Model Predictive Control. The key idea is to embed a differentiable MPC within an actor-critic RL framework. This integration enables short-term predictive optimization of control actions through MPC, while leveraging RL for end-to-end learning and exploration over longer horizons. Through various ablation studies, we demonstrate the benefits of the proposed approach: better out-of-distribution behavior, better robustness to changes in the dynamics, and improved sample efficiency. Additionally, we conduct an empirical analysis that reveals a relationship between the critic's learned value function and the cost function of the differentiable MPC, providing a deeper understanding of their interplay. Finally, we validate our method on the drone racing task across various tracks, in both simulation and the real world. Our method achieves the same superhuman performance as state-of-the-art model-free RL, reaching speeds of up to 21 m/s. We show that the proposed architecture can achieve real-time control performance, learn complex behaviors via trial and error, and retain the predictive properties of MPC to better handle out-of-distribution behavior.
- Angel Romero (18 papers)
- Yunlong Song (26 papers)
- Davide Scaramuzza (190 papers)
- Elie Aljalbout (21 papers)
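
The abstract's key idea, a differentiable MPC acting as the policy inside an actor-critic loop, can be illustrated with a minimal sketch. The code below is a toy built on stated assumptions, not the paper's implementation: fixed linear dynamics, a diagonal quadratic MPC cost with learnable weights `q` and `r`, an unrolled gradient-descent MPC solve, and a deterministic (DDPG-style) actor-critic update in place of the authors' training scheme. All names (`DiffMPCActor`, `QCritic`, `horizon`, `inner_steps`) are illustrative.

```python
# Minimal, hypothetical sketch of the abstract's key idea: a differentiable MPC as
# the actor inside an actor-critic loop. Fixed linear dynamics, a learnable diagonal
# quadratic MPC cost, an unrolled gradient-descent "solver", and a deterministic
# (DDPG-style) update are simplifying assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class DiffMPCActor(nn.Module):
    """Actor: short-horizon MPC whose solve is unrolled, hence differentiable."""

    def __init__(self, state_dim, action_dim, horizon=5, inner_steps=10, step_size=0.1):
        super().__init__()
        self.horizon, self.inner_steps, self.step_size = horizon, inner_steps, step_size
        self.action_dim = action_dim
        # Learnable diagonal quadratic cost weights -- the piece RL shapes end to end.
        self.q = nn.Parameter(torch.ones(state_dim))
        self.r = nn.Parameter(0.1 * torch.ones(action_dim))
        # Fixed toy linear dynamics stand in for the robot model in this sketch.
        self.register_buffer("A", torch.eye(state_dim))
        self.register_buffer("B", 0.1 * torch.randn(state_dim, action_dim))

    def forward(self, state):
        # Solve the short-horizon optimal control problem by gradient descent on the
        # action sequence; keeping the graph lets gradients reach self.q and self.r.
        with torch.enable_grad():
            actions = torch.zeros(self.horizon, self.action_dim, requires_grad=True)
            for _ in range(self.inner_steps):
                s, cost = state, torch.zeros(())
                for t in range(self.horizon):
                    s = s @ self.A.T + actions[t] @ self.B.T
                    cost = cost + (self.q * s**2).sum() + (self.r * actions[t] ** 2).sum()
                (grad,) = torch.autograd.grad(cost, actions, create_graph=True)
                actions = actions - self.step_size * grad
        return actions[0]  # receding horizon: execute only the first planned action


class QCritic(nn.Module):
    """Critic: a standard action-value network."""

    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.Tanh(), nn.Linear(64, 1)
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)


if __name__ == "__main__":
    state_dim, action_dim, gamma = 4, 2, 0.99
    actor, critic = DiffMPCActor(state_dim, action_dim), QCritic(state_dim, action_dim)
    actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
    critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

    state = torch.randn(state_dim)
    for _ in range(3):  # toy interaction loop using the sketch's own linear dynamics
        action = actor(state)
        next_state = (state @ actor.A.T + action @ actor.B.T).detach()
        reward = -(next_state**2).sum()  # placeholder reward

        # Critic update: regress Q(s, a) toward a one-step TD target.
        target = (reward + gamma * critic(next_state, actor(next_state))).detach()
        critic_loss = (critic(state, action.detach()) - target) ** 2
        critic_opt.zero_grad()
        critic_loss.backward()
        critic_opt.step()

        # Actor update: backpropagate through the critic AND the differentiable MPC,
        # so the learned value function shapes the MPC cost weights (q, r).
        actor_loss = -critic(state, actor(state))
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()

        state = next_state
```

The point of the sketch is the gradient path: because the MPC solve is unrolled and differentiable, the critic's value signal reaches the MPC cost parameters end to end, which is the interplay between the learned value function and the MPC cost that the abstract refers to.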