
Differentiable Model Predictive Control

Updated 28 March 2026
  • Differentiable Model Predictive Control is a framework that makes the MPC solution mapping differentiable, allowing gradient-based tuning of controller parameters.
  • It utilizes methods like KKT sensitivity analysis and unrolled backpropagation to integrate classical control with neural network components for systems with complex dynamics.
  • The approach finds applications in robotics, UAVs, building automation, and reinforcement learning, offering scalability and enhanced safety in constrained systems.

Differentiable Model Predictive Control (MPC) refers to a family of control and learning methods wherein the solution mapping of the Model Predictive Control policy is made differentiable with respect to problem parameters, states, or embedded model parameters. This differentiability can be leveraged for efficient policy learning, cost function tuning, system identification, and end-to-end reinforcement or imitation learning. Differentiable MPC integrates automatic differentiation through the entire closed-loop control graph—comprising the MPC optimization layer, possible neural network components, and system models—enabling direct and scalable optimization of controller parameters by gradient-based methods. This approach is particularly important for systems exhibiting nonlinear, constrained, or partially unknown dynamics, and it enables advanced performance tuning in both real and simulated environments (Drgona et al., 2020, Zuliani et al., 14 Nov 2025).

1. Foundational Principles

Differentiable MPC formalizes the solution of the classic MPC problem as a parameter-to-solution map that can be differentiated via the implicit function theorem, KKT system sensitivity analysis, or unrolling of a parameterized computation graph. The general problem is stated as

$$\min_{u_{0:N-1}} \sum_{k=0}^{N-1} \ell(x_k,u_k) + V_f(x_N) \quad \text{s.t.} \quad x_{k+1}=f(x_k,u_k),\ x_0\ \text{given},\ x_k\in \mathcal{X},\ u_k\in \mathcal{U}$$

for known or learned dynamics $f$. Decision variables can encompass trajectories and control sequences, and parameters may include system model weights, cost matrices, constraint bounds, or exogenous signals.

Differentiability is crucial for embedding the MPC controller as a modular, trainable block in modern optimization and learning pipelines, such as policy-gradient reinforcement learning, supervised imitation learning, and closed-loop cost-function optimization (Drgona et al., 2020, Amos et al., 2018).
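As a concrete illustration of the objective above, the sketch below evaluates the finite-horizon MPC cost for a known linear system with quadratic stage and terminal costs. All matrices, names, and values here are hypothetical choices for exposition, not drawn from any cited work.

```python
import numpy as np

def rollout_cost(u_seq, x0, A, B, Q, R, Qf):
    """Evaluate the finite-horizon objective sum_k l(x_k, u_k) + V_f(x_N)
    for linear dynamics x_{k+1} = A x_k + B u_k and quadratic costs."""
    x, cost = x0, 0.0
    for u in u_seq:
        cost += x @ Q @ x + u @ R @ u   # stage cost l(x_k, u_k)
        x = A @ x + B @ u               # dynamics step
    return cost + x @ Qf @ x            # terminal cost V_f(x_N)

# Toy double-integrator example (illustrative values)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, Qf = np.eye(2), 0.1 * np.eye(1), np.eye(2)
x0 = np.array([1.0, 0.0])
u_seq = [np.zeros(1) for _ in range(10)]
print(rollout_cost(u_seq, x0, A, B, Q, R, Qf))
```

In differentiable MPC, this cost map (and, more importantly, the argmin that minimizes it) becomes one node in a larger gradient-based training graph.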

2. Key Methodologies for Differentiability

Differentiable MPC admits several algorithmic realizations depending on problem structure, constraint type, and presence of learning components:

a. KKT and Implicit Function Theorem Differentiation

For convex (typically quadratic) MPC problems with linear or linearized dynamics and constraints, the MPC solution is the optimizer of a quadratic program whose KKT conditions are differentiable almost everywhere. The sensitivities of the solution with respect to parameters are computed by differentiating the KKT system, yielding gradients such as

$$\frac{\partial u^*}{\partial \theta} = - K_{z,z}^{-1} K_{z,\theta},$$

where $K_{z,z}$ and $K_{z,\theta}$ are the Jacobians of the KKT residuals with respect to the solution and the parameters, respectively (Zuliani et al., 14 Nov 2025, Amos et al., 2018, Romero et al., 2023, Tao et al., 2023). Modern software frameworks (cvxpylayers, OptNet) implement this efficiently.
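A minimal numerical sketch of this KKT-based sensitivity, using a hand-built equality-constrained QP (all names and values here are illustrative): the same KKT matrix that yields the optimizer also yields its derivative with respect to a cost parameter.

```python
import numpy as np

# Equality-constrained QP: min_z 0.5 z'Qz + q(theta)'z  s.t.  A z = b,
# with a linear parameterization q(theta) = theta * c. The KKT system is
# linear in (z, lambda), so the solution map z*(theta) is differentiable
# and its sensitivity is obtained from the same KKT matrix.
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, -1.0])

K = np.block([[Q, A.T], [A, np.zeros((1, 1))]])  # KKT matrix

def solve_qp(theta):
    """Solve the KKT system and return the primal optimizer z*(theta)."""
    sol = np.linalg.solve(K, np.concatenate([-theta * c, b]))
    return sol[:2]

# dz*/dtheta: differentiate the KKT residuals, solve K s = [-dq/dtheta; 0]
sens = np.linalg.solve(K, np.concatenate([-c, np.zeros(1)]))[:2]

# Sanity check against central finite differences
eps = 1e-6
fd = (solve_qp(1.0 + eps) - solve_qp(1.0 - eps)) / (2 * eps)
print(np.allclose(sens, fd, atol=1e-5))
```

Because the mapping is linear in this toy case, the finite-difference check matches the implicit gradient essentially to machine precision; in general-purpose layers the same linear solve is performed at the converged active set.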

b. Unroll-and-Backpropagation Approaches for Neural Policies and Learned Models

For settings with unknown or partially known system dynamics, differentiable MPC can be achieved by composing a neural state-space model with a neural policy network into an end-to-end, differentiable computation graph. The full loss, including tracking error, control variation, and differentiable soft constraints, is backpropagated through the unrolled N-step rollout using automatic differentiation (Drgona et al., 2020, Drgona et al., 2021, Drgona et al., 2020, Viljoen et al., 2024).
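The unroll-and-backpropagate idea can be sketched with a scalar linear system and a linear state-feedback policy standing in for the neural components; the reverse pass below is written by hand to make the adjoint recursion explicit. All names and values are illustrative.

```python
def rollout_and_grad(theta, x0=1.0, a=1.1, b=0.5, N=8):
    """Unroll x_{k+1} = a x_k + b u_k with policy u_k = -theta * x_k,
    accumulate L = sum_{k=1}^N x_k^2, and compute dL/dtheta via a
    hand-written reverse-mode pass through the unrolled graph."""
    m = a - b * theta                      # closed-loop multiplier
    xs = [x0]
    for _ in range(N):                     # forward rollout, store states
        xs.append(m * xs[-1])
    L = sum(x * x for x in xs[1:])
    grad, lam = 0.0, 0.0                   # lam is the adjoint dL/dx_k
    for k in range(N, 0, -1):              # reverse (backprop) pass
        lam = 2.0 * xs[k] + m * lam        # adjoint recursion
        grad += lam * (-b * xs[k - 1])     # explicit d x_k / d theta term
    return L, grad

L, g = rollout_and_grad(0.4)
eps = 1e-6
fd = (rollout_and_grad(0.4 + eps)[0] - rollout_and_grad(0.4 - eps)[0]) / (2 * eps)
print(abs(g - fd) < 1e-5)
```

In DPC frameworks the same pattern is applied with a neural state-space model in place of `a`, `b` and a neural policy in place of `theta`, with automatic differentiation performing the reverse pass.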

c. Sequential Quadratic Programming and Nonlinear Problems

For nonlinear MPC (NMPC), the optimization is solved by sequential quadratic programming (SQP), and the backward pass involves differentiating through each quadratic subproblem and optionally the nonlinear model linearizations. Differentiability can be ensured by careful regularization or by designing surrogate problems that guarantee well-posed gradients even at non-smooth points (Adabag et al., 7 Oct 2025, Zuliani et al., 16 Sep 2025, Tao et al., 2023).
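A full SQP solver is beyond the scope of a sketch, but the key property it relies on — that the gradient of a converged iterative solve matches implicit differentiation of the optimality residual — can be illustrated on a scalar Newton iteration with an arbitrary, hypothetical residual:

```python
def solve(theta, z=1.0, iters=50):
    """Newton's method on a toy stationarity residual r(z) = z^3 + theta*z - 1."""
    for _ in range(iters):
        z -= (z**3 + theta * z - 1.0) / (3.0 * z**2 + theta)
    return z

theta = 0.5
z_star = solve(theta)
# Implicit-function-theorem gradient at the converged point:
# dz*/dtheta = -(dr/dtheta) / (dr/dz) = -z* / (3 z*^2 + theta)
grad = -z_star / (3.0 * z_star**2 + theta)

eps = 1e-6
fd = (solve(theta + eps) - solve(theta - eps)) / (2 * eps)
print(abs(grad - fd) < 1e-6)
```

Differentiating only at convergence avoids storing the solver's intermediate iterates, which is why implicit (rather than unrolled) differentiation is often preferred for NMPC backward passes.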

d. Bilevel Learning

In end-to-end pipelines that combine perception or state estimation networks with a lower-level MPC solve, the system is formulated as a bilevel optimization in which the upper level learns or adapts model or cost parameters by differentiating through the optimality conditions (implicit function theorem) of the inner MPC problem (He et al., 17 Apr 2025).
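A toy bilevel sketch, assuming a scalar inner problem with a closed-form optimum (all quantities below are hypothetical): the upper level imitates an expert action by differentiating through the inner optimality condition.

```python
# Inner "MPC": u*(theta) = argmin_u (u - theta)^2 + lam * u^2.
# Its optimality condition gives u*(theta) = theta / (1 + lam), and by the
# implicit function theorem du*/dtheta = 1 / (1 + lam). The outer level
# adapts theta so that the inner solution matches an expert action u_exp.
lam, u_exp, theta, lr = 0.5, 2.0, 0.0, 0.5

for _ in range(200):
    u_star = theta / (1.0 + lam)              # inner solve (closed form)
    du_dtheta = 1.0 / (1.0 + lam)             # sensitivity of inner optimum
    outer_grad = 2.0 * (u_star - u_exp) * du_dtheta
    theta -= lr * outer_grad                  # upper-level gradient step

print(theta / (1.0 + lam))  # inner optimum converges toward u_exp
```

Real bilevel pipelines replace the closed-form inner solve with a numerical MPC solve and the scalar sensitivity with a KKT-based Jacobian, but the gradient flow is structurally identical.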

3. Applications and Empirical Results

Differentiable MPC enables a spectrum of applications across classical control, robotics, and learning-based systems:

  • Unknown Nonlinear Systems: Differentiable Predictive Control (DPC) learns a neural state-space model from data and optimizes a neural MPC controller, achieving superior tracking and constraint satisfaction with online evaluation scaling linearly in horizon length and parameter count. Embedded deployments on platforms such as Raspberry Pi have demonstrated real-time performance and memory advantages over explicit MPC (Drgona et al., 2020).
  • Robotics and UAVs: Self-supervised training of inertial odometry and attitude control via differentiable MPC achieves the lowest settling time and RMSE compared to modular or non-differentiable baselines, with sample efficiency and disturbance rejection validated in Gazebo and on hardware (He et al., 17 Apr 2025).
  • Reinforcement Learning Integration: Actor-critic architectures with embedded differentiable MPC actors demonstrate improved robustness and sample efficiency compared to pure RL and pure tracking controllers, with real-time viability on drone racing tasks (Romero et al., 2023).
  • Imitation and Cost Learning: End-to-end learning of both cost and dynamics parameters through differentiable MPC has shown data efficiency and improved performance over non-differentiable baselines in simulated environments (Amos et al., 2018).
  • Tactile Robotic Manipulation: End-to-end tactile-reactive grasping controllers integrate a differentiable QP-MPC layer with deep neural tactile encoders (LeTac-MPC), yielding improvements in force efficiency, generalization to novel objects, and rapid convergence, compared to PD and conventional MPC (Xu et al., 2024).
  • Building Automation and Energy Management: DPC frameworks deployed for building thermal control leverage neural state-space models and explicit DPC policies that scale to large, nonlinear plants and allow direct backpropagation of constraint and comfort losses (Drgona et al., 2021).
  • Online Parameter and Policy Learning under Uncertainty: Closed-loop policy optimization with differentiable MPC and recursive system identification produces provably convergent alternating-model/policy updates with probabilistic guarantees under mild assumptions (Zuliani et al., 5 Jan 2026).
  • GPU-Accelerated Learning and Real-Time Control: Differentiable MPC solvers fully exploiting GPU parallelism (via block-tridiagonal Schur complement factorization and preconditioned conjugate gradient schemes) achieve 4–7× speedups over CPU iLQR or GPU Riccati baselines for both forward and backward passes in RL and imitation learning scenarios (Adabag et al., 7 Oct 2025).

4. Constraint Handling and Safety

Differentiable MPC methods address constraint satisfaction via several mechanisms:

  • Soft Penalty Functions: State and input constraints are converted into differentiable penalties (ReLU, exponential, or log-barrier) applied to the loss function. This approach is compatible with autodiff backpropagation and avoids the need for complex region-partitioning as in explicit MPC (Drgona et al., 2020, Drgona et al., 2020).
  • Hard Constraints in Optimization Layer: For quadratic or general convex settings, constraints are encoded directly in the QP, and KKT-based differentiation propagates through the active-set regions. For NMPC or contact-rich tasks, interior-point methods with exact Jacobians (via AD tools such as CasADi) are used (Haninger et al., 2023, Adabag et al., 7 Oct 2025).
  • Barrier Functions and Predictive Safety Filters: Integration of control barrier functions and predictive safety filters with DPC improves robustness, ensuring forward invariance of safe sets with only occasional low-level QP solves, while maintaining the main inference speed of the neural policy (Cortez et al., 2022, Viljoen et al., 2024).
  • Robustness to Model Mismatch: Hybrid zero-order/model-based gradient approaches and explicit robustification (e.g., tube-MPC) provide robustness to model errors and external disturbances, with improved success rates under previously challenging or adversarial initializations (Zuliani et al., 14 Nov 2025, Oshin et al., 2023).

5. Scalability, Computational Aspects, and Limitations

Empirical evaluations consistently show that differentiable MPC methods scale more favorably than traditional explicit MPC, particularly for high-dimensional, long-horizon problems. Key metrics from (Drgona et al., 2020):

| Approach | Policy Size | Memory Footprint | Region/Parameter Scaling | Online CPU Time |
|---|---|---|---|---|
| DPC (NN Policy) | 1,845–3,855 weights | ~13–21 kB | Linear in horizon | 0.37 ms avg |
| Explicit MPC | Region map (108–5,333 polyhedral regions) | 0.6–65 MB | Exponential in horizon | 0.46 ms avg |

Offline construction time for DPC grows slowly with horizon, while explicit MPC quickly becomes infeasible.

Limitations include:

  • Absence of formal closed-loop stability proofs for generic neural-parametrized DPC (although extensions with Lyapunov functions or terminal costs are possible) (Drgona et al., 2020).
  • Reliance on the accuracy and conditioning of learned neural models; gradients may vanish in poorly conditioned regimes, which motivates system decomposition, two-stage training, or physics-informed architectures (Viljoen et al., 2024, Drgona et al., 2021).
  • Potential sub-optimality and lack of hard guarantees on constraint satisfaction, mitigated partially by external safety filters or hybrid policy design (Cortez et al., 2022).
  • Computational load for very large parameter spaces or when differentiating through SQP/NLP layers, alleviated via tailored regularization or GPU-optimized solvers (Adabag et al., 7 Oct 2025, Zuliani et al., 16 Sep 2025).
  • No universal method for global invariance or stability in DPC plus safety filter frameworks; further research is needed for rigorous infinite-horizon properties (Viljoen et al., 2024).

6. Extensions and Research Directions

Several promising directions and extensions are under active investigation:

  • Joint Dynamics and Policy Learning: Wrapping system identification and DPC into a unified differentiable loop to enable rapid adaptation on unknown or changing systems (Drgona et al., 2020, Zuliani et al., 5 Jan 2026).
  • Structured and Physics-Informed Architectures: Replacing generic feedforward networks with SINDy, graph neural networks, or prior-informed blocks for improved generalization, interpretability, and sample efficiency (Drgona et al., 2020, Drgona et al., 2021).
  • Advanced Gradient Estimation: Blending model-based and model-free (zeroth-order) gradient signals for robust policy learning under model mismatch and in non-smooth objective landscapes, with provable convergence (Zuliani et al., 14 Nov 2025).
  • Automated Safety Set Construction: Leveraging data-driven or learned safe regions and event-triggered safety filtering for tractable yet effective safety certificates in high-dimensional domains (Viljoen et al., 2024).
  • Bilevel Optimization: End-to-end trainable closed-loop architectures that embed perception or estimation networks, bridging classical pipeline modularity with learning-based adaptivity (He et al., 17 Apr 2025).
  • Scalable and Real-Time Solvers: Further acceleration and scalability via parallel architectures and custom GPU implementations, facilitating deployment in high-frequency and large-scale applications (Adabag et al., 7 Oct 2025).

Overall, differentiable model predictive control offers a principled, scalable, and versatile foundation for integrating modern learning, safety, and robust control throughout the stack of real-world cyber-physical systems and robotics. The method bridges classic principles of receding-horizon control with the full expressive and computational power of gradient-based learning (Drgona et al., 2020, Amos et al., 2018, Zuliani et al., 14 Nov 2025).
