Differentiable Predictive Control (DPC)
- Differentiable Predictive Control (DPC) is a framework that synthesizes explicit neural control policies by embedding finite-horizon optimal control problems into end-to-end differentiable graphs.
- It leverages TI-DeepONet to accurately model PDE dynamics, preserving key Markovian and causal structures essential for stable, long-horizon predictions.
- DPC eliminates the need for online optimization by training policies offline through direct gradient descent, achieving rapid and constraint-adherent control in complex scenarios.
Differentiable Predictive Control (DPC) is a framework for synthesizing explicit, parametric control policies by embedding the entire closed-loop, finite-horizon optimal control problem into an end-to-end differentiable computational graph. DPC leverages neural network parameterizations and modern automatic differentiation to optimize policy parameters purely via direct policy gradients with respect to the expected closed-loop cost and penalties, thereby avoiding the need for online optimization or imitation from expert controllers. In the context of high-dimensional and infinite-dimensional control, such as PDE-constrained settings, DPC incorporates operator-learning architectures—most notably, Time-Integrated Deep Operator Networks (TI-DeepONets)—to provide accurate, stable, and fully differentiable surrogates for complex, temporally-evolving physical dynamics (Sarkar et al., 12 Nov 2025).
1. PDE-Constrained Optimal Control and the DPC Objective
DPC addresses the general class of PDE-constrained optimal control problems posed over a spatio-temporal domain $\Omega \times [0, T]$, where the state $u(x,t)$ evolves according to a governing PDE of the form
$$\partial_t u(x,t) = \mathcal{N}\big(u(x,t), f(x,t)\big), \qquad (x,t) \in \Omega \times (0, T],$$
subject to initial and boundary conditions, and where $f(x,t)$ is a distributed control input. The optimal control problem is to minimize a cost functional of the form
$$J(u, f) = \int_0^T \!\!\int_\Omega \ell\big(u(x,t), f(x,t)\big)\, \mathrm{d}x\, \mathrm{d}t,$$
while enforcing the dynamics, state constraints $u \in \mathcal{U}$, and control constraints $f \in \mathcal{F}$.
Traditional approaches discretize the PDE and employ nonlinear (model predictive) optimization in an online fashion, incurring prohibitive computational cost, especially for high-resolution or long-horizon settings. DPC, in contrast, learns both a parametric neural policy $\pi_\theta$ and a differentiable dynamics surrogate $\mathcal{G}_\phi$, optimizing policy parameters offline by backpropagating the composite loss through the surrogate model, thus bypassing the need for any online optimization step (Sarkar et al., 12 Nov 2025).
2. Time-Integrated DeepONet: Markovian Surrogate Modeling for Dynamics
A cornerstone of PDE-DPC is the use of Time-Integrated Deep Operator Networks (TI-DeepONets), which directly model the temporal derivatives of the PDE solution and embed them within classical time integrators, preserving the Markovian and causal structure of the original dynamics. Rather than autoregressively predicting the solution, the TI-DeepONet parameterization is trained to approximate the right-hand-side operator
$$\partial_t u(x,t) \approx \mathcal{G}_\phi\big(u(\cdot,t), f(\cdot,t)\big)(x),$$
so that the learned map plays the role of the PDE's temporal derivative.
After spatial discretization, TI-DeepONet uses a dual-branch architecture to encode the discretized state and control, combining them with a trunk encoding of the spatial points. The output is then integrated in time by standard ODE solvers (e.g., explicit Runge-Kutta), substantially reducing long-horizon error accumulation and ensuring correct temporal causality [(Sarkar et al., 12 Nov 2025), Eq. (4)-(5)].
Distinctive features:
- Markovian structure: By targeting the instantaneous derivative $\partial_t u$, predictions are conditioned only on the current state and control inputs, not on historical rollouts.
- Error stability: Coupling with established integrators controls drift and error propagation over long horizons, which is especially challenging in operator learning for PDEs.
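A minimal, self-contained sketch of this construction is shown below; the dual-branch and trunk networks mirror the description above, while the layer sizes, names, and the explicit fourth-order Runge-Kutta step are illustrative assumptions (PyTorch is used for concreteness, not necessarily the framework of (Sarkar et al., 12 Nov 2025)).

```python
import torch
import torch.nn as nn

class TIDeepONet(nn.Module):
    """Sketch of a TI-DeepONet-style surrogate: predicts the temporal derivative
    du/dt at spatial query points, conditioned only on the current discretized
    state and control (Markovian, no autoregressive history)."""

    def __init__(self, n_sensors: int, latent: int = 64):
        super().__init__()
        # Dual branches encode the discretized state u(x_i, t) and control f(x_i, t).
        self.branch_state = nn.Sequential(nn.Linear(n_sensors, latent), nn.Tanh(),
                                          nn.Linear(latent, latent))
        self.branch_control = nn.Sequential(nn.Linear(n_sensors, latent), nn.Tanh(),
                                            nn.Linear(latent, latent))
        # Trunk encodes the spatial query coordinates x.
        self.trunk = nn.Sequential(nn.Linear(1, latent), nn.Tanh(),
                                   nn.Linear(latent, latent))

    def forward(self, u, f, x):
        # u, f: (batch, n_sensors); x: (n_query, 1); returns du/dt of shape (batch, n_query).
        b = self.branch_state(u) * self.branch_control(f)
        t = self.trunk(x)
        return b @ t.T

def rk4_step(model, u, f, x, dt):
    """One explicit fourth-order Runge-Kutta step using the learned derivative;
    assumes the query grid coincides with the sensor grid so the state is advanced in place."""
    k1 = model(u, f, x)
    k2 = model(u + 0.5 * dt * k1, f, x)
    k3 = model(u + 0.5 * dt * k2, f, x)
    k4 = model(u + dt * k3, f, x)
    return u + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
```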
3. DPC Framework: Policy Parameterization, Training Pipeline, and Gradient Computation
In the discrete-time parametric optimal control setting, DPC introduces a feed-forward neural policy $f_k = \pi_\theta(u_k, \xi)$ mapping the current discretized state $u_k$ (sampled from a training distribution) and task (parameter) features $\xi$ to control actions, with all components (policy, dynamics surrogate, integrator) fully differentiable. The policy is trained to minimize the expected finite-horizon cost subject to the surrogate dynamics
$$u_{k+1} = \mathrm{ODESolve}\big(\mathcal{G}_\phi;\, u_k, f_k, \Delta t\big), \qquad f_k = \pi_\theta(u_k, \xi), \qquad k = 0, \dots, N-1,$$
and soft state and input constraints.
The composite loss penalizes constraint violations with ReLU activations, enabling gradient-based optimization by automatic differentiation; schematically,
$$\mathcal{L}(\theta) = \mathbb{E}_{u_0,\,\xi}\!\left[\sum_{k=0}^{N-1} \ell(u_k, f_k) + \lambda_u\,\mathrm{ReLU}\big(g_u(u_k)\big) + \lambda_f\,\mathrm{ReLU}\big(g_f(f_k)\big)\right],$$
where $g_u$ and $g_f$ encode the soft state and input constraints and $\lambda_u, \lambda_f$ are penalty weights.
Gradient propagation flows through:
- the neural policy,
- the TI-DeepONet surrogate,
- the ODE integrator steps, yielding end-to-end gradients w.r.t. the policy parameters $\theta$, computed efficiently via modern autodiff frameworks [(Sarkar et al., 12 Nov 2025), Eq. (8)-(9)].
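To make this computation concrete, the following sketch unrolls a frozen surrogate under the policy and accumulates tracking cost and ReLU constraint penalties; it reuses the `TIDeepONet` and `rk4_step` definitions from the sketch above, and the penalty weight `lam`, bound `u_max`, and reference `u_ref` are illustrative placeholders rather than quantities from (Sarkar et al., 12 Nov 2025).

```python
import torch

def dpc_loss(policy, surrogate, u0, u_ref, x, dt, horizon, u_max=1.0, lam=10.0):
    """Unroll the frozen surrogate under the neural policy and accumulate
    tracking cost plus ReLU penalties on a soft state constraint |u| <= u_max."""
    u, loss = u0, 0.0
    for _ in range(horizon):
        f = policy(torch.cat([u, u_ref], dim=-1))               # explicit law f_k = pi_theta(u_k, xi)
        u = rk4_step(surrogate, u, f, x, dt)                     # differentiable one-step prediction
        loss = loss + ((u - u_ref) ** 2).mean()                  # tracking objective
        loss = loss + lam * torch.relu(u.abs() - u_max).mean()   # soft state-constraint penalty
    return loss

# Calling loss.backward() differentiates through every policy evaluation,
# surrogate call, and integrator stage in the unrolled graph, yielding
# d(loss)/d(theta) for the policy parameters alone once the surrogate is frozen.
```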
Offline training follows a two-stage pipeline (a compressed code sketch follows the list):
- Pretrain TI-DeepONet by regressing to ground-truth temporal derivatives $\partial_t u$ on simulated trajectory data.
- Freeze the surrogate and train the policy by unrolling the predictive dynamics and optimizing the aggregate loss over sampled initial conditions and parameters.
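A compressed, end-to-end view of both stages is sketched below, continuing the definitions above; the random tensors stand in for the simulated training data, and the layer sizes, learning rates, horizon, and iteration counts are illustrative assumptions.

```python
import torch

n_sensors, batch, dt, horizon = 64, 32, 0.01, 50
x = torch.linspace(0.0, 1.0, n_sensors).unsqueeze(-1)            # query grid == sensor grid

# Stage 1: pretrain the surrogate by regressing to ground-truth temporal derivatives
# (random tensors stand in for simulated (u, f, du/dt) training triples).
surrogate = TIDeepONet(n_sensors)
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
for _ in range(100):
    u, f, dudt = (torch.randn(batch, n_sensors) for _ in range(3))
    opt.zero_grad()
    ((surrogate(u, f, x) - dudt) ** 2).mean().backward()
    opt.step()

# Stage 2: freeze the surrogate and train the policy by unrolling the predictive
# dynamics over sampled initial conditions and reference targets.
for p in surrogate.parameters():
    p.requires_grad_(False)
policy = torch.nn.Sequential(torch.nn.Linear(2 * n_sensors, 64), torch.nn.Tanh(),
                             torch.nn.Linear(64, n_sensors))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(100):
    u0, u_ref = torch.randn(batch, n_sensors), torch.randn(batch, n_sensors)
    opt.zero_grad()
    dpc_loss(policy, surrogate, u0, u_ref, x, dt, horizon).backward()
    opt.step()
```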
4. Empirical Performance and Canonical PDE Benchmarks
DPC with TI-DeepONet has been validated on canonical 1D PDE paradigms:
- Heat equation: Target tracking from random-field initial conditions, with low surrogate modeling error and small terminal closed-loop tracking errors.
- Inviscid Burgers equation: Shock mitigation via curvature minimization, achieving a substantial curvature reduction relative to the uncontrolled baseline with low surrogate error.
- Fisher-KPP equation: Reaction–diffusion control, achieving accurate target-density tracking with low surrogate and terminal errors.
Policies trained via TI-DeepONet transfer nearly seamlessly to high-fidelity (finite-difference) solvers, with minimal performance loss. Closed-loop constraint satisfaction and generalization across initial-condition and parameter distributions are empirically observed. Compared to standard discretize-then-solve online nonlinear MPC, DPC offline compute is dominated by operator learning and policy training, while online control is a single forward pass through the surrogate and the neural policy (Sarkar et al., 12 Nov 2025).
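The deployment pattern described above amounts to the loop sketched below, continuing the trained `policy` from the previous sketch; `high_fidelity_step` is a hypothetical stand-in for a finite-difference solver of the true PDE and is not part of the source.

```python
import torch

def high_fidelity_step(u, f, dt=0.01):
    # Hypothetical placeholder for one step of a high-fidelity finite-difference solver.
    return u + dt * f

u, u_ref = torch.zeros(1, 64), torch.ones(1, 64)                  # toy measured state and target
for _ in range(200):
    with torch.no_grad():
        f = policy(torch.cat([u, u_ref], dim=-1))                 # one policy forward pass per step
    u = high_fidelity_step(u, f)                                  # closed loop against the plant, not the surrogate
```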
5. Connections to Related DPC Methodologies and Generalizations
DPC has been extended across several modeling paradigms:
- Parametric mixed-integer problems: Relaxed differentiable rounding strategies permit the handling of integer constraints in DPC policy output, validated on hybrid thermal systems with submillisecond inference and sub-1% suboptimality (Boldocký et al., 24 Jun 2025).
- Zero-shot adaptive control: Function-encoder–based DPC (FE-DPC) enables real-time adaptation to unseen continuous parameterizations of nonlinear dynamics via basis-encoded neural ODEs, with instantaneous closed-form policy adaptation (Iqbal et al., 7 Nov 2025).
- Stochastic settings: Stochastic and chance-constrained DPC incorporates sampled expectation and empirical surrogates for performance guarantees under uncertainty (Drgoňa et al., 2022).
- Explicit safety guarantees: Joint DPC and control barrier function frameworks furnish deterministic, sampled-data safety guarantees with minimal online backup solves (Cortez et al., 2022).
6. Scalability, Limitations, and Future Directions
DPC's scalability is anchored by efficient surrogate modeling (operator learning or finite-dimensional approximate dynamics) and purely offline neural policy training. Key advantages include:
- Elimination of online nonlinear optimization.
- Compatibility with high-dimensional systems and nontrivial constraint structures (via penalty formulations).
- Rapid online evaluation (single NN pass + surrogate rollout).
- Empirical ability to generalize to unseen initial states and parameterizations.
Limitations and future pathways articulated in the literature include:
- High-dimensional spatial domains: increased memory and compute requirements suggest extensions via domain decomposition, multigrid, or parallelized operator architectures.
- Quantified uncertainty: Incorporation of Bayesian or probabilistic surrogates for robust control and formal safety justification under model error.
- Physics-informed regularizers: Residual-based losses during surrogate training to tighten adherence to physical conservation laws (a minimal sketch follows this list).
- Hardware-efficient deployment: Suitability for embedded platforms due to lightweight inference pathways (Sarkar et al., 12 Nov 2025).
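For instance, a residual-based regularizer of the kind mentioned above can be sketched as an extra term in the surrogate's derivative-regression loss; the heat-equation residual, diffusivity `nu`, grid spacing `dx`, and weight `w_phys` are illustrative assumptions, and `surrogate` refers to the earlier sketches.

```python
import torch

nu, dx, w_phys = 0.01, 1.0 / 63, 0.1          # diffusivity, grid spacing, penalty weight (illustrative)

def laplacian_1d(u, dx):
    """Second-order central-difference Laplacian; boundary rows are left at zero."""
    lap = torch.zeros_like(u)
    lap[:, 1:-1] = (u[:, 2:] - 2.0 * u[:, 1:-1] + u[:, :-2]) / dx ** 2
    return lap

def surrogate_loss(u, f, dudt_true, x):
    """Data-fit term plus a physics residual for the heat equation u_t = nu * u_xx + f."""
    dudt_pred = surrogate(u, f, x)
    data_term = ((dudt_pred - dudt_true) ** 2).mean()
    phys_term = ((dudt_pred - (nu * laplacian_1d(u, dx) + f)) ** 2).mean()
    return data_term + w_phys * phys_term
```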
Summary Table: Core Features of TI-DeepONet–based DPC for PDEs
| Component | Role | Key Equation or Architecture |
|---|---|---|
| TI-DeepONet | Differentiable PDE surrogate | $\partial_t u \approx \mathcal{G}_\phi(u, f)$ |
| ODE Solver | Integrates surrogate's outputs | E.g., 4th-order Runge-Kutta |
| Neural Policy | Parametric explicit control mapping | $f_k = \pi_\theta(u_k, \xi)$ |
| Loss & Penalties | Tracks objectives and soft constraints | Eq. (8): tracking cost plus ReLU constraint penalties |
| Training Pipeline | Offline pretraining + policy rollout | Algorithm 1–2 in (Sarkar et al., 12 Nov 2025) |
7. Impact and Outlook
DPC, especially as instantiated with TI-DeepONet for PDE control, represents a convergence of neural operator learning, self-supervised predictive control, and differentiable programming. It enables scalable, generalizable, and rapid explicit policy synthesis for high-dimensional, nonlinear dynamical systems previously intractable for traditional MPC. Ongoing research focuses on extending the approach to higher spatial dimensions, stochastic settings, and physically constrained environments, as well as further reducing surrogate error and strengthening theoretical guarantees for closed-loop performance (Sarkar et al., 12 Nov 2025).