
Differentiable Predictive Control (DPC)

Updated 19 November 2025
  • Differentiable Predictive Control (DPC) is a framework that synthesizes explicit neural control policies by embedding finite-horizon optimal control problems into end-to-end differentiable graphs.
  • In PDE-constrained settings, it leverages TI-DeepONet surrogates that model the dynamics while preserving the Markovian and causal structure essential for stable, long-horizon predictions.
  • DPC eliminates the need for online optimization by training policies offline through direct gradient descent, achieving rapid and constraint-adherent control in complex scenarios.

Differentiable Predictive Control (DPC) is a framework for synthesizing explicit, parametric control policies by embedding the entire closed-loop, finite-horizon optimal control problem into an end-to-end differentiable computational graph. DPC leverages neural network parameterizations and modern automatic differentiation to optimize policy parameters directly via policy gradients of the expected closed-loop cost and penalties, thereby avoiding both online optimization and imitation of expert controllers. In the context of high-dimensional and infinite-dimensional control, such as PDE-constrained settings, DPC incorporates operator-learning architectures—most notably, Time-Integrated Deep Operator Networks (TI-DeepONets)—to provide accurate, stable, and fully differentiable surrogates for complex, temporally evolving physical dynamics (Sarkar et al., 12 Nov 2025).

1. PDE-Constrained Optimal Control and the DPC Objective

DPC addresses the general class of PDE-constrained optimal control problems formulated over a spatio-temporal domain $\Omega \subset \mathbb{R}^d$, $t \in [0,T]$, where the state $u(\mathbf{x},t)$ evolves according to

$$\frac{\partial u}{\partial t}(\mathbf{x},t) = \mathcal{F}\left(t,\mathbf{x},u,\nabla u,\nabla^2 u, \dots, a(\mathbf{x},t)\right),$$

subject to initial and boundary conditions, and where $a(\mathbf{x},t)$ is a distributed control input. The optimal control problem is to minimize

$$J[u,a] = \int_0^T \int_\Omega \ell(u, a, \xi)\,d\mathbf{x}\,dt + \int_\Omega \ell_T(u(T,\mathbf{x}))\,d\mathbf{x}$$

while enforcing the dynamics, state constraints $h(u, \xi) \le 0$, and control constraints $g(a, \xi) \le 0$.
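As a concrete illustration of the abstract dynamics $\mathcal{F}$ and running cost $\ell$ above, the sketch below discretizes a controlled 1D heat equation $u_t = \nu u_{xx} + a(x,t)$ on a periodic grid. The diffusivity, boundary handling, and cost weights are illustrative assumptions, not specifics of (Sarkar et al., 12 Nov 2025).

```python
import numpy as np

def heat_rhs(u: np.ndarray, a: np.ndarray, dx: float, nu: float = 0.1) -> np.ndarray:
    """F(t, x, u, grad u, ..., a) for a controlled 1D heat equation
    u_t = nu * u_xx + a(x, t), using a second-order central difference
    for u_xx on a periodic grid (illustrative boundary choice)."""
    u_xx = (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2
    return nu * u_xx + a

def stage_cost(u: np.ndarray, a: np.ndarray, u_ref: np.ndarray,
               dx: float, r: float = 1e-3) -> float:
    """Discretized running cost l(u, a, xi): tracking error against a
    reference profile plus a small control-effort penalty."""
    return float(np.sum((u - u_ref) ** 2 + r * a ** 2) * dx)
```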

Traditional approaches discretize the PDE and employ nonlinear (model predictive) optimization in an online fashion, incurring prohibitive computational cost, especially for high-resolution or long-horizon settings. DPC, in contrast, learns both a parametric neural policy $a = v_\phi(\cdot)$ and a differentiable surrogate $\mathcal{F}_\theta$, optimizing the policy parameters $\phi$ offline by backpropagating the composite loss through the surrogate model—thus bypassing the need for any online optimization step (Sarkar et al., 12 Nov 2025).

2. Time-Integrated DeepONet: Markovian Surrogate Modeling for Dynamics

A cornerstone of PDE-DPC is the use of Time-Integrated Deep Operator Networks (TI-DeepONets), which directly model the temporal derivatives of the PDE solution and embed them within classical time integrators, preserving the Markovian and causal structure of the original dynamics. Rather than autoregressively predicting the solution, the TI-DeepONet parameterization is trained to approximate

$$\frac{\partial u}{\partial t} \approx \mathcal{G}_\theta\left(t, \mathbf{x}, u, \nabla u, \nabla^2 u, \dots, a\right).$$

After spatial discretization, TI-DeepONet uses a dual-branch architecture to encode the discretized state and control, combining them with a trunk encoding of the spatial points. The output is then integrated in time by standard ODE solvers (e.g., explicit Runge-Kutta), substantially reducing long-horizon error accumulation and ensuring correct temporal causality [(Sarkar et al., 12 Nov 2025), Eq. (4)-(5)].

Distinctive features:

  • Markovian structure: By targeting $\partial_t u$, predictions are conditioned only on the current inputs, not on historical rollouts.
  • Error stability: Coupling with established integrators controls drift and error propagation over long horizons, which is especially challenging in operator learning for PDEs.
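The following PyTorch sketch illustrates the dual-branch TI-DeepONet idea described above and its coupling to a classical RK4 step. The layer widths, the multiplicative fusion of branch features, and the zero-order hold on the control within a step are assumptions made for illustration; they are not claimed to match the exact architecture of (Sarkar et al., 12 Nov 2025).

```python
import torch
import torch.nn as nn

def mlp(sizes):
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers.append(nn.Tanh())
    return nn.Sequential(*layers)

class TIDeepONet(nn.Module):
    """Dual-branch operator surrogate: branch nets encode the discretized
    state u_k and control a_k, a trunk net encodes query coordinates x, and
    their features are combined to predict du/dt at every query point."""
    def __init__(self, n_x: int, p: int = 64):
        super().__init__()
        self.branch_u = mlp([n_x, 128, p])   # encodes sampled state values
        self.branch_a = mlp([n_x, 128, p])   # encodes sampled control values
        self.trunk = mlp([1, 128, p])        # encodes spatial query points

    def forward(self, u, a, x):
        # u, a: (batch, n_x); x: (n_x, 1) spatial coordinates
        b = self.branch_u(u) * self.branch_a(a)   # fused branch features, (batch, p)
        t = self.trunk(x)                         # trunk features, (n_x, p)
        return b @ t.T                            # du/dt estimate, shape (batch, n_x)

def rk4_step(g, u, a, x, dt):
    """One classical Runge-Kutta-4 update through the learned derivative
    operator g, holding the control fixed over the step."""
    k1 = g(u, a, x)
    k2 = g(u + 0.5 * dt * k1, a, x)
    k3 = g(u + 0.5 * dt * k2, a, x)
    k4 = g(u + dt * k3, a, x)
    return u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
```

Because every RK4 stage reuses the same operator on the current state and control, long rollouts never condition on past predictions beyond the present state, mirroring the Markovian structure noted above.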

3. DPC Framework: Policy Parameterization, Training Pipeline, and Gradient Computation

In the discrete-time parametric optimal control setting, DPC introduces a feed-forward neural policy $\pi_{\mathbf{W}}$ mapping the current state and task (parameter) features to control actions, with all components (policy, dynamics surrogate, integrator) fully differentiable:

$$\min_{\mathbf{W}} \; \mathbb{E}_{u_0, \xi} \Bigg[ \sum_{k=0}^{N-1} \sum_{i=1}^{n_x} \ell(u_k^i, a_k^i, \xi_k^i)\,\Delta x + \sum_{i=1}^{n_x} \ell_N(u_N^i)\,\Delta x \Bigg]$$

subject to

$$\mathbf{u}_{k+1} = \mathrm{ODESolve}\bigl(\mathcal{G}_\theta(t_k, \mathbf{u}_k, \mathbf{a}_k)\bigr), \quad \mathbf{a}_k = \pi_{\mathbf{W}}(\mathbf{u}_k, \xi_k)$$

and soft state and input constraints.

The composite loss includes penalties for constraint violations via ReLU activations, enabling gradient-based optimization by automatic differentiation:

$$\mathcal{L}_{\mathrm{DPC}} = \text{(stage costs)} + Q_h \left\|\mathrm{ReLU}\bigl(h(u, \xi)\bigr)\right\|^2 + Q_g \left\|\mathrm{ReLU}\bigl(g(a, \xi)\bigr)\right\|^2 + \text{(terminal cost)}$$

Gradient propagation flows through

  • the neural policy,
  • the TI-DeepONet surrogate,
  • the ODE integrator steps,

yielding end-to-end gradients with respect to the policy parameters, $\nabla_{\mathbf{W}} \mathcal{L}_{\mathrm{DPC}}$, computed efficiently via modern autodiff frameworks [(Sarkar et al., 12 Nov 2025), Eq. (8)-(9)]; a minimal end-to-end sketch follows below.
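The sketch below makes this gradient flow concrete: a neural policy is trained by unrolling a frozen differentiable surrogate over the horizon, accumulating stage costs and ReLU constraint penalties, and backpropagating through the whole rollout. The surrogate here is a simple stand-in for a pretrained TI-DeepONet, the integrator is a single explicit-Euler step for brevity (an RK4 step as in Section 2 could be substituted), and all weights, bounds, and shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

n_x, N, dx, dt = 64, 20, 1.0 / 64, 0.01

# Frozen stand-in for a pretrained du/dt surrogate (e.g., a TI-DeepONet).
g_theta = nn.Sequential(nn.Linear(2 * n_x, 128), nn.Tanh(), nn.Linear(128, n_x))
for p in g_theta.parameters():
    p.requires_grad_(False)

# Explicit neural policy pi_W: (u_k, xi_k) -> a_k.
policy = nn.Sequential(nn.Linear(2 * n_x, 128), nn.Tanh(), nn.Linear(128, n_x))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

Q_h, Q_g, u_max, a_max = 10.0, 10.0, 2.0, 1.0   # penalty weights and soft bounds

def dpc_loss(u0, u_ref):
    """Unroll the closed loop and accumulate stage, penalty, and terminal costs."""
    u, loss = u0, torch.zeros(())
    for _ in range(N):
        a = policy(torch.cat([u, u_ref], dim=-1))
        # soft input constraint g(a) = |a| - a_max <= 0 via ReLU penalty
        loss = loss + Q_g * torch.relu(a.abs() - a_max).pow(2).sum(-1).mean()
        # one explicit integrator step through the frozen surrogate
        u = u + dt * g_theta(torch.cat([u, a], dim=-1))
        # soft state constraint h(u) = |u| - u_max <= 0
        loss = loss + Q_h * torch.relu(u.abs() - u_max).pow(2).sum(-1).mean()
        # stage cost: tracking error plus control effort, scaled by dx
        loss = loss + ((u - u_ref).pow(2).sum(-1) * dx).mean()
        loss = loss + 1e-3 * (a.pow(2).sum(-1) * dx).mean()
    return loss + ((u - u_ref).pow(2).sum(-1) * dx).mean()   # terminal cost

for step in range(1000):
    u0 = 0.1 * torch.randn(32, n_x)        # sampled initial conditions
    u_ref = torch.zeros(32, n_x)           # sampled task parameters / references
    loss = dpc_loss(u0, u_ref)
    opt.zero_grad()
    loss.backward()    # gradients flow through policy, surrogate, and integrator
    opt.step()
```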

Offline training follows a two-stage pipeline:

  1. Pretrain the TI-DeepONet by regressing onto ground-truth temporal derivatives from simulated data ($\{u_k, a_k\} \mapsto \partial_t u_k$); a minimal sketch follows this list.
  2. Freeze the surrogate and train the policy by unrolling the predictive dynamics and optimizing the aggregate loss over sampled initial conditions and parameters.
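A minimal sketch of stage 1, assuming a hypothetical dataset of simulated state/control snapshots with precomputed (e.g., finite-difference) temporal derivatives; the stand-in network and tensor shapes are illustrative.

```python
import torch
import torch.nn as nn

n_x = 64
# Hypothetical pre-generated training data: snapshots and their du/dt targets.
u_data = torch.randn(1024, n_x)
a_data = torch.randn(1024, n_x)
dudt_data = torch.randn(1024, n_x)

surrogate = nn.Sequential(nn.Linear(2 * n_x, 128), nn.Tanh(), nn.Linear(128, n_x))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)

for epoch in range(200):
    pred = surrogate(torch.cat([u_data, a_data], dim=-1))   # predicted du/dt
    loss = nn.functional.mse_loss(pred, dudt_data)          # derivative regression
    opt.zero_grad()
    loss.backward()
    opt.step()
# The trained surrogate is then frozen and reused inside the policy rollout (stage 2).
```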

4. Empirical Performance and Canonical PDE Benchmarks

DPC with TI-DeepONet has been validated on canonical 1D PDE paradigms:

  • Heat equation: Target tracking with random-field initial data achieves an $L_2$ surrogate error of $\sim 1.4 \times 10^{-2}$, with terminal closed-loop errors of order $10^{-4}$.
  • Inviscid Burgers equation: Shock mitigation (curvature minimization) with a $77.3\%$ curvature reduction over the uncontrolled baseline and a surrogate error of $9.24 \times 10^{-2}$.
  • Fisher-KPP equation: Reaction–diffusion control, achieving accurate target-density tracking ($L_2$ surrogate error $1.18 \times 10^{-2}$) and terminal errors of $\mathcal{O}(10^{-4})$.

Policies trained via TI-DeepONet transfer nearly seamlessly to high-fidelity (finite-difference) solvers, with minimal performance loss. Closed-loop constraint satisfaction and generalization across initial-condition and parameter distributions are empirically observed. Compared to standard discretize-then-solve online nonlinear MPC, DPC offline compute is dominated by operator learning and policy training, while online control is a single forward pass through the surrogate and the neural policy (Sarkar et al., 12 Nov 2025).

5. Extensions Across Modeling Paradigms

DPC has been extended across several modeling paradigms:

  • Parametric mixed-integer problems: Relaxed differentiable rounding strategies permit the handling of integer constraints in DPC policy output, validated on hybrid thermal systems with submillisecond inference and sub-1% suboptimality (Boldocký et al., 24 Jun 2025).
  • Zero-shot adaptive control: Function-encoder–based DPC (FE-DPC) enables real-time adaptation to unseen continuous parameterizations of nonlinear dynamics via basis-encoded neural ODEs, with instantaneous closed-form policy adaptation (Iqbal et al., 7 Nov 2025).
  • Stochastic settings: Stochastic and chance-constrained DPC incorporates sampled expectation and empirical surrogates for performance guarantees under uncertainty (Drgoňa et al., 2022).
  • Explicit safety guarantees: Joint DPC and control barrier function frameworks furnish deterministic, sampled-data safety guarantees with minimal online backup solves (Cortez et al., 2022).

6. Scalability, Limitations, and Future Directions

DPC's scalability is anchored by efficient surrogate modeling (operator learning or finite-dimensional approximate dynamics) and purely offline neural policy training. Key advantages include:

  • Elimination of online nonlinear optimization.
  • Compatibility with high-dimensional and nontrivial constraint structures (via penalty formulations).
  • Rapid online evaluation (single NN pass + surrogate rollout).
  • Empirical ability to generalize to unseen initial states and parameterizations.

Limitations and future pathways articulated in the literature include:

  • High-dimensional spatial domains ($d>1$): increased memory and compute requirements suggest extensions via domain decomposition, multigrid, or parallelized operator architectures.
  • Quantified uncertainty: Incorporation of Bayesian or probabilistic surrogates for robust control and formal safety justification under model error.
  • Physics-informed regularizers: Residual-based losses during surrogate training to tighten adherence to physical conservation laws.
  • Hardware-efficient deployment: Suitability for embedded platforms due to lightweight inference pathways (Sarkar et al., 12 Nov 2025).

Summary Table: Core Features of TI-DeepONet–based DPC for PDEs

| Component | Role | Key Equation or Architecture |
|---|---|---|
| TI-DeepONet | Differentiable PDE surrogate | $\partial_t u \approx \mathcal{G}_\theta(\cdot)$ |
| ODE Solver | Integrates the surrogate's outputs | e.g., 4th-order Runge-Kutta |
| Neural Policy | Parametric explicit control mapping | $a_k = \pi_{\mathbf{W}}(u_k, \xi_k)$ |
| Loss & Penalties | Encodes objectives and soft constraints | Eq. (8): $\mathcal{L}_{\mathrm{DPC}}$ |
| Training Pipeline | Offline pretraining + policy rollout | Algorithms 1–2 in (Sarkar et al., 12 Nov 2025) |

7. Impact and Outlook

DPC, especially as instantiated with TI-DeepONet for PDE control, sits at the intersection of neural operator learning, self-supervised predictive control, and differentiable programming. It enables scalable, generalizable, and rapid explicit policy synthesis for high-dimensional, nonlinear dynamical systems previously intractable for traditional MPC. Ongoing research focuses on extending the approach to higher spatial dimensions, stochastic settings, and physically constrained environments, as well as further reducing surrogate error and strengthening theoretical guarantees on closed-loop performance (Sarkar et al., 12 Nov 2025).
