Diffusion Model Predictive Control

Updated 23 October 2025
  • Diffusion Model Predictive Control is a predictive framework that integrates stochastic and generative diffusion processes to manage uncertainties and non-Gaussian disturbances.
  • It employs techniques like importance sampling, exponential time coarsening, and neural diffusion models to enhance robustness, scalability, and adaptability.
  • D-MPC enables constraint enforcement, multimodal trajectory planning, and real-time optimization, demonstrating successful applications in robotics, energy systems, and multiscale control.

Diffusion Model Predictive Control (D-MPC) refers to a class of predictive control methodologies where diffusion processes—both stochastic (e.g., jump diffusion for noise modeling) and generative (e.g., trajectory-level probabilistic models)—are integrally embedded in the control synthesis or optimization process. The terminology is heterogeneous in the literature, encompassing information-theoretic MPC for systems with jump diffusions (Wang et al., 2018), time-coarsening “diffusing-horizon” MPC for multiscale optimization (Shin et al., 2020), and recent neural generative models (diffusion models) directly used for trajectory/action prediction in MPC with adaptation, uncertainty, and multi-modality (Zhou et al., 7 Oct 2024, Huang et al., 11 Dec 2024, Julbe et al., 6 Apr 2025, Huang et al., 5 Oct 2025). D-MPC techniques generalize classical MPC by leveraging diffusion-based modeling to handle non-Gaussian disturbances, adapt system performance to novel objectives, and enable robust, scalable control in high-dimensional and uncertain settings.

1. Information-Theoretic MPC with Stochastic Diffusion and Jump Processes

A principal thread of D-MPC, introduced in (Wang et al., 2018), generalizes model predictive control to systems subject to both Gaussian noise and compound Poisson (jump) noise. This framework formulates the stochastic optimal control problem in a path-space information-theoretic setting, associating the expected control cost with the free energy and Kullback–Leibler divergence. The controller seeks to minimize the composite cost

\lambda \mathcal{F}(S(X)) = \inf_Q \left\{ \mathbb{E}_Q[S(X)] + \lambda\, \mathrm{D_{KL}}(Q \Vert P) \right\},

where S(X) is the cumulative trajectory cost, λ is an inverse temperature parameter, Q is the control-induced measure, and P is the measure of the passive (uncontrolled) process.

The system dynamics are modeled as

dx_t = f(x_t, t)\,dt + G(x_t, t)\,u_t\,dt + B(x_t, t)\,dw^{(1)}_t + H(x_t, t)\,dP^{(1)}_t,

with dP^{(1)}_t representing compound Poisson noise, characterized by a jump rate ν and mark distribution N(0, Σ_J).

Control updates are computed iteratively using importance sampling over trajectory rollouts that explicitly include the jump statistics. The update for each control component is of the form

u^*_j = u_j + [G(x_{t_j}, t_j)]^{-1} B(x_{t_j}, t_j)\, \frac{\mathbb{E}\left[\exp\left(-\frac{1}{\lambda}\tilde{S}(X)\right)\epsilon_j\right]}{\mathbb{E}\left[\exp\left(-\frac{1}{\lambda}\tilde{S}(X)\right)\right]} \Big/ \Delta t,

with the modified cost S̃(X) including importance-weighting terms.

This approach is computationally intensive but highly parallelizable: GPU implementation enables thousands of trajectory samples to be evaluated per iteration, achieving real-time control rates (<20 ms per iteration). Empirical results on cartpole and quadrotor tasks show pronounced robustness to increases in jump amplitude (Σ_J) or jump rate (ν), with reduced variance in critical state variables and task success rates sustained even under large stochastic disturbances.
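A minimal NumPy sketch of this importance-sampling update, using toy single-step dynamics, a Bernoulli approximation of compound-Poisson jump arrivals over one time step, and the state-dependent gain [G]^{-1}B taken as identity for simplicity (all function names and the toy cost are illustrative, not from the cited implementation):

```python
import numpy as np

def jump_mppi_step(u_nom, dynamics, cost, n_samples=1024, lam=1.0,
                   dt=0.02, sigma=0.5, jump_rate=2.0, sigma_jump=1.0,
                   rng=None):
    """One importance-sampling control update for dynamics driven by
    Gaussian noise plus compound-Poisson (jump) noise.

    u_nom : (H, m) nominal control sequence
    dynamics(x, u, eps, jump) -> next state for one step
    cost(traj, controls) -> scalar trajectory cost S(X)
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    H, m = u_nom.shape
    # Gaussian exploration noise: Brownian increments over one step
    eps = rng.normal(0.0, sigma, size=(n_samples, H, m)) * np.sqrt(dt)
    # Compound Poisson jumps: Bernoulli(rate*dt) arrival, N(0, sigma_jump^2) mark
    arrivals = rng.random((n_samples, H, m)) < jump_rate * dt
    jumps = arrivals * rng.normal(0.0, sigma_jump, size=(n_samples, H, m))

    costs = np.empty(n_samples)
    for n in range(n_samples):
        x = np.zeros(m)  # toy initial state; replace with the observed state
        traj = [x]
        for t in range(H):
            x = dynamics(x, u_nom[t], eps[n, t], jumps[n, t])
            traj.append(x)
        costs[n] = cost(np.array(traj), u_nom + eps[n] / dt)

    # Free-energy weights w_n ∝ exp(-S̃/λ); subtract the minimum for stability
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    # Weighted perturbation gives the update u* = u + E_w[eps]/Δt
    return u_nom + np.einsum('n,nhm->hm', w, eps) / dt
```

Because the weights depend only on relative costs, low-cost rollouts (including those that happened to absorb jumps gracefully) dominate the update.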

2. Time-Coarsening via Diffusing-Horizon MPC

The “diffusing-horizon” MPC methodology (Shin et al., 2020) targets multiscale systems where computational tractability is challenged by the length and granularity of the control horizon. Exploiting the exponential decay of sensitivity property (EDS), this approach justifies exponential time coarsening: the time grid density decays exponentially into the future, preserving detailed modeling near the present (where sensitivity is highest) and aggregating future stages.

Formally, the optimal solution mapping exhibits sensitivity bounds of the form

\|z_i^B(d) - z_i^B(d')\| \leq \sum_{j=1}^{N} \Gamma_B\, \rho_B^{(|i-j|-1)_+}\, \|d_j - d'_j\|,

for basis B, with decay parameter ρ_B.

This design enables projection operators (T_k, U_k) to aggregate variables and primitives across blocks, yielding a coarsened problem with exponentially fewer degrees of freedom. Computational testing on large-scale HVAC and energy systems demonstrates a two-orders-of-magnitude reduction in solution time (hours to minutes), with only a 3% increase in closed-loop cost compared to full-resolution MPC.
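The coarsening itself can be sketched as a simple geometric partition of the stage grid (a toy illustration; the paper derives block sizes from the EDS bounds rather than a fixed growth factor):

```python
def diffusing_horizon_grid(n_steps, growth=2):
    """Partition a horizon of n_steps stages into blocks whose sizes grow
    geometrically: fine resolution near the present, coarse in the future.

    Returns a list of (start, end) index pairs covering 0..n_steps.
    """
    blocks, start, size = [], 0, 1
    while start < n_steps:
        end = min(start + size, n_steps)
        blocks.append((start, end))
        start = end
        size *= growth  # each block covers twice as many stages as the last
    return blocks

# e.g. a 1024-stage horizon partitions into just 11 blocks
blocks = diffusing_horizon_grid(1024)
```

Each block becomes a single decision stage in the coarsened problem, so the degrees of freedom scale logarithmically with the horizon length.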

The approach is extensible to systems with spatially distributed topologies and scenario aggregation in multistage stochastic MPC, promising increased efficiency for real-time and embedded applications.

3. Generative Diffusion Models for Trajectory and Action Planning in MPC

Recent developments utilize neural diffusion models as generative priors for action/dynamics sequences in the MPC loop (Zhou et al., 7 Oct 2024, Huang et al., 5 Oct 2025). Here, the multi-step action proposal and dynamics model are both instantiated as conditional diffusion models, trained offline on trajectory datasets via standard denoising score matching:

\mathcal{L} = \mathbb{E}_{k, x_0, y}\left[\|\epsilon_k - \epsilon_\theta(x_k, k, y)\|^2\right],

with x_k the noisy trajectory/action chunk at diffusion step k and y the conditioning variable.

At execution, candidate trajectories are sampled (“sample-score-rank” algorithm) from the learned proposal, simulated forward using the multi-step diffusion dynamics model, and ranked according to the planning objective:

V_n = \kappa\, J(\text{original}) + \tau\, \tilde{J}(\text{novel}),

enabling adaptation to test-time objectives and novel dynamics. The models afford trajectory-level coverage: compounding errors from one-step predictions are mitigated, and global goals (e.g., safety constraints; final state validity) are encoded through the joint modeling.

On benchmarks (D4RL, Adroit, Kitchen, real-world quadruped locomotion), D-MPC is shown to outperform prior single-step or autoregressive methods such as MBOP and behavior cloning, and to match state-of-the-art offline RL (CQL, IQL) on normalized scores, with additional flexibility for fast adaptation and novel reward optimization at runtime.
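The sample-score-rank loop can be sketched as follows, with `propose`, `simulate`, and `objective` standing in for the learned diffusion proposal, the diffusion dynamics model, and the (possibly novel) planning objective respectively (all names are placeholders, not the papers' API):

```python
import numpy as np

def sample_score_rank(propose, simulate, objective, n_candidates=64, rng=None):
    """Sample-score-rank planning loop (illustrative):
      1. sample candidate action sequences from a learned proposal,
      2. roll each out with the multi-step dynamics model,
      3. rank by the planning objective and return the best.

    propose(rng) -> (H, m) action sequence
    simulate(actions) -> (H+1, d) predicted state trajectory
    objective(states, actions) -> scalar (higher is better)
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    best_score, best_actions = -np.inf, None
    for _ in range(n_candidates):
        actions = propose(rng)
        states = simulate(actions)
        score = objective(states, actions)
        if score > best_score:
            best_score, best_actions = score, actions
    return best_actions, best_score
```

Because scoring happens outside the generative model, the objective can be swapped at test time without retraining either the proposal or the dynamics model.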

4. Constraint Handling and Dynamical Feasibility

Diffusion-based predictive controllers have further been extended to enforce explicit state/action constraints via model-based projection at each denoising iteration (Römer et al., 12 Dec 2024, Gadginmath et al., 31 Mar 2025, Huang et al., 5 Oct 2025). After each reverse step from the diffusion model, the trajectory is projected onto the set:

\mathcal{Z}_f = \left\{ (s_{t:t+H}, a_{t:t+H}) \mid s_{t'} \in \mathcal{S}_{t'},\ a_{t'} \in \mathcal{A}_{t'},\ s_{t'+1} = f(s_{t'}, a_{t'}) \right\},

where projections ensure compliance with dynamics and constraints. In the presence of model imperfections or disturbance, state constraints are tightened via Minkowski set difference:

\tilde{\mathcal{S}}_{t+1} = \mathcal{S}_{t+1} \ominus \mathcal{B}_\gamma,

where B_γ is an ℓ2 ball of radius γ representing the disturbance bound.

This iterative projection mechanism is shown to enable nearly 100% satisfaction of task and constraint requirements, including dynamic feasibility and obstacle avoidance in nonconvex spaces (robot manipulator, mobile navigation).
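A schematic of the interleaved denoise-then-project loop, assuming simple box constraints and a known one-step dynamics function (the cited works use model-based projections that may be more elaborate; every name here is illustrative):

```python
import numpy as np

def project_to_feasible(traj, a_lo, a_hi, s_lo, s_hi, step):
    """Project a (states, actions) trajectory onto box constraints and
    re-roll the states through the assumed-known dynamics `step`."""
    states, actions = traj
    actions = np.clip(actions, a_lo, a_hi)   # action constraints
    s = states[0]
    out = [s]
    for a in actions:
        # propagate through dynamics, then clip to the state constraint set
        s = np.clip(step(s, a), s_lo, s_hi)
        out.append(s)
    return np.array(out), actions

def projected_reverse_diffusion(denoise_step, project, x, n_steps):
    """Interleave reverse-diffusion denoising with feasibility projection."""
    for i in reversed(range(n_steps)):
        x = denoise_step(x, i)   # one reverse step of the diffusion model
        x = project(x)           # push the sample back onto Z_f
    return x
```

Projecting after every reverse step, rather than once at the end, keeps each intermediate sample near the feasible set, so the final sample satisfies the constraints without a large terminal correction.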

For dynamical admissibility, sequential prediction–projection is used so that at denoising step ii:

\tau'_{i-1} = \left[\sqrt{1-\beta_{i-1}}\,\mathcal{F}\mathcal{F}^\dagger + \sqrt{\beta_{i-1}}\, I \right] \hat{\tau}_{i-1},

with F the system mapping and F† its pseudo-inverse (constructed from data-driven Hankel matrices when the dynamics are unknown). This process "steers" generated samples onto the manifold of feasible trajectories.
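For a linear feasible subspace, the steering operator above reduces to a β-weighted blend of the identity and the orthogonal projector F F†, as in this sketch (F here is any matrix whose columns span the feasible trajectories, e.g. a data-driven Hankel matrix; the variable names are illustrative):

```python
import numpy as np

def steer_onto_dynamics(tau_hat, F, beta):
    """Soft projection of a denoised trajectory sample onto range(F):
        tau' = [sqrt(1-beta) F F^+ + sqrt(beta) I] tau_hat
    As beta -> 0 (late denoising steps) the sample is pulled fully onto
    the feasible subspace; early steps leave more of the raw sample intact.
    """
    P = F @ np.linalg.pinv(F)  # orthogonal projector onto range(F)
    blend = np.sqrt(1.0 - beta) * P + np.sqrt(beta) * np.eye(len(tau_hat))
    return blend @ tau_hat
```

With β = 0 this is an exact projection; with β = 1 the sample passes through unchanged, matching the noise schedule of the reverse process.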

5. Multi-Modality and Fast Approximate Control

In nonconvex control problems, MPC solutions are often multimodal due to local optima, redundancy, or nonconvex constraints (Huang et al., 11 Dec 2024, Julbe et al., 6 Apr 2025). Diffusion-based approximate MPC (AMPC) can represent this multimodal action distribution: the forward and reverse processes are constructed so that the learned model matches the empirical conditional density produced by the solver. At runtime, multiple samples are scored against the cost/constraint criteria, and mode selection is guided by gradient biases and early-stopped noise injection to provide consistent closed-loop control at high frequency (1 kHz in simulation, 250 Hz on hardware for 7-DoF manipulators). This yields a speedup of more than 70× over conventional QP-based MPC and even outperforms the original numerical solver in success ratio on certain tasks.
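Mode selection in the cited works is guided by gradient biases and early-stopped noise injection; the same closed-loop consistency goal can be illustrated with a simpler penalty on deviation from the previously executed plan (an illustrative stand-in, not the papers' mechanism):

```python
import numpy as np

def select_consistent_mode(candidates, cost, u_prev, consistency_weight=0.1):
    """Pick one sample from a multimodal batch of diffusion-sampled action
    sequences, trading off task cost against closeness to the previously
    executed plan (discourages hopping between modes across control steps).

    candidates : (K, H, m) sampled action sequences
    cost(u) -> scalar task cost for one sequence
    u_prev : (H, m) previously chosen sequence
    """
    scores = np.array([
        cost(u) + consistency_weight * np.linalg.norm(u - u_prev)
        for u in candidates
    ])
    return candidates[int(np.argmin(scores))]
```

Without some such bias, two near-equal-cost modes can alternate between control steps and destabilize the closed loop even though each open-loop plan is individually fine.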

6. Unification with MPPI, RL, and Sampling-Based Control

A generalization is provided by the Gibbs measure unification of MPPI, RL, and diffusion models (Li et al., 27 Feb 2025). The control policy is viewed as sampling from:

p(U) = \frac{1}{Z} \exp\left(E(U)/\tau\right),

where E(U) arises as −J(U) (negative cost), an expected policy reward, or the numerator of the data density. MPPI, policy gradient, and reverse diffusion all share an update of the form:

U' = U + \Sigma \nabla \log q(U),

with q(U) ∝ exp(Ê(U)/τ). This theoretical bridge supports hybrid planning algorithms that mix data-driven priors with optimization-guided refinement for robust control and motion planning.
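In Monte-Carlo form, the shared update reduces to an exponentially tilted average of controls sampled around the current iterate, which is exactly the MPPI update when E = −J (a minimal sketch, with illustrative names):

```python
import numpy as np

def gibbs_weighted_update(samples, energy, tau=1.0):
    """Monte-Carlo form of U' = U + Σ ∇ log q(U) with q ∝ exp(E/τ):
    an exponentially tilted average of candidate controls.

    samples : (K, d) candidate controls drawn around the current iterate
    energy(u) -> scalar E(u) (higher is better, e.g. negative cost)
    """
    E = np.array([energy(u) for u in samples])
    # Softmax weights over energies; subtract the max for numerical stability
    w = np.exp((E - E.max()) / tau)
    w /= w.sum()
    return np.einsum('k,kd->d', w, samples)
```

As τ → 0 the update concentrates on the best sample (greedy optimization); as τ grows it approaches the unweighted sample mean, mirroring the temperature's role in all three frameworks.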

7. Applications, Scalability, and Computational Considerations

D-MPC is applied to robotics (manipulation, legged locomotion, autonomous driving), industrial process control, energy arbitrage (Zarifis et al., 19 Mar 2025), power grid dispatch (Xu et al., 13 May 2025), and constrained navigation. GPU-accelerated sampling enables practical deployment on high-dimensional systems; random shooting techniques over diffusion-sampled candidate sequences offer near-globally optimal NMPC performance (Huang et al., 11 Dec 2024). Modular separation of proposal and dynamics models in D-MPC enhances adaptability—fine-tuning dynamics for altered systems while leaving action proposals intact—as demonstrated for hardware and simulation settings afflicted by modeling errors or physical defects.

Summary Table: Major D-MPC Approaches

| Framework / Paper | Diffusion Role | Key Properties |
|---|---|---|
| Information-theoretic MPC (Wang et al., 2018) | Stochastic path-space (jump noise) | Robustness; importance sampling; GPU real-time |
| Diffusing-horizon MPC (Shin et al., 2020) | Exponential time coarsening | Scalability; suboptimality ≤ 3% |
| Generative D-MPC (Zhou et al., 7 Oct 2024; Huang et al., 5 Oct 2025) | Probabilistic trajectory modeling | Multimodality; sample-score-rank planning; adaptation |
| Constraint/projected D-MPC (Römer et al., 12 Dec 2024; Gadginmath et al., 31 Mar 2025) | Denoising with projection | Feasibility under constraints; mode selection |
| AMPC imitation (Julbe et al., 6 Apr 2025) | Imitation of solver solution distribution | Consistency; speedup; joint-space control |
| Near-global NMPC (Huang et al., 11 Dec 2024) | Sampling optima + random shooting | Efficiency; globality; robustness |

A plausible implication is that diffusion-based MPC will continue to merge theoretical stochastic control, efficient sampling/parallelization, neural generative modeling, and constraint enforcement in both simulation and hardware domains, addressing fundamental challenges in scalability, adaptation, and robust real-time control for complex dynamical systems.
