
Info-Theoretic MPC with Stochastic Diffusion

Updated 20 November 2025
  • The paper introduces an MPC framework that extends path-integral control by integrating both Gaussian diffusion and jump noise for robust real-time control.
  • It employs information-theoretic cost functionals and GPU-parallelizable importance sampling to efficiently update controls under stochastic disturbances.
  • Empirical benchmarks on tasks like cart-pole and quadrotor tracking demonstrate enhanced performance and robustness compared to diffusion-only approaches.

Information-theoretic model predictive control (MPC) with stochastic diffusion extends the path-integral (PI) approach to optimal control by integrating both Gaussian (diffusion) and non-Gaussian (jump, e.g., compound Poisson) noise within a receding-horizon framework. This methodology incorporates information-theoretic cost functionals, importance sampling, and GPU-parallelizable iterative updates to enable real-time stochastic optimal control for nonlinear systems subject to general stochastic disturbances, including rare but significant jump events (Wang et al., 2018). Foundational work on path-integral control for diffusion processes provides the basis for the approach (Arslan et al., 2014), while more recent developments generalize the MPC design to handle jump-diffusion systems (Wang et al., 2018).

1. Stochastic System Dynamics with Jump-Diffusion

The controlled state dynamics are formalized as a continuous-time stochastic differential equation (SDE) on $X_t \in \mathbb{R}^n$:

$$\mathrm{d}X_t = f(X_t, t)\,\mathrm{d}t + G(X_t, t)\,u_t\,\mathrm{d}t + B(X_t, t)\,\mathrm{d}W_t + H(X_t, t)\,\mathrm{d}N_t$$

where:

  • $u_t \in \mathbb{R}^m$ is the control input,
  • $W_t$ is a standard Brownian motion accounting for Gaussian noise,
  • $N_t$ is a scalar compound Poisson process representing jumps, with rate parameter $\nu$ and i.i.d. zero-mean Gaussian marks $Q_k \sim \mathcal{N}(0, \Sigma_J)$,
  • $B$ and $H$ map the diffusion and jump noises into the state space,
  • $f$ determines the deterministic drift, and $G$ the control channel.

This model captures both continuous perturbations and discontinuous events (jumps), generalizing the standard SDE setting (Wang et al., 2018).
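As a concrete illustration, the jump-diffusion dynamics above can be simulated with an Euler–Maruyama scheme in which a jump arrives with probability roughly $\nu\,\Delta t$ per step. This is a minimal sketch, not the paper's implementation; the function names and the callable arguments `f`, `G`, `B`, `H` are assumptions for illustration:

```python
import numpy as np

def simulate_jump_diffusion(x0, u, f, G, B, H, dt, N, nu, Sigma_J, rng):
    """Euler-Maruyama simulation of the controlled jump-diffusion SDE
    dX = f(X,t) dt + G(X,t) u dt + B(X,t) dW + H(X,t) dN.
    A jump occurs with probability ~ nu*dt per step; marks are N(0, Sigma_J)."""
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for j in range(N):
        t = j * dt
        dW = rng.normal(scale=np.sqrt(dt), size=B(x, t).shape[1])
        drift = f(x, t) + G(x, t) @ u[j]
        x = x + drift * dt + B(x, t) @ dW
        if rng.random() < nu * dt:  # rare jump event this step
            mark = rng.multivariate_normal(np.zeros(Sigma_J.shape[0]), Sigma_J)
            x = x + H(x, t) @ mark
        traj.append(x.copy())
    return np.array(traj)
```

For small $\Delta t$, the Bernoulli jump indicator approximates the compound Poisson increment; for larger steps one would instead sample a Poisson-distributed number of jumps per interval.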

2. Information-Theoretic Cost Functional

The control problem minimizes the expected finite-horizon cost:

$$J(u) = \mathbb{E}^{\mathbb{Q}}\Bigl[\,\phi(X_T) + \int_{t_0}^T \Bigl[q(X_t, t) + \frac12\,u_t^\top R\,u_t\Bigr]\mathrm{d}t\Bigr]$$

Here $\mathbb{Q}$ denotes the path measure under the controlled SDE; $\phi$ and $q$ are the terminal and running cost functions, respectively; $R$ is the control penalty matrix.

Free Energy and KL Bound

Defining the uncontrolled (prior) measure P\mathbb{P} by omitting the control term in the dynamics, the free energy is

$$\mathcal{F} = -\lambda\,\log \mathbb{E}_{\mathbb{P}}\left[\exp\left(-\frac{1}{\lambda} S(X)\right)\right]$$

with $S(X) = \phi(X_T) + \int_{t_0}^T q(X_t, t)\,\mathrm{d}t$.

Jensen's inequality yields

$$\mathcal{F} \leq \mathbb{E}_{\mathbb{Q}}[S(X)] + \lambda\,D_{KL}(\mathbb{Q}\,\|\,\mathbb{P})$$

By proper choice of RR (via Girsanov's theorem), the stochastic optimal control objective becomes equivalent to minimizing this upper bound, which balances expected cost and relative entropy between controlled and uncontrolled trajectory distributions (Wang et al., 2018, Arslan et al., 2014).
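The free energy above can be estimated by Monte Carlo from path costs sampled under the uncontrolled measure $\mathbb{P}$. Below is a minimal sketch (the function name is hypothetical); the log-sum-exp shift is needed because $\exp(-S/\lambda)$ underflows for large costs:

```python
import numpy as np

def free_energy_estimate(path_costs, lam):
    """Monte Carlo estimate of F = -lam * log E_P[exp(-S/lam)] from M
    path costs S(X^(m)) sampled under the uncontrolled measure P.
    Shifting by the minimum cost is the log-sum-exp stabilization:
    F = S_min - lam * log( mean exp(-(S - S_min)/lam) )."""
    S = np.asarray(path_costs, dtype=float)
    S_min = S.min()
    return S_min - lam * np.log(np.mean(np.exp(-(S - S_min) / lam)))
```

By Jensen's inequality this estimate is always bounded above by the sample mean of $S$, consistent with the free-energy/relative-entropy bound above.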

3. Path-Integral Formulation and Importance Sampling

The optimal path distribution $\mathbb{Q}^*$ that minimizes cost and relative entropy is

$$\frac{\mathrm{d}\mathbb{Q}^*}{\mathrm{d}\mathbb{P}} = \frac{\exp\left(-\frac{1}{\lambda} S(X)\right)}{\mathbb{E}_{\mathbb{P}}\left[\exp\left(-\frac{1}{\lambda} S(X)\right)\right]}$$

Control is obtained by projecting $\mathbb{Q}^*$ onto admissible (parameterized) controls, minimizing $D_{KL}(\mathbb{Q}^* \| \mathbb{Q})$. After time discretization, the critical control update is:

$$u_j^{\text{new}} = u_j + G(x_{t_j})^{-1} B(x_{t_j})\, \frac{\sum_{m=1}^M w_m\, \epsilon_j^m/\sqrt{\Delta t}}{\sum_{m=1}^M w_m}$$

with importance weights $w_m = \exp\bigl(-\frac{1}{\lambda} \tilde S(X^{(m)})\bigr)$, where $X^{(m)}$ are trajectories sampled under the current control policy with stochastic perturbations $\epsilon_j^m$ reflecting both diffusion and jump events (Wang et al., 2018). The same form appears in the pure-diffusion setting, where the optimal control is computed as a weighted sum over sampled noise increments (Arslan et al., 2014).
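The weighted update can be written compactly in vectorized form. The following is a minimal sketch, assuming for simplicity that $G^{-1}B$ is the identity (in general that state-dependent matrix multiplies the correction); both function names are hypothetical:

```python
import numpy as np

def importance_weights(path_costs, lam):
    """Self-normalized weights w_m ~ exp(-S~(X^(m))/lam). Subtracting the
    minimum cost before exponentiating avoids underflow and cancels in
    the normalization."""
    S = np.asarray(path_costs, dtype=float)
    w = np.exp(-(S - S.min()) / lam)
    return w / w.sum()

def update_controls(u, eps, path_costs, lam, dt):
    """Apply u_j_new = u_j + (sum_m w_m eps_j^m / sqrt(dt)) / (sum_m w_m)
    for all j at once. eps has shape (M, N, m): M rollouts, N steps,
    m control dimensions. Assumes G^{-1}B = I for this sketch."""
    w = importance_weights(path_costs, lam)                   # (M,)
    correction = np.tensordot(w, eps, axes=1) / np.sqrt(dt)   # (N, m)
    return u + correction
```

When all rollouts incur equal cost, the weights are uniform and the update reduces to the plain average of the sampled perturbations.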

4. Iterative Model Predictive Control Algorithm

Information-theoretic MPC with jump-diffusion is implemented as a receding horizon loop:

  1. Initialization: Set the control sequence $\{u_0,\dots,u_{N-1}\}$.
  2. Forward Simulation: For each of $M$ parallel rollouts:
    • Set the initial state to the current state.
    • For $j=0$ to $N-1$:
      • Sample Gaussian noise $\epsilon_j^m$.
      • With probability $\nu\,\Delta t$, sample jump noise $\delta_j^m$ and add it to $\epsilon_j^m$.
      • Propagate dynamics using the current control and both noise types.
      • Accumulate running cost.
    • Add terminal cost at horizon.
  3. Weighting and Update:
    • Compute exponential weights wmw_m for each trajectory based on total cost.
    • Update each uju_j using the weighted average of noise perturbations.
  4. Apply and Shift:
    • Apply u0u_0 to the real system for one interval.
    • Shift control sequence forward, re-initialize last element.
  5. Repeat at the next time step.

This parallel sampling structure makes the algorithm highly amenable to GPU implementation, supporting high-frequency receding-horizon replanning (Wang et al., 2018).
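Steps 2 and 3 of the loop above can be sketched as a single planning iteration for a generic discrete-time model. This is a simplified, scalar-control sketch under assumed names (`jump_mppi_step`, `dynamics_step`, etc.), not the paper's implementation:

```python
import numpy as np

def jump_mppi_step(x0, u, dynamics_step, running_cost, terminal_cost,
                   lam, sigma, nu, sigma_J, dt, M, rng):
    """One receding-horizon planning iteration (steps 2-3 above).
    sigma: std of the Gaussian exploration noise; sigma_J: std of jump
    marks; nu: jump rate. Scalar control per step for simplicity."""
    N = len(u)
    eps = sigma * rng.normal(size=(M, N))                  # diffusion noise
    jumps = rng.random((M, N)) < nu * dt                   # jump indicators
    eps += jumps * rng.normal(scale=sigma_J, size=(M, N))  # add jump marks
    S = np.zeros(M)
    for m_idx in range(M):                                 # forward simulation
        x = np.array(x0, dtype=float)
        for j in range(N):
            x = dynamics_step(x, u[j] + eps[m_idx, j], dt)
            S[m_idx] += running_cost(x, u[j]) * dt
        S[m_idx] += terminal_cost(x)
    w = np.exp(-(S - S.min()) / lam)                       # exponential weights
    w /= w.sum()
    return u + w @ eps                                     # weighted noise average
```

In the full MPC loop (steps 4-5), the caller applies the first element of the returned sequence to the real system, shifts the sequence forward, and replans at the next interval.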

5. GPU Parallelization

In the proposed schema, each rollout trajectory is simulated independently and can be assigned to a separate GPU thread or warp. All core steps—noise sampling, propagation through SDE (including both diffusion and jump noise), cost accumulation, and exponential weight computation—are completely thread-local. Reduction operations are then used to aggregate the necessary statistics for control updates.
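The thread-local structure can be mimicked on a CPU by vectorizing over the sample axis, which is a useful reference before porting rollouts to GPU kernels. The sketch below evaluates all $M$ rollout costs in a batch for an assumed linear test system (the names `parallel_rollouts`, `A`, `Bc` are illustrative, not from the paper):

```python
import numpy as np

def parallel_rollouts(x0, u, A, Bc, Q_diag, eps, dt):
    """Batch of M rollouts for a linear test system dx = (A x + Bc u) dt,
    vectorized over the sample axis -- the CPU analogue of one rollout
    per GPU thread. eps: (M, N, m) control perturbations (diffusion plus
    any sampled jump marks). Returns the M accumulated running costs."""
    M, N, m = eps.shape
    x = np.tile(np.asarray(x0, dtype=float), (M, 1))  # (M, n) states
    S = np.zeros(M)
    for j in range(N):
        v = u[j] + eps[:, j, :]                       # perturbed controls (M, m)
        x = x + (x @ A.T + v @ Bc.T) * dt             # one Euler step, all rollouts
        S += np.sum(Q_diag * x ** 2, axis=1) * dt     # diagonal running state cost
    return S
```

On a GPU, each row of this batch becomes a thread (or warp), and the final weight normalization becomes a parallel reduction.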

Typical configurations (e.g., $M = 3{,}000$, $N = 20$) require 10–20 ms per planning cycle, enabling 50 Hz control rates. This efficiency underpins real-time MPC for nonlinear systems subject to compound stochasticity (Wang et al., 2018).

6. Empirical Performance and Benchmark Tasks

Simulation studies evaluate information-theoretic MPC with jump-diffusion on two canonical nonlinear control tasks:

  • Cart-Pole Swing-Up and Balance: Standard 4-state system, with diffusion covariance $\Sigma_D = 0.1\,I$, varying jump covariance $\Sigma_J \in \{1, 1.5, 2, 3\}$, and jump rates $\nu \in \{0.1, 0.25, 0.5\}$. The jump-aware MPC achieves 96–100% success for moderate jumps over 100 trials; Gaussian-only MPC drops to 61–81% as jumps intensify.
  • 3D Quadrotor Waypoint Tracking: 12-state quadrotor model with full attitude kinematics, diffusion covariance $\Sigma_D = 0.05\,I$, and heavier jump noise ($\Sigma_J \in \{5, 10, 20, 30\}$, $\nu = 0.2$). The jump-aware MPC maintains 100% success at the highest jump intensities, while the diffusion-only baseline drops to 87%. Increasing the rollout count $M$ reduces trajectory variance, but only explicit modeling of the jump statistics yields robustness to large disturbances.

In both domains, explicit incorporation of jump events in importance sampling yields superior performance, especially as jump magnitude or rate increases. When jumps are negligible, the method matches diffusion-only approaches (Wang et al., 2018).

7. Broader Context and Methodological Extensions

The information-theoretic MPC framework for jump-diffusions generalizes earlier PI-based optimal control developed for pure diffusion SDEs (Arslan et al., 2014). In those earlier settings, the methodology leverages the HJB equation, the Cole–Hopf log transformation, and a path-integral Feynman–Kac representation to express control optimality conditions in expectation form over the unforced dynamics. Efficient implementation is closely tied to importance sampling, the free-energy/relative-entropy duality, and numerical strategies such as rapidly-exploring random trees (RRT) to bias trajectory proposals toward promising regions.

The current jump-diffusion extension preserves the core importance sampling and sampling-based update structure while accounting for discontinuous state transitions and their statistics. This yields a practical algorithm for stochastic receding-horizon control encompassing a much broader class of disturbance models (Wang et al., 2018, Arslan et al., 2014).
