Info-Theoretic MPC with Stochastic Diffusion
- The paper introduces an MPC framework that extends path-integral control by integrating both Gaussian diffusion and jump noise for robust real-time control.
- It employs information-theoretic cost functionals and GPU-parallelizable importance sampling to efficiently update controls under stochastic disturbances.
- Empirical benchmarks on tasks like cart-pole and quadrotor tracking demonstrate enhanced performance and robustness compared to diffusion-only approaches.
Information-theoretic model predictive control (MPC) with stochastic diffusion extends the path-integral (PI) approach to optimal control by integrating both Gaussian (diffusion) and non-Gaussian (jump, e.g., compound Poisson) noise within a receding-horizon framework. This methodology incorporates information-theoretic cost functionals, importance sampling, and GPU-parallelizable iterative updates to enable real-time stochastic optimal control for nonlinear systems subject to general stochastic disturbances, including rare but significant jump events (Wang et al., 2018). Foundational work on path-integral control for diffusion processes provides the basis for the approach, while recent developments generalize MPC design to handle jump-diffusion systems (Arslan et al., 2014).
1. Stochastic System Dynamics with Jump-Diffusion
The controlled state dynamics are formalized as a continuous-time stochastic differential equation (SDE) on $\mathbb{R}^n$:

$$dx_t = \big(f(x_t) + G(x_t)\,u_t\big)\,dt + B(x_t)\,dw_t + H(x_t)\,dP_t,$$

where:
- $u_t \in \mathbb{R}^m$ is the control input,
- $w_t$ is a standard Brownian motion accounting for Gaussian noise,
- $P_t$ is a scalar compound Poisson process representing jumps, with rate parameter $\lambda_J$ and i.i.d. zero-mean Gaussian marks $z_k \sim \mathcal{N}(0, \sigma_J^2)$,
- $B(x)$ and $H(x)$ map diffusion and jump noises into state space,
- $f(x)$ determines the deterministic drift, and $G(x)$ the control channel.
This model captures both continuous perturbations and discontinuous events (jumps), generalizing the standard SDE setting (Wang et al., 2018).
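An Euler-Maruyama discretization of this jump-diffusion can be sketched as follows, with the jump term approximated per step by a Bernoulli event of probability $\lambda_J \Delta t$. The scalar choices for $f$, $G$, $B$, $H$ and all parameter values are illustrative, not taken from the paper:

```python
import numpy as np

def simulate_jump_diffusion(x0, u, dt, steps, rng,
                            f=lambda x: -0.5 * x,   # drift (illustrative)
                            G=1.0, B=0.3,           # control and diffusion gains
                            H=1.0,                  # jump gain
                            jump_rate=0.5, jump_std=1.0):
    """Euler-Maruyama simulation of dx = (f(x) + G u) dt + B dw + H dP.

    dP is approximated per step: a jump occurs with probability
    jump_rate * dt, carrying a zero-mean Gaussian mark of std jump_std.
    """
    xs = [x0]
    x = x0
    for _ in range(steps):
        dw = rng.normal(0.0, np.sqrt(dt))        # Brownian increment
        jump = 0.0
        if rng.random() < jump_rate * dt:        # compound Poisson event
            jump = rng.normal(0.0, jump_std)
        x = x + (f(x) + G * u) * dt + B * dw + H * jump
        xs.append(x)
    return np.array(xs)
```

For small $\Delta t$, the Bernoulli approximation matches the probability of a Poisson event per step to first order, which is the standard discretization used in sampling-based rollouts.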
2. Information-Theoretic Cost Functional
The control problem minimizes the expected finite-horizon cost:

$$J(u) = \mathbb{E}_{\mathbb{Q}}\!\left[\phi(x_T) + \int_0^T \Big(q(x_t) + \tfrac{1}{2}\,u_t^{\top} R\, u_t\Big)\,dt\right],$$

where $\mathbb{Q}$ denotes the path measure under the controlled SDE; $\phi$ and $q$ are the terminal and running cost functions, respectively; and $R \succ 0$ is the control penalty.
Free Energy and KL Bound
Defining the uncontrolled (prior) measure $\mathbb{P}$ by omitting the control term in the dynamics, the free energy is

$$\mathcal{F}(S) = -\lambda \log \mathbb{E}_{\mathbb{P}}\!\left[\exp\!\big(-S(\tau)/\lambda\big)\right],$$

with the state cost $S(\tau) = \phi(x_T) + \int_0^T q(x_t)\,dt$. Jensen's inequality yields

$$\mathcal{F}(S) \le \mathbb{E}_{\mathbb{Q}}\big[S(\tau)\big] + \lambda\, D_{\mathrm{KL}}\big(\mathbb{Q}\,\|\,\mathbb{P}\big).$$

By proper choice of $\mathbb{Q}$ (via Girsanov's theorem), the stochastic optimal control objective becomes equivalent to minimizing this upper bound, which balances expected cost against the relative entropy between the controlled and uncontrolled trajectory distributions (Wang et al., 2018, Arslan et al., 2014).
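The Jensen step can be made explicit via a change of measure (a standard argument, spelled out here for completeness; $d\mathbb{P}/d\mathbb{Q}$ denotes the Radon–Nikodym derivative):

```latex
\begin{align}
\mathcal{F}(S)
  &= -\lambda \log \mathbb{E}_{\mathbb{Q}}\!\left[\frac{d\mathbb{P}}{d\mathbb{Q}}\,
       e^{-S(\tau)/\lambda}\right] \\
  &\le \mathbb{E}_{\mathbb{Q}}\!\left[-\lambda \log\!\left(\frac{d\mathbb{P}}{d\mathbb{Q}}\,
       e^{-S(\tau)/\lambda}\right)\right]
       \qquad \text{(Jensen: $-\log$ is convex)} \\
  &= \mathbb{E}_{\mathbb{Q}}\big[S(\tau)\big]
     + \lambda\,\mathbb{E}_{\mathbb{Q}}\!\left[\log \frac{d\mathbb{Q}}{d\mathbb{P}}\right]
   = \mathbb{E}_{\mathbb{Q}}\big[S(\tau)\big]
     + \lambda\, D_{\mathrm{KL}}\big(\mathbb{Q}\,\|\,\mathbb{P}\big).
\end{align}
```

Equality holds exactly when $d\mathbb{Q}/d\mathbb{P} \propto e^{-S(\tau)/\lambda}$, which is the optimal path distribution used in the next section.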
3. Path-Integral Formulation and Importance Sampling
The optimal path distribution that minimizes cost plus relative entropy is

$$\frac{d\mathbb{Q}^*}{d\mathbb{P}} = \frac{\exp\!\big(-S(\tau)/\lambda\big)}{\mathbb{E}_{\mathbb{P}}\!\left[\exp\!\big(-S(\tau)/\lambda\big)\right]}.$$

Control is obtained by projecting $\mathbb{Q}^*$ onto admissible (parameterized) controls, minimizing $D_{\mathrm{KL}}(\mathbb{Q}^* \,\|\, \mathbb{Q}_u)$. After time discretization, the critical control update is

$$u_i \leftarrow u_i + \sum_{k=1}^{K} w\big(\tau^k\big)\,\delta u_i^k,$$

with importance weights

$$w\big(\tau^k\big) = \frac{\exp\!\big(-\tilde{S}(\tau^k)/\lambda\big)}{\sum_{j=1}^{K} \exp\!\big(-\tilde{S}(\tau^j)/\lambda\big)},$$

where $\tau^1, \dots, \tau^K$ are trajectories sampled under the current control policy, and $\delta u_i^k$ are the injected perturbations reflecting both diffusion and jump events (Wang et al., 2018). The same form appears in the pure-diffusion setting, where the optimal control is computed as a weighted sum over sampled noise increments (Arslan et al., 2014).
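This weighted update can be sketched in a few lines; the names `lam`, `costs`, and `noise` are placeholders, and the min-shift is a standard numerical-stability trick that cancels in the normalized ratio:

```python
import numpy as np

def mppi_update(u, noise, costs, lam):
    """Importance-weighted control update.

    u:     (T, m) nominal control sequence
    noise: (K, T, m) sampled control perturbations (delta-u)
    costs: (K,) total trajectory costs S-tilde(tau^k)
    lam:   temperature lambda > 0
    """
    # Subtract the minimum cost for numerical stability (cancels on normalization).
    shifted = costs - costs.min()
    w = np.exp(-shifted / lam)
    w /= w.sum()                       # normalized importance weights
    # Weighted average of the perturbations, added to the nominal controls.
    return u + np.einsum("k,ktm->tm", w, noise)
```

Low-cost rollouts dominate the average, so the nominal control is pulled toward the perturbations that produced cheap trajectories.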
4. Iterative Model Predictive Control Algorithm
Information-theoretic MPC with jump-diffusion is implemented as a receding horizon loop:
- Initialization: Set the control sequence $(u_0, \dots, u_{T-1})$, e.g., to zeros or to the solution from the previous planning cycle.
- Forward Simulation: For each of $K$ parallel rollouts:
  - Set the initial state to the current state.
  - For $i = 0$ to $T-1$:
    - Sample Gaussian noise $\epsilon_i \sim \mathcal{N}(0, \Sigma)$.
    - With probability $\lambda_J \Delta t$, sample a jump mark $z \sim \mathcal{N}(0, \sigma_J^2)$ and add the jump contribution $H(x)z$ to the state increment.
    - Propagate the dynamics using the current control and both noise types.
    - Accumulate the running cost.
  - Add the terminal cost at the horizon.
- Weighting and Update:
  - Compute exponential weights for each trajectory based on its total cost.
  - Update each $u_i$ using the weighted average of the noise perturbations.
- Apply and Shift:
  - Apply $u_0$ to the real system for one control interval.
  - Shift the control sequence forward and re-initialize the last element.
- Repeat at the next time step.
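The loop above can be sketched end-to-end on a toy scalar system. The single-integrator dynamics, set-point cost, and every parameter value here are illustrative stand-ins, not the paper's benchmarks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scalar single integrator tracking a set-point at 1.0.
dt, T, K = 0.1, 20, 256                 # step size, horizon length, rollouts
lam = 1.0                               # temperature
sigma, lam_J, sigma_J = 1.0, 0.5, 1.0   # diffusion std, jump rate, jump mark std

def rollout_cost(x0, u_seq, eps_seq, jump_seq):
    """Simulate one perturbed trajectory and return its total cost."""
    x, cost = x0, 0.0
    for i in range(T):
        x = x + (u_seq[i] + eps_seq[i]) * dt + jump_seq[i]
        cost += (x - 1.0) ** 2 * dt               # running cost
    return cost + 10.0 * (x - 1.0) ** 2           # terminal cost

def mpc_step(x0, u_seq):
    """One receding-horizon cycle: sample, weight, update, shift."""
    eps = rng.normal(0.0, sigma, size=(K, T))     # diffusion perturbations
    jumps = rng.normal(0.0, sigma_J, size=(K, T))
    jumps *= rng.random((K, T)) < lam_J * dt      # Bernoulli(lam_J * dt) jump events
    costs = np.array([rollout_cost(x0, u_seq, eps[k], jumps[k]) for k in range(K)])
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()                                  # exponential importance weights
    u_new = u_seq + w @ eps                       # weighted average of perturbations
    return u_new[0], np.append(u_new[1:], 0.0)    # apply first input, shift the rest
```

Running `mpc_step` in closed loop (apply $u_0$, then re-plan from the new state) drives the state toward the set-point; sampling the jumps inside the rollouts is what distinguishes this from a diffusion-only loop.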
This parallel sampling structure makes the algorithm highly amenable to GPU implementation, supporting high-frequency receding-horizon replanning (Wang et al., 2018).
5. GPU Parallelization
In the proposed scheme, each rollout trajectory is simulated independently and can be assigned to a separate GPU thread or warp. All core steps (noise sampling, propagation through the SDE with both diffusion and jump noise, cost accumulation, and exponential weight computation) are entirely thread-local. Reduction operations then aggregate the statistics needed for the control updates.
Typical configurations of rollout count $K$ and horizon length $T$ require 10–20 ms per planning cycle, enabling 50 Hz control rates. This efficiency underpins real-time MPC for nonlinear systems subject to compound stochasticity (Wang et al., 2018).
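The one-thread-per-rollout structure can be mimicked on CPU by vectorizing over the rollout axis (a NumPy stand-in for a GPU kernel; the scalar dynamics, set-point cost, and parameters are illustrative):

```python
import numpy as np

def batch_rollout_costs(x0, u_seq, K, dt, lam_J, sigma, sigma_J, rng):
    """Propagate K rollouts in lockstep: every operation is elementwise
    over the rollout axis, mirroring one-thread-per-rollout on a GPU."""
    T = len(u_seq)
    x = np.full(K, x0)                  # one state per "thread"
    costs = np.zeros(K)
    for i in range(T):                  # time stays sequential; rollouts are parallel
        eps = rng.normal(0.0, sigma, K)
        jumps = rng.normal(0.0, sigma_J, K) * (rng.random(K) < lam_J * dt)
        x = x + (u_seq[i] + eps) * dt + jumps
        costs += (x - 1.0) ** 2 * dt    # thread-local running cost
    costs += 10.0 * (x - 1.0) ** 2      # thread-local terminal cost
    return costs                        # a reduction (weight normalization) follows
```

Only the final weight normalization and weighted averaging require communication across rollouts, which is why the algorithm maps so cleanly onto GPU reductions.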
6. Empirical Performance and Benchmark Tasks
Simulation studies evaluate information-theoretic MPC with jump-diffusion on two canonical nonlinear control tasks:
- Cart-Pole Swing-Up and Balance: Standard 4-state system, with fixed diffusion noise and varying jump covariance and jump rate. The new MPC with explicit jump modeling achieves 96–100% success for moderate jumps over 100 trials; Gaussian-only MPC drops to 61–81% as jumps intensify.
- 3D Quadrotor Waypoint Tracking: 12-state quadrotor model with full attitude kinematics, subject to diffusion noise and heavier jump noise. The new MPC maintains 100% success at the highest jump intensities, while the diffusion-only baseline drops to 87%. Increasing the rollout count reduces trajectory variance, but only explicit modeling of the jump statistics yields robustness to large disturbances.
In both domains, explicit incorporation of jump events in importance sampling yields superior performance, especially as jump magnitude or rate increases. When jumps are negligible, the method matches diffusion-only approaches (Wang et al., 2018).
7. Broader Context and Methodological Extensions
The information-theoretic MPC framework for jump-diffusions generalizes earlier PI-based optimal control developed for pure diffusion SDEs (Arslan et al., 2014). In those earlier settings, the methodology leverages the HJB equation, the Cole–Hopf log transformation, and a path-integral Feynman–Kac representation to express control optimality conditions in expectation form over the unforced dynamics. Efficient implementation is closely tied to importance sampling, the free-energy/relative-entropy duality, and numerical strategies such as rapidly-exploring random trees (RRT) to bias trajectory proposals toward promising regions.
The current jump-diffusion extension preserves the core importance sampling and sampling-based update structure while accounting for discontinuous state transitions and their statistics. This yields a practical algorithm for stochastic receding-horizon control encompassing a much broader class of disturbance models (Wang et al., 2018, Arslan et al., 2014).