FlowMPPI: Deep Learning MPC

Updated 3 March 2026
  • FlowMPPI is a sampling-based MPC that leverages conditional normalizing flows to generate goal-directed control sequences.
  • It integrates deep probabilistic methods, such as variational inference and VAE-based encoding, for context-aware sampling.
  • The approach enhances sample efficiency and robust generalization in out-of-distribution and complex environments.

FlowMPPI is a family of sampling-based Model Predictive Control (MPC) algorithms that leverages conditional normalizing flows to learn expressive, context-aware distributions over optimal control sequences. These methods fuse classical MPPI with deep probabilistic inference, particularly variational inference, to produce control samples that are both goal-directed and sensitive to environmental constraints, including out-of-distribution (OOD) scenarios. The approach addresses sample efficiency, generalization, and the limitations of conventional Gaussian sampling in high-dimensional, cluttered, or previously unseen environments (Power et al., 2022, Sacks et al., 2022).

1. Control-Sequence Distribution via Conditional Normalizing Flows

FlowMPPI replaces traditional factorized Gaussian samplers in MPPI with a conditional normalizing flow parameterization. The control sequence $U=(u_0,\ldots,u_{H-1}) \in \mathbb{R}^{H\cdot d_u}$ for horizon $H$ is drawn from a distribution $q_\phi(U|x_0,x_G,E)$, constructed as follows (Power et al., 2022):

  • A variational autoencoder (VAE) encoder $q_\theta(h|E)$ maps the environment's signed distance function (SDF) $E$ into a low-dimensional latent vector $h$.
  • A context MLP $g_\omega(x_0,x_G,h)$ produces a context vector $C$.
  • A conditional normalizing flow $f_\phi$ transforms a base Gaussian $Z\sim N(0,I)$ into $U$, conditioned on $C$:

$$U = f_\phi(Z;C), \quad Z = f_\phi^{-1}(U;C).$$

The density is given by the change-of-variables formula:

$$q_\phi(U|C) = N(Z;0,I)\cdot \exp(-\log|\det\partial f_\phi/\partial Z|), \quad Z=f_\phi^{-1}(U;C).$$

The context $C$ encodes the current state $x_0$, the goal $x_G$, and the environment latent $h$. This amortized, conditional sampling mechanism lets the distribution adapt both to the robot dynamics and to the specific environment geometry.
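
The sampling map and the change-of-variables density can be illustrated with a single conditional affine coupling layer. This is a minimal NumPy sketch, not the authors' implementation: the class `ConditionalAffineCoupling`, the helper `log_prob`, the untrained random conditioner weights, and the toy sizes are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


class ConditionalAffineCoupling:
    """One Real-NVP-style coupling layer conditioned on a context vector C.

    Hypothetical illustration: weights are random and untrained.
    """

    def __init__(self, dim, ctx_dim, hidden=32):
        self.d = dim // 2  # the first d coordinates pass through unchanged
        in_dim = self.d + ctx_dim
        out_dim = dim - self.d
        self.W1 = 0.1 * rng.standard_normal((in_dim, hidden))
        self.W_s = 0.1 * rng.standard_normal((hidden, out_dim))
        self.W_t = 0.1 * rng.standard_normal((hidden, out_dim))

    def _scale_shift(self, a, C):
        # Conditioner sees the pass-through half and the context.
        hid = np.tanh(np.concatenate([a, C], axis=-1) @ self.W1)
        return np.tanh(hid @ self.W_s), hid @ self.W_t  # bounded log-scale, shift

    def forward(self, Z, C):
        """Map Z -> U and return (U, log|det dU/dZ|)."""
        za, zb = Z[..., :self.d], Z[..., self.d:]
        s, t = self._scale_shift(za, C)
        U = np.concatenate([za, zb * np.exp(s) + t], axis=-1)
        return U, s.sum(axis=-1)

    def inverse(self, U, C):
        """Map U -> Z and return (Z, log|det dU/dZ|) evaluated at that Z."""
        ua, ub = U[..., :self.d], U[..., self.d:]
        s, t = self._scale_shift(ua, C)
        Z = np.concatenate([ua, (ub - t) * np.exp(-s)], axis=-1)
        return Z, s.sum(axis=-1)


def log_prob(layer, U, C):
    """log q(U|C) = log N(Z; 0, I) - log|det df/dZ| with Z = f^{-1}(U; C)."""
    Z, logdet = layer.inverse(U, C)
    log_base = -0.5 * (Z ** 2).sum(axis=-1) - 0.5 * Z.shape[-1] * np.log(2 * np.pi)
    return log_base - logdet


H, d_u, d_C = 5, 2, 4                      # toy sizes, not the paper's
layer = ConditionalAffineCoupling(H * d_u, d_C)
C = rng.standard_normal((3, d_C))          # three contexts
Z = rng.standard_normal((3, H * d_u))      # base Gaussian samples
U, fwd_logdet = layer.forward(Z, C)        # flattened control sequences
Z_rec, inv_logdet = layer.inverse(U, C)    # recovers Z exactly
```

A full flow stacks several such layers with permutations between them, so that every coordinate is eventually transformed; the log-determinants then add up across layers.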

2. Objective: Variational Inference Perspective

The finite-horizon stochastic optimal control problem is reframed as inference on an optimality variable $o$ with

$$p(o=1|\tau) \propto \exp(-J(\tau)), \quad \tau=(X,U),$$

where $J(\tau)$ is the trajectory cost. The variational free-energy (up to constants) minimized by FlowMPPI is

$$\mathcal{F}[\phi] = \mathrm{KL}(q_\phi(U)\,\|\,p(U|o=1)) = -\mathbb{E}_{q(\tau)}[\log p(o|\tau)] - H(q_\phi(U)),$$

with $J(\tau)$ absorbing log-prior costs. Substituting the flow density, the training objective combines the expected trajectory cost with an entropy regularizer expressed through the base density and the flow's log-determinant:

$$\mathcal{F} = -\mathbb{E}_{q(h)q_\phi(U|C)}[-J(\tau)] + \mathbb{E}_{p(Z)}[\log p(Z) - \log|\det\partial f_\phi/\partial Z|].$$

In practice, this objective is optimized using weighted maximum likelihood on sampled trajectories.
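
One common form of weighted maximum likelihood is sketched below. The source does not spell out the weighting scheme, so the exponentiated-cost weights $w_k \propto \exp(-J_k/\lambda)$ (chosen to match $p(o=1|\tau) \propto \exp(-J(\tau))$), the function names `softmin_weights` and `weighted_nll`, and the toy numbers are assumptions.

```python
import numpy as np


def softmin_weights(costs, temperature=1.0):
    """Normalized weights w_k proportional to exp(-J_k / temperature)."""
    costs = np.asarray(costs, dtype=float)
    # Subtracting the minimum cost keeps the exponentials well scaled.
    w = np.exp(-(costs - costs.min()) / temperature)
    return w / w.sum()


def weighted_nll(log_q, costs, temperature=1.0):
    """Weighted negative log-likelihood: -sum_k w_k * log q(U_k | C)."""
    w = softmin_weights(costs, temperature)
    return -np.sum(w * np.asarray(log_q, dtype=float))


costs = np.array([1.0, 2.0, 10.0])    # J(tau_k) for three sampled trajectories
log_q = np.array([-3.0, -2.5, -1.0])  # log q_phi(U_k | C) for the same samples
w = softmin_weights(costs)
loss = weighted_nll(log_q, costs)
```

Minimizing this loss with respect to the flow parameters raises the density of low-cost samples, since their log-likelihood terms carry the largest weights.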

3. MPPI Algorithm Augmented with Learned Flow

At each MPC step, FlowMPPI interleaves classic MPPI perturbations with environment-aware flow samples. The algorithmic procedure is:

  1. Shift the previous nominal control sequence $U^0$ forward by one step.
  2. Compute the nominal latent $Z^0=f_\phi^{-1}(U^0;C)$.
  3. Draw $K/2$ samples via classic Gaussian noise and $K/2$ via the flow.
  4. For each sample, roll out the trajectory and compute the cost-augmented score $S^k$ (including auxiliary terms penalizing deviation from the nominal in control or latent space).
  5. Compute softmax weights $w_k$ and form the weighted update for $U^0$.
  6. Apply the first action to the plant.

The step integrates context-conditioned latent space sampling and explicit regularization paralleling KL-divergence in classic MPPI. The batch includes both noise-based perturbations and flow-induced proposals (Power et al., 2022, Sacks et al., 2022).
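
The six steps above can be sketched on a toy problem. Everything specific here is an assumption for illustration: the single-integrator dynamics, the distance-to-goal cost, the invertible affine map standing in for the learned $f_\phi$, and the choice to draw flow samples by perturbing the nominal latent $Z^0$. The auxiliary deviation penalties from step 4 are omitted for brevity.

```python
import numpy as np


def rollout_cost(x0, U, goal, dt=0.1):
    """Summed distance-to-goal for K control sequences U of shape (K, H, d_u)."""
    X = x0 + dt * np.cumsum(U, axis=1)  # toy single-integrator states
    return np.linalg.norm(X - goal, axis=-1).sum(axis=1)


def flow_forward(Z, C):
    """Hypothetical stand-in for f_phi(Z; C): an invertible affine map."""
    return 2.0 * Z + C


def flow_inverse(U, C):
    """Inverse of the stand-in flow."""
    return (U - C) / 2.0


def flow_mppi_step(x0, goal, U_nom, C, K=64, sigma=0.5, lam=1.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    H, d_u = U_nom.shape
    # 1. Shift the nominal sequence forward, repeating the last action.
    U_nom = np.vstack([U_nom[1:], U_nom[-1:]])
    # 2. Nominal latent under the (stand-in) flow.
    Z_nom = flow_inverse(U_nom, C)
    # 3. Half the samples from control-space noise, half through the flow.
    n_gauss = K // 2
    U_gauss = U_nom + sigma * rng.standard_normal((n_gauss, H, d_u))
    Z = Z_nom + sigma * rng.standard_normal((K - n_gauss, H, d_u))
    U_all = np.concatenate([U_gauss, flow_forward(Z, C)], axis=0)
    # 4. Roll out and score every sample (auxiliary penalties omitted).
    S = rollout_cost(x0, U_all, goal)
    # 5. Softmax weights over negative scores, then the weighted update.
    w = np.exp(-(S - S.min()) / lam)
    w /= w.sum()
    U_new = np.einsum("k,khd->hd", w, U_all)
    # 6. The first action goes to the plant; U_new seeds the next step.
    return U_new[0], U_new


x0 = np.zeros(2)
goal = np.array([1.0, 1.0])
U_nom = np.zeros((10, 2))  # horizon 10, two control dimensions
C = np.zeros(2)            # placeholder context
action, U_next = flow_mppi_step(x0, goal, U_nom, C, rng=np.random.default_rng(0))
```

In a receding-horizon loop, `U_next` would be passed back in as `U_nom` at the following control step.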

4. Out-of-Distribution Projection for Robust Generalization

FlowMPPI incorporates an OOD-projection mechanism for test-time robustness when the environment $E$ is not covered by the training data. It projects the environment latent $h$ to a nearby in-distribution point by minimizing

$$\hat{h} = \arg\min_h\ [\,b\cdot L_{OOD}(h) + L_{\mathrm{flow}}(h)\,],$$

where

  • $L_{OOD}(h) = -\log p_{\phi_{\mathrm{prior}}}(h)$ is the negative log-density of $h$ under the VAE's flow prior, which is high when $h$ is unlikely,
  • $L_{\mathrm{flow}}(h)$ is the weighted negative log-likelihood for flow-sampled control sequences,
  • $b$ weights the in-distribution term against the flow likelihood term.

This optimization adapts the flow's conditioning context to OOD environments, improving success rates and lowering costs without retraining.
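
The projection can be sketched as plain gradient descent on $h$. Both loss terms below are toy stand-ins chosen so the answer can be checked by hand, not the learned quantities: a standard normal prior replaces the Real-NVP prior, giving $L_{OOD}(h) = \tfrac{1}{2}\|h\|^2$ up to a constant, and a quadratic $\tfrac{1}{2}\|h - h_{\mathrm{task}}\|^2$ replaces $L_{\mathrm{flow}}$. The function `project_latent` and the step size are likewise assumptions.

```python
import numpy as np


def project_latent(h_init, grad_ood, grad_flow, b=1.0, lr=0.1, steps=200):
    """Gradient descent on b * L_OOD(h) + L_flow(h), starting from h_init."""
    h = np.array(h_init, dtype=float)
    for _ in range(steps):
        h -= lr * (b * grad_ood(h) + grad_flow(h))
    return h


h_task = np.array([4.0, -2.0])  # latent preferred by the toy L_flow


def grad_ood(h):
    """Gradient of 0.5 * ||h||^2 (standard normal prior, an assumption)."""
    return h


def grad_flow(h):
    """Gradient of the toy stand-in 0.5 * ||h - h_task||^2."""
    return h - h_task


h_hat = project_latent(h_task, grad_ood, grad_flow, b=3.0)
# For these quadratics the minimizer is h_task / (b + 1) in closed form.
```

A larger $b$ pulls $\hat{h}$ further toward the high-density region of the prior, while a smaller $b$ keeps it closer to the latent that best explains the low-cost control samples.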

5. Architectural Details and Training Protocol

FlowMPPI and its variants employ the following architectural elements (Power et al., 2022):

  • VAE over environment SDF ($E\in \mathbb{R}^{64\times64}$ or $\mathbb{R}^{64\times64\times64}$): 4 stride-2 convolutions $\to$ FC $\to$ $(\mu_h,\sigma_h)$; decoder reverses structure. Latent prior on $h$ uses a 4-layer Real-NVP flow.
  • Context MLP $g_\omega$: concatenates $(x_0,x_G,h)$, 256-ReLU hidden, outputs $C$.
  • Control-flow $f_\phi$: Real-NVP with 10 conditional coupling layers, batch-norm, and mixing.
  • Planar system: $d_x=4$, $d_u=2$, $d_h=d_C=64$; quadrotor: $d_x=12$, $d_u=4$, $d_h=d_C=256$.
  • Hyperparameters: $H=40$, sampling budgets $K\in\{256,512,1024\}$, Adam learning rate $10^{-3}$, temperature $\lambda=1$, and $\Sigma_U$ tuned per system. VAE loss is weighted and frozen after 100 epochs; control-perturbation variance is annealed.

For FlowMPPIProj (with projection), half the step's sample budget is allocated to $L_{\mathrm{flow}}$ computation for projection. All components are compatible with standard deep learning frameworks.
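
The dimensions listed above can be collected in a small configuration object, which also makes the derived sizes explicit. This is a hypothetical sketch: the class `FlowMPPIConfig` and its field names are illustrative, and `context_input_dim` assumes the goal $x_G$ has the same dimension as the state $x_0$, which the source does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowMPPIConfig:
    """Hypothetical container for the dimensions and hyperparameters above."""

    d_x: int                      # state dimension
    d_u: int                      # control dimension
    d_h: int                      # environment latent dimension
    d_C: int                      # context vector dimension
    horizon: int = 40             # H
    num_samples: int = 512        # K, one of the reported budgets
    learning_rate: float = 1e-3   # Adam
    temperature: float = 1.0      # lambda

    @property
    def flow_dim(self) -> int:
        """Dimension of the flattened control sequence U (H * d_u)."""
        return self.horizon * self.d_u

    @property
    def context_input_dim(self) -> int:
        """Input size of g_omega, assuming x_G has the same size as x_0."""
        return 2 * self.d_x + self.d_h


planar = FlowMPPIConfig(d_x=4, d_u=2, d_h=64, d_C=64)
quadrotor = FlowMPPIConfig(d_x=12, d_u=4, d_h=256, d_C=256)
```

With $H=40$, the flow therefore operates on an 80-dimensional control sequence for the planar system and a 160-dimensional one for the quadrotor.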

6. Empirical Evaluation and Performance

FlowMPPI is benchmarked on navigation and control tasks in both in-distribution and OOD scenarios:

  • Planar 2D double-integrator navigation with disc obstacles (in-distribution) and narrow four-room environments (OOD).
  • 12DoF quadrotor in 3D spheres (in-distribution), four-room corridor (OOD), and two real-world reconstructions.

The following table summarizes selected results (Power et al., 2022):

| Method | Success rate (2D OOD, K=512) | Cost (2D OOD, K=512) | Success rate (3D OOD, K=512) | Cost (3D OOD, K=512) |
|---|---|---|---|---|
| MPPI | 0.29 | 2948 | 0.11 | 4724 |
| iCEM | 0.59 | 2145 | 0.47 | 4157 |
| FlowMPPI | 0.75 | 2155 | 0.72 | 3601 |
| FlowMPPIProj | 0.77 | 2155 | 0.83 | 3443 |

A similar trend is seen in real-world sim2real transfer, where FlowMPPIProj achieves substantial improvements in both success rate and cost. For example, in the stairway environment for the quadrotor:

  • MPPI: succ 0.32, cost 3019
  • iCEM: succ 0.58, cost 2623
  • FlowMPPI: succ 0.50, cost 2463
  • FlowMPPIProj: succ 0.85, cost 1745

Notably, baseline methods frequently get stuck in local minima in complex environments, whereas flow-based sampling yields goal-oriented, collision-free solutions from the first iteration.

7. Relationship to and Advances over Previous Methods

Earlier sampling-based MPC approaches typically employ Gaussian or control-space sampling, leading to suboptimal exploration and reliance on heuristics for parameter updates. FlowMPPI, especially when combined with bi-level optimization and end-to-end backpropagation-through-time (BPTT) training in a latent space (Sacks et al., 2022), achieves:

  • An order-of-magnitude improvement in sample efficiency,
  • 10–20% lower median cost across domains,
  • Consistent or improved success rates,
  • Robust generalization to out-of-distribution and real-world environments.

The integration of normalizing flows enables expressive, tractable distributions that account for environment geometry and dynamics. The OOD-projection step further enables reliable sim2real transfer, as demonstrated on real-world datasets without requiring retraining.

FlowMPPI has demonstrated a marked advance in the application of probabilistic learning and inference to MPC, especially in scenarios where classical methods are challenged by non-Gaussianity, environmental complexity, or dataset shift (Power et al., 2022, Sacks et al., 2022).
