FlowMPPI: Deep Learning MPC

Updated 3 March 2026

FlowMPPI is a sampling-based MPC that leverages conditional normalizing flows to generate goal-directed control sequences.
It integrates deep probabilistic methods, such as variational inference and VAE-based encoding, for context-aware sampling.
The approach enhances sample efficiency and robust generalization in out-of-distribution and complex environments.

FlowMPPI is a family of sampling-based Model Predictive Control (MPC) algorithms that leverages conditional normalizing flows to learn expressive, context-aware distributions over optimal control sequences. These methods fuse classical MPPI with deep probabilistic inference, particularly variational inference, to produce control samples that are both goal-directed and sensitive to environmental constraints, including out-of-distribution (OOD) scenarios. The approach addresses sample efficiency, generalization, and the limitations of conventional Gaussian sampling in high-dimensional, cluttered, or previously unseen environments (Power et al., 2022, Sacks et al., 2022).

1. Control-Sequence Distribution via Conditional Normalizing Flows

FlowMPPI replaces the factorized Gaussian sampler of classical MPPI with a conditional normalizing flow. The control sequence $U=(u_0,\ldots,u_{H-1}) \in \mathbb{R}^{H\cdot d_u}$ for horizon $H$ is drawn from a distribution $q_\phi(U|x_0,x_G,E)$, constructed in three stages (Power et al., 2022):

A variational autoencoder (VAE) encoder $q_\theta(h|E)$ maps the environment's signed distance function (SDF) $E$ into a low-dimensional latent vector $h$.
A context MLP $g_\omega(x_0,x_G,h)$ produces a context vector $C$.
A conditional normalizing flow $f_\phi$ transforms a base Gaussian $Z\sim N(0,I)$ into $U$, conditioned on $C$:

$U = f_\phi(Z;C), \quad Z = f_\phi^{-1}(U;C).$

The density is given by the change-of-variables formula:

$q_\phi(U|C) = N(Z;0,I)\cdot \exp(-\log|\det\partial f_\phi/\partial Z|), \quad Z=f_\phi^{-1}(U;C).$

The context $C$ encodes the current state $x_0$, the goal $x_G$, and the latent environment $h$. This amortized, conditional sampling mechanism lets the distribution adapt to both the robot dynamics and the specific environment geometry.
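As a minimal sketch of the sampling and density evaluation described above, the following pure-Python example implements a single conditional affine coupling layer. The `conditioner` function is a hypothetical toy stand-in for a learned network, and the dimensions and context values are illustrative assumptions, not the architecture from the paper.

```python
import math
import random

def conditioner(passive, context, n_out):
    """Toy stand-in for a learned network: maps (passive half, context) to log-scales and shifts."""
    drive = sum(passive) + sum(context)
    log_scale = [0.5 * math.tanh(drive + i) for i in range(n_out)]
    shift = [0.1 * drive * (i + 1) for i in range(n_out)]
    return log_scale, shift

def flow_forward(z, context):
    """U = f(Z; C) for one affine coupling layer; also returns log|det df/dZ|."""
    half = len(z) // 2
    z_a, z_b = z[:half], z[half:]
    log_scale, shift = conditioner(z_a, context, len(z_b))
    u_b = [zb * math.exp(s) + t for zb, s, t in zip(z_b, log_scale, shift)]
    return z_a + u_b, sum(log_scale)

def flow_inverse(u, context):
    """Z = f^{-1}(U; C); returns the same log|det df/dZ|, evaluated at that Z."""
    half = len(u) // 2
    u_a, u_b = u[:half], u[half:]
    log_scale, shift = conditioner(u_a, context, len(u_b))
    z_b = [(ub - t) * math.exp(-s) for ub, s, t in zip(u_b, log_scale, shift)]
    return u_a + z_b, sum(log_scale)

def standard_normal_logpdf(z):
    """log N(z; 0, I) for a list of floats."""
    return -0.5 * sum(v * v for v in z) - 0.5 * len(z) * math.log(2.0 * math.pi)

def log_density(u, context):
    """log q(U | C) = log N(Z; 0, I) - log|det df/dZ| with Z = f^{-1}(U; C)."""
    z, log_det = flow_inverse(u, context)
    return standard_normal_logpdf(z) - log_det

rng = random.Random(0)
context = [0.3, -0.2]                        # stands in for C = g(x0, xG, h)
z = [rng.gauss(0.0, 1.0) for _ in range(4)]  # illustrative size, e.g. H = 2, d_u = 2
u, log_det = flow_forward(z, context)
z_back, _ = flow_inverse(u, context)
```

Because the first half of the vector passes through unchanged, the inverse can recompute the same scale and shift, which is what keeps both the inverse and the log-determinant cheap to evaluate.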

2. Objective: Variational Inference Perspective

The finite-horizon stochastic optimal control problem is reframed as inference on an optimality variable $o$ with

$p(o=1|\tau) \propto \exp(-J(\tau)), \quad \tau=(X,U),$

where $J(\tau)$ is the trajectory cost. The variational free-energy (up to constants) minimized by FlowMPPI is

$\mathcal{F}[\phi] = \mathrm{KL}(q_\phi(U)\,\|\,p(U|o=1)) = -\mathbb{E}_{q(\tau)}[\log p(o|\tau)] - H(q_\phi(U)),$

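with $J(\tau)$ absorbing log-prior costs. Substituting $\log p(o=1|\tau) = -J(\tau)$ (up to a constant) and expressing the entropy through the base distribution of the flow gives the training objective, an expected-cost term plus the negative entropy of the flow: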

$\mathcal{F} = -\mathbb{E}_{q(h)q_\phi(U|C)}[-J(\tau)] + \mathbb{E}_{p(Z)}[\log p(Z) - \log|\det\partial f_\phi/\partial Z|].$
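To see where the second expectation comes from, the following short derivation (a sketch that uses only the change-of-variables density from Section 1) rewrites the negative entropy in the base space:

$-H(q_\phi(U|C)) = \mathbb{E}_{q_\phi(U|C)}[\log q_\phi(U|C)] = \mathbb{E}_{p(Z)}[\log p(Z) - \log|\det\partial f_\phi/\partial Z|], \quad U=f_\phi(Z;C),\ p(Z)=N(Z;0,I).$

Likewise, $\log p(o=1|\tau) = -J(\tau)$ up to an additive constant, so the first term $-\mathbb{E}[-J(\tau)]$ is simply the expected trajectory cost.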

In practice, this objective is optimized using weighted maximum likelihood on sampled trajectories.
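A weighted maximum-likelihood loss of this kind can be sketched as follows. The exponential weighting $w_k \propto \exp(-J_k/\lambda)$ is an assumption chosen to match the MPPI softmax weights; the source does not spell out the exact weighting, and the cost and log-probability values are made up for illustration.

```python
import math

def softmax_weights(costs, temperature=1.0):
    """Weights w_k proportional to exp(-J_k / temperature), normalized to sum to 1."""
    best = min(costs)  # subtract the minimum for numerical stability
    unnorm = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(unnorm)
    return [w / total for w in unnorm]

def weighted_nll(costs, log_probs, temperature=1.0):
    """Weighted maximum-likelihood loss: -sum_k w_k * log q(U_k | C)."""
    weights = softmax_weights(costs, temperature)
    return -sum(w * lp for w, lp in zip(weights, log_probs))

# Illustrative values: trajectory costs J_k and flow log-densities log q(U_k | C).
costs = [12.0, 3.0, 7.5]
log_probs = [-4.1, -5.0, -3.7]
weights = softmax_weights(costs)
loss = weighted_nll(costs, log_probs)
```

Minimizing this loss with respect to the flow parameters raises the density of low-cost samples, since those samples carry almost all of the weight.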

3. MPPI Algorithm Augmented with Learned Flow

At each MPC step, FlowMPPI interleaves classic MPPI perturbations with environment-aware flow samples. The algorithmic procedure is:

Shift the previous nominal control sequence $U^0$ forward by one step.
Compute the nominal latent $Z^0=f_\phi^{-1}(U^0;C)$.
Draw $K/2$ samples via classic Gaussian noise and $K/2$ via the flow.
For each sample, roll out the trajectory and compute a cost-augmented score $S^k$ (including auxiliary terms penalizing deviation from the nominal in control or latent space).
Compute softmax weights $w_k$ from the scores and form the weighted update of $U^0$.
Apply the first action of the updated $U^0$ to the plant.

Each step thus combines context-conditioned sampling in latent space with explicit regularization that parallels the KL-divergence term in classic MPPI. The sample batch contains both noise-based perturbations and flow-induced proposals (Power et al., 2022, Sacks et al., 2022).
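The procedure above can be sketched in simplified form. This example assumes a 1D double integrator, and `toy_flow_sample` is a hypothetical stand-in for the learned flow; the nominal-latent computation and the auxiliary deviation penalties are omitted for brevity, so this illustrates only the mixed sampling and the softmax-weighted update.

```python
import math
import random

def rollout_cost(x0, controls, goal, dt=0.1):
    """Cost of a 1D double-integrator rollout: squared goal distance plus a small control penalty."""
    pos, vel = x0
    cost = 0.0
    for u in controls:
        vel += u * dt
        pos += vel * dt
        cost += (pos - goal) ** 2 + 0.01 * u * u
    return cost

def toy_flow_sample(rng, x0, goal, horizon):
    """Hypothetical stand-in for U = f(Z; C): goal-directed controls plus noise."""
    direction = 1.0 if goal > x0[0] else -1.0
    return [direction + rng.gauss(0.0, 0.3) for _ in range(horizon)]

def mppi_step(nominal, x0, goal, rng, num_samples=64, sigma=0.5, temperature=1.0):
    """One simplified mixed-sampling update; returns the new nominal and the sample weights."""
    horizon = len(nominal)
    shifted = nominal[1:] + [0.0]  # shift the previous nominal forward, pad with zero
    half = num_samples // 2
    # Half of the budget from Gaussian perturbations, the rest from the (toy) flow.
    samples = [[u + rng.gauss(0.0, sigma) for u in shifted] for _ in range(half)]
    samples += [toy_flow_sample(rng, x0, goal, horizon) for _ in range(num_samples - half)]
    # Roll out every sample and score it.
    costs = [rollout_cost(x0, s, goal) for s in samples]
    # Softmax weights (shifted by the minimum cost for stability) and weighted update.
    best = min(costs)
    unnorm = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(unnorm)
    weights = [w / total for w in unnorm]
    updated = [sum(w * s[t] for w, s in zip(weights, samples)) for t in range(horizon)]
    return updated, weights

rng = random.Random(0)
state, goal = (0.0, 0.0), 1.0
nominal, weights = mppi_step([0.0] * 10, state, goal, rng)
first_action = nominal[0]  # the action that would be applied to the plant
```

In this toy setting the goal-directed proposals have much lower cost than zero-mean perturbations around an idle nominal, so they dominate the weighted update from the first iteration.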

4. Out-of-Distribution Projection for Robust Generalization

FlowMPPI incorporates an OOD-projection mechanism for test-time robustness when the environment $E$ is not covered by the training data. This projects the environment latent $h$ to a nearby in-distribution point by minimizing

$\hat{h} = \arg\min_h\ [\,b\cdot L_{OOD}(h) + L_{\mathrm{flow}}(h)\,],$

where

$L_{OOD}(h) = -\log p_{\phi_{\mathrm{prior}}}(h)$ is the negative log-density of $h$ under the VAE's flow prior, which is large when $h$ is unlikely,
$L_{\mathrm{flow}}(h)$ is the weighted negative log-likelihood of flow-sampled control sequences,
$b$ weights the in-distribution term against the control-likelihood term.

This optimization adapts the flow's conditioning context to OOD environments, improving success rates and lowering costs without retraining.
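A minimal sketch of this projection uses plain gradient descent on the latent. Both loss functions below are toy assumptions: a standard-normal prior replaces the Real-NVP prior, and a quadratic surrogate replaces the weighted control-sequence negative log-likelihood. The function names, step size, and iteration count are hypothetical.

```python
import math

def numerical_gradient(loss, h, eps=1e-5):
    """Central-difference gradient of a scalar loss with respect to the list h."""
    grad = []
    for i in range(len(h)):
        up = h[:i] + [h[i] + eps] + h[i + 1:]
        down = h[:i] + [h[i] - eps] + h[i + 1:]
        grad.append((loss(up) - loss(down)) / (2.0 * eps))
    return grad

def project_latent(h_init, ood_loss, flow_loss, b=1.0, lr=0.1, steps=300):
    """Gradient descent on b * L_OOD(h) + L_flow(h), starting from the encoder output."""
    def total(h):
        return b * ood_loss(h) + flow_loss(h)

    h = list(h_init)
    for _ in range(steps):
        grad = numerical_gradient(total, h)
        h = [hi - lr * gi for hi, gi in zip(h, grad)]
    return h

def toy_ood_loss(h):
    """Assumed stand-in for L_OOD: negative log-density under a standard normal."""
    return 0.5 * sum(v * v for v in h) + 0.5 * len(h) * math.log(2.0 * math.pi)

def toy_flow_loss(h, target=(2.0, -1.0)):
    """Assumed quadratic stand-in for L_flow, minimized at `target`."""
    return 0.5 * sum((v - t) ** 2 for v, t in zip(h, target))

h_encoded = [4.0, 3.0]  # latent produced by the encoder for an unfamiliar environment
h_projected = project_latent(h_encoded, toy_ood_loss, toy_flow_loss, b=1.0)
```

With these quadratic stand-ins the minimizer is `target / (1 + b)`, which makes the role of $b$ visible: larger values pull the latent closer to the mode of the prior.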

5. Architectural Details and Training Protocol

FlowMPPI and its variants employ the following architectural elements (Power et al., 2022):

VAE over environment SDF ($E\in \mathbb{R}^{64\times64}$ or $\mathbb{R}^{64\times64\times64}$): four stride-2 convolutions $\to$ FC $\to$ $(\mu_h,\sigma_h)$; the decoder reverses this structure. The latent prior on $h$ is a 4-layer Real-NVP flow.
Context MLP $g_\omega$: concatenates $(x_0,x_G,h)$, ReLU hidden width 256, outputs $C$.
Control flow $f_\phi$: Real-NVP with 10 conditional coupling layers, batch-norm, and mixing.
Planar system: $d_x=4$, $d_u=2$, $d_h=d_C=64$; quadrotor: $d_x=12$, $d_u=4$, $d_h=d_C=256$.
Hyperparameters: $H=40$, sampling budgets $K\in\{256,512,1024\}$, Adam learning rate $1\mathrm{e}{-3}$, temperature $\lambda=1$, and $\Sigma_U$ tuned per system. VAE loss is weighted and frozen after 100 epochs; control-perturbation variance is annealed.

For FlowMPPIProj (with projection), half the step's sample budget is allocated to $L_{\mathrm{flow}}$ computation for projection. All components are compatible with standard deep learning frameworks.
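The dimensions and hyperparameters listed above can be collected in a small configuration object. The class and field names here are hypothetical; only the numeric values come from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowMPPIConfig:
    state_dim: int                 # d_x
    control_dim: int               # d_u
    latent_dim: int                # d_h (equal to d_C in the text)
    horizon: int = 40              # H
    num_samples: int = 512         # K, one of {256, 512, 1024}
    learning_rate: float = 1e-3    # Adam
    temperature: float = 1.0       # lambda
    control_flow_layers: int = 10  # conditional coupling layers in f_phi
    prior_flow_layers: int = 4     # Real-NVP layers in the latent prior

    @property
    def sequence_dim(self) -> int:
        """Dimension of the flattened control sequence U, i.e. H * d_u."""
        return self.horizon * self.control_dim

    @property
    def flow_samples(self) -> int:
        """Half of the budget is drawn from the flow, half from Gaussian noise."""
        return self.num_samples // 2

PLANAR = FlowMPPIConfig(state_dim=4, control_dim=2, latent_dim=64)
QUADROTOR = FlowMPPIConfig(state_dim=12, control_dim=4, latent_dim=256)
```

With $H=40$, the flow operates on an 80-dimensional control sequence for the planar system and a 160-dimensional one for the quadrotor.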

6. Empirical Evaluation and Performance

FlowMPPI is benchmarked on navigation and control tasks in both in-distribution and OOD scenarios:

Planar 2D double-integrator navigation with disc obstacles (in-distribution) and narrow four-room environments (OOD).
12DoF quadrotor navigating among 3D spherical obstacles (in-distribution), a four-room corridor environment (OOD), and two real-world reconstructions.

The following table summarizes selected results (Power et al., 2022):

Method	Success rate (2D OOD, K=512)	Cost (2D OOD, K=512)	Success rate (3D OOD, K=512)	Cost (3D OOD, K=512)
MPPI	0.29	2948	0.11	4724
iCEM	0.59	2145	0.47	4157
FlowMPPI	0.75	2155	0.72	3601
FlowMPPIProj	0.77	2155	0.83	3443
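To put the cost columns in relative terms, the short calculation below computes fractional cost reductions from the table values. The dictionary layout and function name are illustrative choices; the numbers are copied from the table.

```python
# Values copied from the table above (OOD environments, K = 512).
results = {
    "MPPI":         {"succ_2d": 0.29, "cost_2d": 2948, "succ_3d": 0.11, "cost_3d": 4724},
    "iCEM":         {"succ_2d": 0.59, "cost_2d": 2145, "succ_3d": 0.47, "cost_3d": 4157},
    "FlowMPPI":     {"succ_2d": 0.75, "cost_2d": 2155, "succ_3d": 0.72, "cost_3d": 3601},
    "FlowMPPIProj": {"succ_2d": 0.77, "cost_2d": 2155, "succ_3d": 0.83, "cost_3d": 3443},
}

def relative_cost_reduction(method, baseline, key):
    """Fraction by which `method` lowers the cost relative to `baseline` (negative if higher)."""
    base = results[baseline][key]
    return (base - results[method][key]) / base

vs_mppi_2d = relative_cost_reduction("FlowMPPIProj", "MPPI", "cost_2d")
vs_icem_3d = relative_cost_reduction("FlowMPPIProj", "iCEM", "cost_3d")
vs_icem_2d = relative_cost_reduction("FlowMPPI", "iCEM", "cost_2d")
```

In these numbers, FlowMPPIProj lowers cost by roughly 27% relative to MPPI in 2D and roughly 17% relative to iCEM in 3D, while in 2D the flow-based methods have a marginally higher cost than iCEM despite their higher success rates.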

A similar trend appears in sim2real transfer to the real-world reconstructions, where FlowMPPIProj substantially improves both success rate and cost. For example, in the stairway environment for the quadrotor:

MPPI: succ 0.32, cost 3019
iCEM: succ 0.58, cost 2623
FlowMPPI: succ 0.50, cost 2463
FlowMPPIProj: succ 0.85, cost 1745

Notably, baseline methods frequently get stuck in local minima in complex environments, whereas flow-based sampling yields goal-oriented, collision-free solutions from the first iteration.

7. Relationship to and Advances over Previous Methods

Earlier sampling-based MPC approaches typically employ Gaussian or control-space sampling, leading to suboptimal exploration and reliance on heuristics for parameter updates. FlowMPPI, especially when combined with bi-level optimization and end-to-end backpropagation-through-time (BPTT) training in a latent space (Sacks et al., 2022), achieves:

An order-of-magnitude improvement in sample efficiency,
10–20% lower median cost across domains,
Consistent or improved success rates,
Robust generalization to out-of-distribution and real-world environments.

The integration of normalizing flows enables expressive, tractable distributions that account for environment geometry and dynamics. The OOD-projection step further enables reliable sim2real transfer, as demonstrated on real-world datasets without requiring retraining.

FlowMPPI has demonstrated a marked advance in the application of probabilistic learning and inference to MPC, especially in scenarios where classical methods are challenged by non-Gaussianity, environmental complexity, or dataset shift (Power et al., 2022, Sacks et al., 2022).

References

Power et al. (2022). Variational Inference MPC using Normalizing Flows and Out-of-Distribution Projection.

Sacks et al. (2022). Learning Sampling Distributions for Model Predictive Control.
