FlowMPPI: Deep Learning MPC
- FlowMPPI is a sampling-based MPC that leverages conditional normalizing flows to generate goal-directed control sequences.
- It integrates deep probabilistic methods, such as variational inference and VAE-based encoding, for context-aware sampling.
- The approach improves sample efficiency and generalization to out-of-distribution and complex environments.
FlowMPPI is a family of sampling-based Model Predictive Control (MPC) algorithms that leverages conditional normalizing flows to learn expressive, context-aware distributions over optimal control sequences. These methods fuse classical MPPI with deep probabilistic inference, particularly variational inference, to produce control samples that are both goal-directed and sensitive to environmental constraints, including out-of-distribution (OOD) scenarios. The approach addresses sample efficiency, generalization, and the limitations of conventional Gaussian sampling in high-dimensional, cluttered, or previously unseen environments (Power et al., 2022, Sacks et al., 2022).
1. Control-Sequence Distribution via Conditional Normalizing Flows
FlowMPPI replaces the factorized Gaussian sampler of traditional MPPI with a conditional normalizing flow. The control sequence $U = (u_0, \dots, u_{T-1})$ over horizon $T$ is drawn from a distribution $q_\theta(U \mid C)$, constructed as follows (Power et al., 2022):
- A variational autoencoder (VAE) encoder maps the environment's signed distance function (SDF) $E$ into a low-dimensional latent vector $h$.
- A context MLP $g$ produces a context vector $C = g(x_0, x_G, h)$.
- A conditional normalizing flow $f_\theta$ transforms a base Gaussian sample $Z \sim \mathcal{N}(0, I)$ into $U$, conditioned on $C$:
$$U = f_\theta(Z; C).$$
The density is given by the change-of-variables formula:
$$q_\theta(U \mid C) = \mathcal{N}\big(f_\theta^{-1}(U; C);\, 0, I\big)\,\left|\det \frac{\partial f_\theta^{-1}(U; C)}{\partial U}\right|.$$
The context $C$ encodes the current state $x_0$, the goal $x_G$, and the environment latent $h$. This amortized, conditional sampling mechanism lets the distribution adapt to both the robot's dynamics and the specific environment geometry.
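The change-of-variables computation can be illustrated with a minimal NumPy sketch. It assumes a hypothetical single-layer elementwise affine flow whose shift and log-scale are linear in the context $C$ (not the Real-NVP flow FlowMPPI uses); the weight matrices and dimensions are illustrative. Because this particular flow is Gaussian, the density from the formula can be checked against the closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
T, du, dim_C = 5, 2, 3          # hypothetical horizon, control dim, context dim
D = T * du                      # the flow acts on the flattened control sequence

# HYPOTHETICAL conditioner: shift and log-scale are linear in the context C.
W_mu = rng.normal(size=(D, dim_C))
W_s = 0.1 * rng.normal(size=(D, dim_C))

def flow_forward(z, c):
    """U = f(Z; C): elementwise affine map with context-dependent shift and scale."""
    return W_mu @ c + np.exp(W_s @ c) * z

def flow_inverse(u, c):
    """Z = f^{-1}(U; C) and log|det dZ/dU| (the Jacobian is diagonal)."""
    log_sigma = W_s @ c
    z = (u - W_mu @ c) * np.exp(-log_sigma)
    return z, -np.sum(log_sigma)

def std_normal_logpdf(z):
    """log N(z; 0, I)."""
    return -0.5 * (z @ z) - 0.5 * z.size * np.log(2 * np.pi)

def flow_log_density(u, c):
    """Change of variables: log q(U|C) = log N(f^{-1}(U;C); 0, I) + log|det d f^{-1}/dU|."""
    z, log_det_inv = flow_inverse(u, c)
    return std_normal_logpdf(z) + log_det_inv

c = rng.normal(size=dim_C)
z = rng.normal(size=D)
u = flow_forward(z, c)
U_seq = u.reshape(T, du)        # rows are u_0, ..., u_{T-1}

# This particular flow is Gaussian, so compare against the closed-form density.
mu, sigma = W_mu @ c, np.exp(W_s @ c)
direct = np.sum(-0.5 * ((u - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi))
print(np.isclose(flow_log_density(u, c), direct))  # True
```

A Real-NVP flow replaces the single affine map with a stack of coupling layers, but the density is still the base log-density at $f^{-1}(U; C)$ plus the log-determinant of the inverse Jacobian.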
2. Objective: Variational Inference Perspective
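The finite-horizon stochastic optimal control problem is reframed as inference over a binary optimality variable $\mathcal{O}$ with
$$p(\mathcal{O} = 1 \mid U) \propto \exp\!\left(-\frac{1}{\lambda} J(U)\right),$$
where $J(U)$ is the cost of the trajectory obtained by rolling out $U$ from $x_0$ and $\lambda > 0$ is a temperature. The variational free energy (up to constants) minimized by FlowMPPI is
$$\mathcal{F}(q) = \mathbb{E}_{U \sim q(\cdot \mid C)}\big[J(U)\big] + \lambda\, \mathbb{E}_{U \sim q(\cdot \mid C)}\big[\log q(U \mid C)\big] = \lambda\, \mathrm{KL}\big(q(U \mid C)\,\|\,p(U \mid \mathcal{O} = 1)\big) + \text{const},$$
with $J$ absorbing log-prior costs on $U$. The training objective combines maximizing the likelihood of low-cost samples with regularization via entropy and flow terms. In practice, it is optimized by weighted maximum likelihood on sampled control sequences,
$$\max_\theta \sum_{k=1}^{K} w_k \log q_\theta(U_k \mid C),$$
where the weights $w_k$ favor low-cost samples, e.g. $w_k \propto \exp(-J(U_k)/\lambda)$.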
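The weighted-maximum-likelihood idea can be sketched in NumPy under simplifying assumptions: a diagonal-Gaussian stand-in for $q$ with a fixed standard deviation (instead of a flow), a hypothetical quadratic cost, and weights $w_k \propto \exp(-J(U_k)/\lambda)$. For this family, the weighted MLE of the mean is simply the weighted sample average, and the Monte Carlo free-energy estimate falls as the mean moves toward the low-cost region.

```python
import numpy as np

def softmax_weights(costs, lam):
    """w_k proportional to exp(-J_k / lam), shifted by the minimum cost for stability."""
    w = np.exp(-(costs - costs.min()) / lam)
    return w / w.sum()

def gaussian_logpdf(U, mean, std):
    """Row-wise log-density of a diagonal Gaussian q(U) = N(mean, diag(std^2))."""
    z = (U - mean) / std
    return np.sum(-0.5 * z**2 - np.log(std) - 0.5 * np.log(2 * np.pi), axis=1)

def free_energy_estimate(costs, logq, lam):
    """Monte Carlo estimate of F(q) = E_q[J] + lam * E_q[log q] from samples U_k ~ q."""
    return np.mean(costs + lam * logq)

rng = np.random.default_rng(0)
D, K, lam = 4, 256, 2.0
U_star = np.array([1.0, -1.0, 0.5, 0.5])  # hypothetical low-cost control sequence (flattened)

def cost_fn(U):
    """Hypothetical quadratic trajectory cost J(U) = ||U - U*||^2."""
    return np.sum((U - U_star) ** 2, axis=1)

mean, std = np.zeros(D), np.ones(D)  # std is held fixed for simplicity
history = []
for _ in range(10):
    U = mean + std * rng.normal(size=(K, D))  # samples U_k ~ q
    J = cost_fn(U)
    history.append(free_energy_estimate(J, gaussian_logpdf(U, mean, std), lam))
    w = softmax_weights(J, lam)
    # argmax_m sum_k w_k log N(U_k; m, diag(std^2)) is the weighted average sum_k w_k U_k.
    mean = w @ U

print(np.round(mean, 2), round(history[0], 2), round(history[-1], 2))
```

Holding the variance fixed keeps the example simple; refitting the variance by weighted MLE as well would shrink it at every iteration, which is one reason practical schemes add entropy or variance regularization.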
3. MPPI Algorithm Augmented with Learned Flow
At each MPC step, FlowMPPI interleaves classic MPPI perturbations with environment-aware flow samples. The algorithmic procedure is:
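- Shift the previous nominal control sequence $\bar U$ forward by one step.
- Compute the nominal latent $\bar Z = f_\theta^{-1}(\bar U; C)$.
- Draw $K_1$ samples by adding classic Gaussian noise to $\bar U$ and $K_2$ samples via the flow.
- For each sample, roll out the trajectory and compute a cost-augmented score $S_k$ (including auxiliary terms that penalize deviation from the nominal in control or latent space).
- Compute softmax weights $w_k = \exp(-S_k/\lambda) / \sum_j \exp(-S_j/\lambda)$ and form the weighted update $\bar U \leftarrow \sum_k w_k U_k$.
- Apply the first action $\bar u_0$ to the plant.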
Each step thus combines context-conditioned sampling in the flow's latent space with explicit regularization that parallels the KL-divergence term in classic MPPI, and each batch mixes noise-based perturbations with flow-induced proposals (Power et al., 2022, Sacks et al., 2022).
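The MPC loop can be sketched in NumPy as follows. This is a minimal sketch under explicit assumptions: a planar double integrator, a hypothetical goal-distance cost, a hypothetical actuator limit `u_max`, and a hypothetical `flow_sample` stand-in that returns noisy goal-directed controls in place of a trained conditional flow. Because the stand-in has no inverse, the nominal-latent step is omitted, and the only auxiliary term is the classic MPPI control-cost term with $\Sigma = \sigma^2 I$.

```python
import numpy as np

rng = np.random.default_rng(1)
T, du, dt = 20, 2, 0.1             # hypothetical horizon, control dimension, time step
lam, sigma, u_max = 1.0, 0.5, 2.0  # temperature, perturbation std, hypothetical actuator limit
goal = np.array([2.0, 2.0])

def rollout(x0, U):
    """Planar double integrator (state [px, py, vx, vy], control = acceleration).
    U has shape (K, T, du); returns positions of shape (K, T, 2)."""
    K = U.shape[0]
    pos = np.tile(x0[:2], (K, 1))
    vel = np.tile(x0[2:], (K, 1))
    traj = np.empty((K, T, 2))
    for t in range(T):
        vel = vel + dt * U[:, t]
        pos = pos + dt * vel
        traj[:, t] = pos
    return traj

def trajectory_cost(traj, U):
    """Hypothetical cost: summed distance to the goal plus a small control-effort term."""
    dist = np.linalg.norm(traj - goal, axis=-1).sum(axis=1)
    return dist + 0.01 * np.sum(U**2, axis=(1, 2))

def flow_sample(n, x0):
    """HYPOTHETICAL stand-in for U = f(Z; C): noisy goal-directed accelerations.
    A trained conditional flow would replace this function."""
    direction = (goal - x0[:2]) / (np.linalg.norm(goal - x0[:2]) + 1e-9)
    return direction + 0.3 * rng.normal(size=(n, T, du))

def flowmppi_step(x0, U_nom, K1=64, K2=64):
    """One MPPI update mixing K1 Gaussian perturbations of U_nom with K2 flow samples."""
    U_gauss = U_nom + sigma * rng.normal(size=(K1, T, du))
    U_all = np.clip(np.concatenate([U_gauss, flow_sample(K2, x0)]), -u_max, u_max)
    J = trajectory_cost(rollout(x0, U_all), U_all)
    # Classic MPPI control-cost term lam * U_nom^T Sigma^{-1} (U_k - U_nom), Sigma = sigma^2 I.
    S = J + lam * np.sum(U_nom * (U_all - U_nom), axis=(1, 2)) / sigma**2
    w = np.exp(-(S - S.min()) / lam)    # softmax weights, shifted for numerical stability
    w /= w.sum()
    return np.tensordot(w, U_all, axes=1)  # weighted update of the nominal sequence

x = np.zeros(4)                 # start at rest at the origin
U_nom = np.zeros((T, du))
for _ in range(30):
    U_nom = flowmppi_step(x, U_nom)
    u0 = U_nom[0]               # apply the first action to the plant
    x[2:] += dt * u0
    x[:2] += dt * x[2:]
    U_nom = np.vstack([U_nom[1:], np.zeros((1, du))])  # shift the nominal forward one step

print(np.round(x[:2], 2), np.linalg.norm(x[:2] - goal))
```

Splitting the budget between `K1` and `K2` lets the Gaussian perturbations refine the current nominal while the flow proposals inject goal-directed alternatives that local noise alone would rarely reach.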
4. Out-of-Distribution Projection for Robust Generalization
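FlowMPPI incorporates an OOD-projection mechanism for test-time robustness when the environment is not covered by the training data. It projects the environment latent $h$ to a nearby in-distribution point $h'$ by minimizing
$$\mathcal{L}_{\text{proj}}(h') = \mathcal{L}_{\text{ctrl}}(h') + \beta\, \mathcal{L}_{\text{prior}}(h'),$$
where
- $\mathcal{L}_{\text{prior}}(h') = -\log p(h')$ is the negative log-density under the VAE's flow prior, which is high when $h'$ is unlikely,
- $\mathcal{L}_{\text{ctrl}}(h') = -\sum_k w_k \log q_\theta\big(U_k \mid g(x_0, x_G, h')\big)$ is the weighted negative log-likelihood of flow-sampled control sequences,
- $\beta$ modulates the in-distribution tradeoff.
This optimization adapts the flow's conditioning context to OOD environments, improving success rates and reducing costs without retraining.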
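The projection can be sketched as gradient descent on $h'$, under assumptions chosen so the result is checkable: a standard-normal stand-in for the VAE's flow prior, a hypothetical unit-variance Gaussian $q(U \mid C(h'))$ whose mean is linear in $h'$ (matrix `A`), fixed sample weights, and step size $1/L$ with $L$ the gradient's Lipschitz constant. Under these assumptions the objective is quadratic, so the descent result can be compared with its closed-form minimizer; with the real Real-NVP prior and flow, the same loop would rely on automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(2)
d_h, d_U, K, beta = 3, 6, 128, 0.5
A = rng.normal(size=(d_U, d_h))  # HYPOTHETICAL: flow mean is linear in the latent, mu(h') = A h'

def neg_log_prior(h):
    """Stand-in for -log p(h'): standard normal prior, constants dropped."""
    return 0.5 * h @ h

def ctrl_nll(h, U, w):
    """Weighted NLL of control samples under a unit-variance Gaussian q(U | C(h')), constants dropped."""
    r = U - A @ h
    return 0.5 * np.sum(w * np.sum(r**2, axis=1))

def proj_loss(h, U, w):
    return ctrl_nll(h, U, w) + beta * neg_log_prior(h)

def proj_grad(h, U, w):
    r = U - A @ h                 # residuals, shape (K, d_U)
    return -A.T @ (w @ r) + beta * h

# Hypothetical OOD latent and control samples with cost-based softmax weights.
h_ood = 3.0 * rng.normal(size=d_h)
U = rng.normal(size=(K, d_U)) + 1.0
costs = np.sum((U - 1.5) ** 2, axis=1)
w = np.exp(-(costs - costs.min()))
w /= w.sum()

# Gradient descent on h', initialized at the encoder's OOD latent.
L = np.linalg.eigvalsh(A.T @ A + beta * np.eye(d_h)).max()
h = h_ood.copy()
losses = [proj_loss(h, U, w)]
for _ in range(2000):
    h -= proj_grad(h, U, w) / L
    losses.append(proj_loss(h, U, w))

# The quadratic objective has a closed-form minimizer for comparison.
h_star = np.linalg.solve(A.T @ A + beta * np.eye(d_h), A.T @ (w @ U))
print(np.round(h, 3), np.round(h_star, 3))
```

Larger `beta` pulls the projected latent toward high-prior-density regions; smaller `beta` favors latents under which the flow assigns high likelihood to the low-cost samples.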
5. Architectural Details and Training Protocol
FlowMPPI and its variants employ the following architectural elements (Power et al., 2022):
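- VAE over the environment SDF $E$ (a 2D or 3D grid, depending on the system): four stride-2 convolutions followed by a fully connected layer that outputs $h$; the decoder mirrors this structure. The latent prior on $h$ is a 4-layer Real-NVP flow.
- Context MLP $g$: concatenates $(x_0, x_G, h)$, applies a 256-unit ReLU hidden layer, and outputs $C$.
- Control flow $f_\theta$: conditional Real-NVP with 10 coupling layers, batch normalization, and mixing layers between couplings.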
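- System dimensions: the planar system is a double integrator with a 4-dimensional state (position and velocity) and a 2-dimensional control; the quadrotor has 12 degrees of freedom.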
- Hyperparameters: the sampling budget $K$, the Adam learning rate, and the temperature $\lambda$ are tuned per system. The VAE loss term is weighted, and the VAE is frozen after 100 epochs; the control-perturbation variance is annealed.
For FlowMPPIProj (FlowMPPI with OOD projection), half of each step's sample budget is spent on the projection. All components are standard differentiable modules that can be implemented in common deep learning frameworks.
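The control-flow architecture can be sketched in NumPy with untrained random weights. Assumptions, all hypothetical: the dimensions, a fixed random permutation as the mixing step, tanh-bounded affine couplings, and no batch normalization. The sketch shows how a context from the MLP $g$ conditions each of the 10 coupling layers and checks that the stack is exactly invertible; it is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
d_x, d_h, d_C, D = 4, 8, 16, 20  # HYPOTHETICAL: state, env latent, context, flattened U dims
n_layers, hidden = 10, 256
half = D // 2

def init_linear(n_in, n_out, scale=0.1):
    return scale * rng.normal(size=(n_in, n_out)), np.zeros(n_out)

# Context MLP g: concat(x0, xG, h) -> 256-unit ReLU layer -> C
W1, b1 = init_linear(2 * d_x + d_h, hidden)
W2, b2 = init_linear(hidden, d_C)

def context_mlp(x0, xG, h):
    inp = np.concatenate([x0, xG, h])
    return np.maximum(0.0, inp @ W1 + b1) @ W2 + b2

# Each layer: a fixed random permutation ("mixing") followed by a conditional affine coupling.
layers = []
for _ in range(n_layers):
    perm = rng.permutation(D)
    Ws, bs = init_linear(half + d_C, D - half)
    Wt, bt = init_linear(half + d_C, D - half)
    layers.append((perm, Ws, bs, Wt, bt))

def coupling_params(a, C, Ws, bs, Wt, bt):
    """Log-scale and shift for the second half, computed from the first half and C."""
    inp = np.concatenate([a, C])
    return np.tanh(inp @ Ws + bs), inp @ Wt + bt

def forward(z, C):
    """U = f(Z; C) and log|det dU/dZ| (permutations contribute zero)."""
    x, log_det = z.copy(), 0.0
    for perm, Ws, bs, Wt, bt in layers:
        x = x[perm]
        a, b = x[:half], x[half:]
        log_s, t = coupling_params(a, C, Ws, bs, Wt, bt)
        x = np.concatenate([a, b * np.exp(log_s) + t])
        log_det += log_s.sum()
    return x, log_det

def inverse(u, C):
    """Z = f^{-1}(U; C): undo each coupling, then its permutation, in reverse order."""
    x = u.copy()
    for perm, Ws, bs, Wt, bt in reversed(layers):
        a, b = x[:half], x[half:]
        log_s, t = coupling_params(a, C, Ws, bs, Wt, bt)
        x = np.concatenate([a, (b - t) * np.exp(-log_s)])
        x = x[np.argsort(perm)]
    return x

C = context_mlp(rng.normal(size=d_x), rng.normal(size=d_x), rng.normal(size=d_h))
z = rng.normal(size=D)
u, log_det = forward(z, C)
print(np.allclose(inverse(u, C), z))  # True
```

Coupling layers keep the Jacobian triangular, so the log-determinant is just the sum of the log-scales, which is what makes the change-of-variables density cheap to evaluate.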
6. Empirical Evaluation and Performance
FlowMPPI is benchmarked on navigation and control tasks in both in-distribution and OOD scenarios:
- Planar 2D double-integrator navigation with disc obstacles (in-distribution) and narrow four-room environments (OOD).
- 12-DoF quadrotor navigation among 3D spherical obstacles (in-distribution), in a four-room corridor environment (OOD), and in two real-world reconstructions.
The following table summarizes selected OOD results at a sample budget of $K = 512$ (Power et al., 2022):
| Method | Success (2D OOD) | Cost (2D OOD) | Success (3D OOD) | Cost (3D OOD) |
|---|---|---|---|---|
| MPPI | 0.29 | 2948 | 0.11 | 4724 |
| iCEM | 0.59 | 2145 | 0.47 | 4157 |
| FlowMPPI | 0.75 | 2155 | 0.72 | 3601 |
| FlowMPPIProj | 0.77 | 2155 | 0.83 | 3443 |
A similar trend holds in the real-world reconstructions, where FlowMPPIProj substantially improves both success rate and cost. For example, for the quadrotor in the stairway environment:
- MPPI: succ 0.32, cost 3019
- iCEM: succ 0.58, cost 2623
- FlowMPPI: succ 0.50, cost 2463
- FlowMPPIProj: succ 0.85, cost 1745
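The relative gains in the stairway results can be checked with a few lines of arithmetic; the numbers are copied from the list above, and the helper name is illustrative.

```python
# Quadrotor, stairway environment: (success rate, cost), copied from the list above.
results = {
    "MPPI": (0.32, 3019),
    "iCEM": (0.58, 2623),
    "FlowMPPI": (0.50, 2463),
    "FlowMPPIProj": (0.85, 1745),
}

def relative_cost_reduction(baseline, method):
    """Percentage by which `method` lowers cost relative to `baseline`."""
    return 100.0 * (results[baseline][1] - results[method][1]) / results[baseline][1]

for base in ("MPPI", "iCEM", "FlowMPPI"):
    d_succ = results["FlowMPPIProj"][0] - results[base][0]
    print(f"vs {base}: success +{d_succ:.2f}, cost -{relative_cost_reduction(base, 'FlowMPPIProj'):.1f}%")
```

This gives cost reductions of about 42.2% relative to MPPI, 33.5% relative to iCEM, and 29.2% relative to FlowMPPI without projection, alongside success-rate gains of 0.53, 0.27, and 0.35.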
Notably, baseline methods frequently get stuck in local minima in complex environments, whereas flow-based sampling yields goal-oriented, collision-free solutions from the first iteration.
7. Relationship to and Advances over Previous Methods
Earlier sampling-based MPC approaches typically employ Gaussian or control-space sampling, which leads to inefficient exploration and a reliance on heuristic parameter updates. FlowMPPI, especially when combined with bi-level optimization and end-to-end backpropagation-through-time (BPTT) training in a latent space (Sacks et al., 2022), achieves:
- An order-of-magnitude improvement in sample efficiency,
- 10–20% lower median cost across domains,
- Consistent or improved success rates,
- Robust generalization to out-of-distribution and real-world environments.
Normalizing flows provide expressive yet tractable distributions that account for environment geometry and dynamics. The OOD-projection step further improves transfer to real-world environments without retraining, as demonstrated on reconstructions of real-world scenes.
FlowMPPI marks a clear advance in applying probabilistic learning and inference to MPC, especially where classical methods struggle with non-Gaussianity, environmental complexity, or dataset shift (Power et al., 2022, Sacks et al., 2022).