MP1 Algorithm: One-Step Generative Policy Learning

Updated 24 June 2026
  • MP1 Algorithm is a generative policy learning method for robotic manipulation that leverages the MeanFlow paradigm to compute one-step trajectories using 3D point cloud and state data.
  • It utilizes an interval-averaged velocity formulation solved via a U-Net model along with dispersive regularization to enable efficient few-shot generalization.
  • Empirical results demonstrate MP1's superior performance with lower inference latency and higher success rates compared to diffusion-based and traditional flow-based approaches.

MP1 is a generative policy learning algorithm for robotic manipulation that leverages the MeanFlow paradigm to deliver one-step (1-NFE) trajectory generation for high-dimensional, context-rich policy inference from 3D point cloud observations and robot state histories. It addresses the trade-off between the slow, iterative sampling of diffusion-based policies and the consistency constraints required by classical flow-based policies. To do so, it introduces interval-averaged velocity learning, a formulation that is solved efficiently with a U-Net–based model and complemented by a dispersive regularizer for few-shot generalization (Sheng et al., 14 Jul 2025).

1. Theoretical Foundations: MeanFlow and the MeanFlow Identity

Traditional Flow Matching learns the instantaneous velocity field $v(z_t, t)$ of the flow ODE $\frac{dz_t}{dt} = v(z_t, t)$, which requires solving that ODE numerically at inference. MP1 instead models the interval-averaged velocity:

$$u(z_t, r, t) \triangleq \frac{1}{t - r} \int_r^t v(z_\tau, \tau) \, d\tau$$

Rather than estimating $v$ pointwise and integrating, MP1 exploits the MeanFlow Identity:

$$u(z_t, r, t) = v(z_t, t) - (t - r)\frac{d}{dt}u(z_t, r, t)$$

where the total derivative is $\frac{d}{dt}u = v(z_t, t)\,\partial_z u + \partial_t u$. This relates the interval average $u$ to the local velocity, enabling direct, closed-form trajectory prediction without iterated integration or structural consistency losses. The approach eliminates numerical ODE-solver errors at inference.
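As a sanity check on the identity, the sketch below uses a toy one-dimensional flow that is not taken from the paper: the velocity field $v(z, t) = -z$, for which the interval-averaged velocity has the closed form $u(z_t, r, t) = z_t (1 - e^{t-r})/(t-r)$. The function names, the test point, and the finite-difference step are illustrative assumptions.

```python
import math

def v(z, t):
    """Toy instantaneous velocity field (assumption): dz/dt = -z."""
    return -z

def u(z_t, r, t):
    """Closed-form interval-averaged velocity of the toy flow over [r, t]."""
    s = t - r
    return z_t * (1.0 - math.exp(s)) / s

def meanflow_rhs(z_t, r, t, eps=1e-5):
    """Right-hand side of the MeanFlow Identity, v - (t - r) * du/dt,
    with the partial derivatives of u taken by central differences."""
    du_dz = (u(z_t + eps, r, t) - u(z_t - eps, r, t)) / (2 * eps)
    du_dt = (u(z_t, r, t + eps) - u(z_t, r, t - eps)) / (2 * eps)
    total_derivative = v(z_t, t) * du_dz + du_dt
    return v(z_t, t) - (t - r) * total_derivative

def one_step_back(z_t, r, t):
    """Jump from time t to time r using only the average velocity."""
    return z_t - (t - r) * u(z_t, r, t)

z_t, r, t = 0.7, 0.2, 0.9
identity_gap = abs(u(z_t, r, t) - meanflow_rhs(z_t, r, t))
exact_z_r = z_t * math.exp(t - r)  # exact solution of dz/dt = -z at time r
jump_gap = abs(one_step_back(z_t, r, t) - exact_z_r)
print(identity_gap, jump_gap)
```

Both gaps should be at the level of numerical error, which illustrates why knowing $u$ is enough to move along the flow in a single step without an ODE solver.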

2. Policy Architecture and One-Step Inference

The MP1 network receives:

  • A sequence of raw point clouds $P \in \mathbb{R}^{n_o \times n_p \times 3}$
  • A sequence of proprioceptive robot states $S \in \mathbb{R}^{n_o \times s_d}$

Separate encoders extract visual ($f_v$) and state features, which are concatenated into a single conditioning code. The downstream network is a U-Net which, given a noisy trajectory $z_t$, the interval endpoints $(r, t)$, and the conditioning code, predicts the interval-averaged velocity $u(z_t, r, t)$.

At inference, with $r = 0$, $t = 1$, and a single Gaussian sample $z_1 \sim \mathcal{N}(0, I)$, the definition of $u$ gives the trajectory in one step:

$$z_0 = z_1 - u(z_1, 0, 1)$$

This constitutes "true" 1-NFE (one network function evaluation): a single forward pass suffices, and no numerical integration is required.
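The one-step sampling rule can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the reported implementation: `one_step_sample`, `make_toy_u`, and the list-of-lists trajectory layout are hypothetical, and the stand-in for the U-Net returns the exact average velocity of a straight-line path so that the result can be checked.

```python
import random

def one_step_sample(u_fn, cond, horizon, action_dim, rng):
    """Generate a trajectory with a single call to u_fn (1-NFE).

    u_fn(z, r, t, cond) stands in for the trained network and returns the
    interval-averaged velocity, shaped like z (a list of lists).
    """
    # Single Gaussian draw z_1 ~ N(0, I).
    z1 = [[rng.gauss(0.0, 1.0) for _ in range(action_dim)] for _ in range(horizon)]
    u = u_fn(z1, 0.0, 1.0, cond)
    # z_0 = z_1 - (1 - 0) * u(z_1, 0, 1)
    return [[z - du for z, du in zip(z_row, u_row)] for z_row, u_row in zip(z1, u)]

def make_toy_u(target):
    """Hypothetical stand-in for the U-Net: the exact average velocity of a
    straight-line path from `target` (t = 0) to the noise sample (t = 1)."""
    def u_fn(z, r, t, cond):
        # On a straight path z_t = (1 - t) * target + t * noise the velocity is
        # constant (noise - target), so its interval average is the same value.
        # Only the r = 0, t = 1 call made above is handled exactly here.
        return [[zi - ti for zi, ti in zip(z_row, t_row)]
                for z_row, t_row in zip(z, target)]
    return u_fn

target = [[0.1, -0.2], [0.3, 0.4], [0.5, 0.0]]  # 3 steps, 2-D actions
trajectory = one_step_sample(make_toy_u(target), cond=None, horizon=3,
                             action_dim=2, rng=random.Random(0))
```

With the exact average velocity, the single subtraction recovers `target` up to floating-point error; a trained network would approximate this behaviour from the encoded observations.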

3. Training Objective: Classifier-Free Guidance and Dispersive Loss

MP1 uses two primary objective terms:

(a) CFG regression loss: the network output $u(z_t, r, t)$ is regressed onto a target derived from the MeanFlow Identity, that is, from $v(z_t, t) - (t - r)\frac{d}{dt}u(z_t, r, t)$.

Classifier-Free Guidance (CFG) is incorporated by randomly dropping the conditioning code with some probability during training, and the result is used in the regression.
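Randomly dropping the conditioning is commonly implemented by swapping a sample's condition for a "null" value during training. The sketch below shows that generic pattern only; the all-zero null vector, the function name `drop_condition`, and the per-sample independent draw are assumptions, not details confirmed by the source.

```python
import random

def drop_condition(cond_batch, drop_prob, rng):
    """Replace each sample's conditioning vector with an all-zero 'null'
    vector with probability drop_prob (illustrative CFG-style dropout)."""
    out = []
    for cond in cond_batch:
        if rng.random() < drop_prob:
            out.append([0.0] * len(cond))  # unconditional branch
        else:
            out.append(list(cond))         # conditional branch
    return out

rng = random.Random(0)
batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
kept = drop_condition(batch, 0.0, rng)     # never dropped
dropped = drop_condition(batch, 1.0, rng)  # always dropped
```

Training on both branches lets a single network provide conditional and unconditional predictions, which is what guidance needs.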

(b) Dispersive Loss: a repulsion term computed between the down-block latent representations of different samples in a batch. It pushes embedding vectors apart in latent space, which improves generalization in few-shot learning and discourages latent collapse.

The total objective combines the CFG regression loss with the Dispersive Loss.
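To make the repulsion idea concrete, the sketch below implements one common form of a dispersive regularizer: the log of the batch-averaged $\exp(-\lVert h_i - h_j \rVert^2 / \tau)$ over all pairs. Whether MP1 uses exactly this form, this temperature, or includes the $i = j$ pairs is an assumption here; the function name and toy latents are hypothetical.

```python
import math

def dispersive_loss(latents, tau=1.0):
    """log of the batch mean of exp(-||h_i - h_j||^2 / tau) over all pairs (i, j)."""
    n = len(latents)
    total = 0.0
    for hi in latents:
        for hj in latents:
            sq_dist = sum((a - b) ** 2 for a, b in zip(hi, hj))
            total += math.exp(-sq_dist / tau)
    return math.log(total / (n * n))

collapsed = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # all latents identical
spread = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]     # latents pushed apart
loss_collapsed = dispersive_loss(collapsed)
loss_spread = dispersive_loss(spread)
```

The collapsed batch gives the maximum value of this loss (zero), while the spread batch gives a lower value, so minimizing the term discourages latent collapse.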

4. Empirical Performance and Ablation Results

On Adroit and Meta-World benchmarks, MP1 outperforms both DP3 (diffusion-based, 10 NFE) and FlowPolicy (flow-based, 1 NFE) in both average success rate and latency:

| Method | NFE | Avg. Success (%) | Avg. Inference Time (ms) |
|---|---|---|---|
| DP3 | 10 | 68.7 | 132.2 |
| FlowPolicy | 1 | 71.6 | 12.6 |
| MP1 | 1 | 78.9 | 6.8 |

Ablation studies show:

  • Removing Dispersive Loss reduces average success by ~5%.
  • Performance is maximized at intermediate interval ratios rather than in the degenerate case $r = t$, where the interval average $u$ coincides with $v$ and the method reduces to classical flow matching. This confirms the benefit of interval-averaged flows.
  • MP1 maintains superior performance even in few-shot imitation regimes (2–5 demonstrations), with near-state-of-the-art results by 10 demonstrations, attributable to improved discriminative capacity in the latent space.
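One way to read the interval-ratio ablation is as the fraction of training time pairs $(r, t)$ that span a proper interval, with the remaining pairs collapsed to $r = t$. The sketch below samples pairs under that reading; the interpretation, the uniform sampling, and the name `sample_time_pair` are assumptions rather than details from the source.

```python
import random

def sample_time_pair(ratio, rng):
    """Draw (r, t) with 0 <= r <= t <= 1.

    With probability `ratio` the pair spans a proper interval; otherwise r is
    set to t, the case in which the average velocity equals the
    instantaneous one (classical flow matching).
    """
    a, b = rng.random(), rng.random()
    r, t = min(a, b), max(a, b)
    if rng.random() >= ratio:
        r = t
    return r, t

rng = random.Random(0)
pairs_fm = [sample_time_pair(0.0, rng) for _ in range(100)]   # ratio 0: always r == t
pairs_mix = [sample_time_pair(0.5, rng) for _ in range(100)]  # intermediate ratio
```

Setting `ratio` to 0 recovers pure flow-matching pairs, while intermediate values mix both kinds of supervision.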

In real-world robotic manipulation tasks on the ARX R5 dual-arm, MP1 achieves the highest task success rates and fastest average completion times in comparison to both baselines.

5. Comparison with Alternative Generative Policy Methods

MP1 achieves:

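  • 19x reduction in inference latency versus diffusion-based DP3 (132.2 ms to 6.8 ms), with a 10.2-point higher average success rate (78.9% vs. 68.7%).
  • Nearly 2x faster inference (12.6 ms to 6.8 ms) and a 7.3-point higher average success rate (78.9% vs. 71.6%) than the explicit flow-based FlowPolicy.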
  • True one-step policy generation, with no consistency loss or numerical ODE-solver artifacts.

This is enabled by its local learning strategy (via the MeanFlow Identity), one-step trajectory computation (1-NFE), and lightweight, batch-wide dispersive regularization.
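The headline ratios follow directly from the averages reported in the results table of Section 4. The short check below recomputes them; the dictionary layout and the helper name `compare` are illustrative only.

```python
# Averages copied from the Section 4 results table.
results = {
    "DP3": {"success": 68.7, "latency_ms": 132.2},
    "FlowPolicy": {"success": 71.6, "latency_ms": 12.6},
    "MP1": {"success": 78.9, "latency_ms": 6.8},
}

def compare(baseline, method="MP1"):
    """Return (latency speedup, success gain in percentage points)."""
    b, m = results[baseline], results[method]
    speedup = b["latency_ms"] / m["latency_ms"]
    success_gain = m["success"] - b["success"]
    return round(speedup, 2), round(success_gain, 1)

vs_dp3 = compare("DP3")
vs_flowpolicy = compare("FlowPolicy")
print(vs_dp3, vs_flowpolicy)
```

The latency ratios come out at roughly 19.4x against DP3 and 1.85x against FlowPolicy, consistent with the "19x" and "nearly 2x" figures quoted above.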

6. Implementation and Evaluation Protocols

The reported implementation uses:

  • Batch size 128 with the AdamW optimizer
  • Farthest-point sampling to 512 or 1024 points
  • Downsampled input images
  • Training for 3000 epochs (Adroit) or 1000 (Meta-World), with evaluation every 200 epochs, and top 5 checkpoint selection per seed.
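Farthest-point sampling, used above to reduce each point cloud to a fixed size, can be sketched as a greedy loop. This is a plain-Python illustration of the standard algorithm, not the reported implementation; the fixed start index is an assumption (implementations often pick it at random), and `k` is assumed not to exceed the number of points.

```python
def farthest_point_sampling(points, k, start=0):
    """Return indices of k points chosen greedily so that each new point is
    the one farthest from all points selected so far."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    selected = [start]
    # min_dist[i] = squared distance from point i to its nearest selected point
    min_dist = [sq_dist(p, points[start]) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        # Update nearest-selected distances with the newly added point.
        for i, p in enumerate(points):
            d = sq_dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected

cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
indices = farthest_point_sampling(cloud, k=3)
```

On this toy cloud the loop first picks the distant point at x = 10 and then the point at x = 2, which shows how the method favours spatial coverage over density.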

Evaluation uses 10 expert demonstrations per task on the benchmarks and 20 human demonstrations per task in the real world, with assessment on five real-world tasks. MP1 is deployed with a U-Net backbone and 3D-conditioned inputs on NVIDIA RTX 4090 hardware.

7. Context, Limitations, and Prospective Directions

By integrating the MeanFlow identity into the policy learning process, MP1 bypasses the need for multi-step iterative sampling and structural consistency. This permits both high-frequency control and robust generalization when only a few demonstrations are available. The main empirical findings demonstrate not only improved task performance but a substantial drop in inference time, which is critical for real-world closed-loop robotic control.

A plausible implication is that interval-averaged flow architectures with dispersive regularization will become the design of choice for settings where latency, sample efficiency, and robust few-shot generalization are all priorities (Sheng et al., 14 Jul 2025). The need for only a single function evaluation at deployment may facilitate embedded and resource-constrained robotic systems. However, the long-term impact and generalization beyond the tested benchmarks will depend on continued evaluation across more heterogeneous tasks and sensor modalities.
