RMFlow: Refined Mean Flow Algorithm

Updated 7 February 2026

RMFlow is a generative modeling algorithm that augments deterministic one-step MeanFlow with a tailored noise-injection step to overcome mode collapse.
It combines coarse mean transport with stochastic refinement, enabling efficient and multimodal sample generation across varied tasks.
RMFlow is validated on tasks such as text-to-image and molecule generation, achieving competitive metrics with minimal neural function evaluations.

Refined Mean Flow (RMFlow), introduced in "RMFlow: Refined Mean Flow by a Noise-Injection Step for Multimodal Generation" (Huang et al., 31 Jan 2026), is a generative modeling algorithm that addresses the core limitations of single-step MeanFlow in multimodal data generation. RMFlow combines coarse one-step (1-NFE) MeanFlow transport with a tailored stochastic noise-injection refinement, achieving near state-of-the-art sample quality at a speed and efficiency comparable to the deterministic MeanFlow baseline. It is evaluated across diverse conditional and unconditional tasks, including text-to-image, context-to-molecule, and time-series generation.

1. Conceptual Foundations

Traditional Flow Matching (FM) methods learn a continuous-time velocity field $u_t(x)$ that transports a simple prior distribution (typically $q(x) = \mathcal{N}(0, I)$) to a complex data distribution $p_{\text{data}}(x)$ via the ODE $dx_t/dt = u_t(x_t)$. Although FM enables high-fidelity sample generation, it typically requires a large number of neural function evaluations (NFEs) for ODE integration, which incurs significant computational cost.
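To make the NFE cost concrete, here is a minimal sketch of integrating $dx_t/dt = u_t(x_t)$ with explicit Euler steps in one dimension. This is an illustration under assumptions, not code from the RMFlow paper: `euler_integrate` is a hypothetical name, and `toy_field` is a hypothetical stand-in for a trained velocity network. Each step calls the field once, so the NFE count equals the number of steps.

```python
import math
from typing import Callable


def euler_integrate(
    x0: float, velocity: Callable[[float, float], float], n_steps: int
) -> tuple[float, int]:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with explicit Euler.

    Returns the final state and the number of velocity evaluations (NFEs).
    """
    x, t = x0, 0.0
    dt = 1.0 / n_steps
    nfe = 0
    for _ in range(n_steps):
        x = x + dt * velocity(x, t)  # one neural function evaluation per step
        nfe += 1
        t += dt
    return x, nfe


def toy_field(x: float, t: float) -> float:
    """Hypothetical stand-in for a trained u_t(x); exact flow is x0 * exp(-t)."""
    return -x


if __name__ == "__main__":
    for steps in (1, 10, 100):
        x1, nfe = euler_integrate(1.0, toy_field, steps)
        print(f"{steps:>3} steps: x1={x1:.4f} (exact {math.exp(-1.0):.4f}), NFE={nfe}")
```

With this toy field, 100 steps land close to the exact value $e^{-1}$, while a single step returns 0, which is why accurate ODE integration usually needs many evaluations.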

MeanFlow accelerates FM by directly parameterizing the mean velocity between two time points $t$ and $r$, $u_{t,r}(x) = (x_r - x_t)/(r-t)$, which bypasses the need for finely spaced integration steps. In n-NFE MeanFlow, samples are updated deterministically as $x_{\tau_{i+1}} = x_{\tau_i} + (\tau_{i+1} - \tau_i)\,\hat{u}_{\tau_i, \tau_{i+1}}(x_{\tau_i}; \theta)$. However, reducing to 1-NFE (single-shot transport) leads to severe mode collapse and sample degradation: for instance, on mixtures of Gaussians and on QM9 molecular graph generation, 1-NFE MeanFlow fails to capture the multimodal structure of the data and to produce valid molecules.
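The n-NFE update rule can be sketched as follows, assuming scalar states and a hypothetical callable `u_hat(x, tau_i, tau_next)` standing in for the trained predictor $\hat{u}_{\tau_i,\tau_{i+1}}(\cdot;\theta)$. The helper names `meanflow_sample` and `uniform_grid` are illustrative, and the evenly spaced grid is an assumption; the text does not fix the time points.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for the trained predictor u_hat_{t,r}(x; theta).
MeanVelocity = Callable[[float, float, float], float]


def meanflow_sample(x0: float, u_hat: MeanVelocity, taus: Sequence[float]) -> float:
    """Deterministic n-NFE MeanFlow sampling over the grid taus (n = len(taus) - 1)."""
    x = x0
    for tau_i, tau_next in zip(taus[:-1], taus[1:]):
        # x_{tau_{i+1}} = x_{tau_i} + (tau_{i+1} - tau_i) * u_hat_{tau_i, tau_{i+1}}(x_{tau_i})
        x = x + (tau_next - tau_i) * u_hat(x, tau_i, tau_next)
    return x


def uniform_grid(n_nfe: int) -> list[float]:
    """Evenly spaced time points 0 = tau_0 < ... < tau_n = 1."""
    return [i / n_nfe for i in range(n_nfe + 1)]
```

Passing `uniform_grid(1)` gives the single-shot map $x_1 = x_0 + \hat{u}_{0,1}(x_0; \theta)$; larger grids trade extra NFEs for finer transport.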

RMFlow augments this deterministic transport with one carefully parameterized noise-injection step after coarse flow propagation, enabling effective multimodal sample coverage and maximizing sample likelihood.

2. Algorithmic Structure

The RMFlow sampling algorithm is as follows:

Sample the latent prior: $x_0 \sim q(x)$, or $x_0 = \phi_\omega(c) + \sigma_c z_0$ in the conditional case (with $z_0 \sim \mathcal{N}(0,I)$ and $c$ a conditioning variable).
Coarse deterministic transport: compute $x_1 = x_0 + \hat{u}_{0,1}(x_0; \theta)$, where $\hat{u}_{0,1}$ is the neural average-velocity predictor.
Stochastic refinement (noise injection): draw $\epsilon_2 \sim \mathcal{N}(0,I)$ and set $x_{\mathrm{tgt}} = x_1 + \sqrt{\sigma_{\min}^2 - \sigma^2}\,\epsilon_2$.
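The three sampling steps can be sketched in one dimension as below. This is a hedged illustration, not the authors' implementation: `rmflow_sample` is a hypothetical name, `u_hat_01` stands in for $\hat{u}_{0,1}(\cdot;\theta)$, and `cond_mean` stands in for the encoder output $\phi_\omega(c)$. The check that $\sigma_{\min} \ge \sigma$ is added only because the square root in the noise scale must be real.

```python
import math
import random
from typing import Callable, Optional


def rmflow_sample(
    u_hat_01: Callable[[float], float],
    sigma_min: float,
    sigma: float,
    rng: random.Random,
    cond_mean: Optional[float] = None,
    sigma_c: float = 1.0,
) -> float:
    """One 1D RMFlow-style draw: prior -> one MeanFlow step -> one noise injection."""
    if sigma_min < sigma:
        raise ValueError("noise scale sqrt(sigma_min^2 - sigma^2) requires sigma_min >= sigma")
    # Step 1: latent prior. Unconditional: x0 ~ N(0, 1).
    # Conditional: x0 = phi_omega(c) + sigma_c * z0, with cond_mean playing phi_omega(c).
    z0 = rng.gauss(0.0, 1.0)
    x0 = z0 if cond_mean is None else cond_mean + sigma_c * z0
    # Step 2: coarse deterministic transport (the single NFE).
    x1 = x0 + u_hat_01(x0)
    # Step 3: stochastic refinement with standard deviation sqrt(sigma_min^2 - sigma^2).
    eps2 = rng.gauss(0.0, 1.0)
    return x1 + math.sqrt(sigma_min**2 - sigma**2) * eps2
```

When $\sigma_{\min} = \sigma$ the injected noise vanishes and the sampler reduces to deterministic 1-NFE MeanFlow, which makes the role of the extra step easy to isolate.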

This two-stage process is designed so that, even with a single MeanFlow evaluation (1-NFE), RMFlow counteracts the mode averaging and deterministic collapse inherent in the baseline MeanFlow paradigm.

3. Mathematical Formulation and Objective

3.1 Mean Velocity and Training

The RMFlow neural predictor $\hat{u}_{t,r}(x; \theta)$ is trained to approximate the average velocity:

$m_{t,r}(x) := \mathbb{E}_{(x_0, x_1): x_t = x}[ (x_r - x_t)/(r-t) ] \approx \hat{u}_{t,r}(x; \theta)$

Explicitly, for 1-NFE, $\hat{u}_{0,1}(x_0; \theta) \approx \mathbb{E}[x_1 - x_0 \mid x_0]$.
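A small Monte Carlo experiment illustrates why this 1-NFE target invites mode averaging. Assuming an independent coupling (each $x_1$ drawn from the data regardless of $x_0$, an assumption made here for illustration), $\mathbb{E}[x_1 - x_0 \mid x_0] = \mathbb{E}[x_1] - x_0$, so an exact one-step predictor sends every prior sample to the data mean. The bimodal toy distribution and the helper names `sample_bimodal` and `oracle_one_step` are hypothetical.

```python
import random
import statistics


def sample_bimodal(rng: random.Random, centers=(-2.0, 2.0), std=0.5) -> float:
    """Toy 1D data distribution: equal-weight mixture of two Gaussians."""
    return rng.gauss(rng.choice(centers), std)


def oracle_one_step(x0: float, data: list[float]) -> float:
    """x0 + Monte Carlo estimate of E[x1 - x0 | x0] under an independent coupling.

    Because x1 does not depend on x0 here, E[x1 - x0 | x0] = E[x1] - x0,
    so every prior sample is mapped to (approximately) the data mean.
    """
    u_01 = statistics.fmean(x1 - x0 for x1 in data)
    return x0 + u_01


if __name__ == "__main__":
    rng = random.Random(0)
    data = [sample_bimodal(rng) for _ in range(20000)]
    for x0 in (-1.5, 0.0, 1.5):
        print(x0, "->", round(oracle_one_step(x0, data), 3))
```

On the symmetric mixture with modes at $\pm 2$, every output sits near 0, a region with almost no data mass. RMFlow's conditional prior and training setup may differ from this toy case; the sketch only shows the averaging effect of the deterministic one-step target.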

3.2 Loss Function

RMFlow introduces a composite objective to jointly control Wasserstein transport and sample likelihood:

$\mathcal{L}_{\text{RMFlow}}(\theta, \omega) = \mathcal{L}_{\text{CFM}}(\theta) + \lambda_1 \mathcal{L}_{\text{NLL}}(\theta) + \lambda_2 R(c; \omega)$

Mean-Flow Matching (Wasserstein control): $\mathcal{L}_{\text{CFM}}(\theta)$ regresses the predicted mean velocity to a dynamically consistent reference field, upper-bounding the squared $W_2$ distance between the generated paths and the target.
Negative Log-Likelihood: $\mathcal{L}_{\text{NLL}}(\theta)$ maximizes sample likelihood by minimizing the reconstruction error between the refined sample and the ground truth, leveraging the Gaussian likelihood induced by the noise-injection step.
Optional encoder regularization: $R(c; \omega)$ penalizes the encoder norm to prevent collapse.
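The composite objective can be sketched as follows for flat vectors of floats. The concrete forms are assumptions, not the paper's definitions: $\mathcal{L}_{\text{CFM}}$ is taken as a mean squared error against a supplied reference velocity `u_ref`, $\mathcal{L}_{\text{NLL}}$ as the negative log-density of the ground truth under $\mathcal{N}(x_1, (\sigma_{\min}^2 - \sigma^2) I)$, and $R$ as the squared norm of hypothetical encoder parameters `omega`. All function names are illustrative.

```python
import math
from typing import Sequence


def cfm_loss(u_pred: Sequence[float], u_ref: Sequence[float]) -> float:
    """Mean squared error between predicted and reference mean velocities."""
    return sum((p - r) ** 2 for p, r in zip(u_pred, u_ref)) / len(u_pred)


def gaussian_nll(x_true: Sequence[float], x1: Sequence[float], var: float) -> float:
    """-log N(x_true; x1, var * I): NLL of the ground truth under the injected-noise Gaussian."""
    d = len(x_true)
    sq_err = sum((a - b) ** 2 for a, b in zip(x_true, x1))
    return 0.5 * sq_err / var + 0.5 * d * math.log(2.0 * math.pi * var)


def encoder_reg(omega: Sequence[float]) -> float:
    """Squared L2 norm of (hypothetical) encoder parameters."""
    return sum(w * w for w in omega)


def rmflow_loss(
    u_pred: Sequence[float],
    u_ref: Sequence[float],
    x_true: Sequence[float],
    x1: Sequence[float],
    sigma_min: float,
    sigma: float,
    omega: Sequence[float],
    lam1: float,
    lam2: float,
) -> float:
    """L_CFM + lam1 * L_NLL + lam2 * R, using the illustrative forms above."""
    var = sigma_min**2 - sigma**2  # variance of the injected noise
    if var <= 0.0:
        raise ValueError("the Gaussian NLL needs sigma_min > sigma")
    return (
        cfm_loss(u_pred, u_ref)
        + lam1 * gaussian_nll(x_true, x1, var)
        + lam2 * encoder_reg(omega)
    )
```

Because the NLL term scales as $1/(\sigma_{\min}^2 - \sigma^2)$, the noise scale and $\lambda_1$ jointly set how strongly reconstruction error is weighted against the transport term.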

This joint loss avoids over-emphasizing either transport accuracy (which can lead to smooth, insufficiently multimodal samples) or sample likelihood (which can result in mode dropping).

4. Neural Architectures and Implementation

RMFlow adapts its architecture to the task modality:

Synthetic data (density estimation): 6-layer ResNet-like MLPs (hidden size 256, SiLU).
Context-to-molecule (QM9): Encoding via MLP→EGNN; mean-flow via 9-layer EGNN with time embedding; RL fine-tuning for reward-based stability.
Time-series (Lorenz, FitzHugh–Nagumo): Conditioning via MLP, generation via a modified U-Net backbone.
Text-to-image (COCO): Conditioning via a pretrained e5-base embedder and learned MLP; backbone uses Stable Diffusion’s 480M-parameter U-Net with time embedding and PEFT fine-tuning.

Hyperparameters are selected via task-specific ablation, with $\lambda_1 \in \{10^{-1}, 10^{-2}\}$ for synthetic and $\lambda_1 = 5 \times 10^{-2}$ for QM9, balancing Wasserstein and likelihood losses.

5. Empirical Results

Synthetic density estimation: On 1D mixtures and 2D checkerboard data, 1-NFE RMFlow achieves TV ≈ 0.76 and KL ≈ 0.23, substantially better than 1- and 8-NFE MeanFlow and approaching 32-NFE MeanFlow, indicating that RMFlow's stochastic refinement sharply improves mode coverage.
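TV and KL compare the generated distribution with the reference one. A common way to compute them for 1D samples is on shared histograms, sketched below; the binning, the clipping of out-of-range samples, the direction $\mathrm{KL}(p \,\|\, q)$ and the `eps` floor are assumptions, since the exact evaluation protocol is not given here. The function names are illustrative.

```python
import math
from typing import Sequence


def histogram(samples: Sequence[float], lo: float, hi: float, n_bins: int) -> list[float]:
    """Normalized histogram on [lo, hi); out-of-range samples are clipped into the edge bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for s in samples:
        k = int((s - lo) // width)
        counts[min(max(k, 0), n_bins - 1)] += 1
    return [c / len(samples) for c in counts]


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """TV(p, q) = 0.5 * sum |p_i - q_i|, in [0, 1]."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-10) -> float:
    """KL(p || q) over bins with p_i > 0, flooring q_i at eps to avoid log(0)."""
    return sum(a * math.log(a / max(b, eps)) for a, b in zip(p, q) if a > 0)
```

Both metrics are zero for identical histograms; TV reaches 1 when the two histograms have disjoint support, which is roughly what happens when a generator drops a mode entirely.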

QM9 molecule generation: 1-NFE RMFlow with RL guidance achieves atomic stability 93.2% and molecular stability 93.5%, matching diffusion models that require orders-of-magnitude more function evaluations. Deterministic MeanFlow scores (without noise injection) plateau at 84.3% atomic stability and 79.3% molecular stability.

Time-series modeling: On the Lorenz and FitzHugh–Nagumo systems, 1-NFE RMFlow achieves TV/KL close to or better than 8- and 32-NFE MeanFlow and approaches the performance of multi-step diffusion/FM models.

Text-to-image (COCO): RMFlow (1-NFE) achieves an FID of 18.91, outperforming 1-step GANs and teacher-free ODE methods including MeanFlow (27.31 FID), with a competitive prompt-alignment CLIP score of 0.291.

| Task / Benchmark | 1-NFE RMFlow | Baseline MeanFlow | SOTA Diffusion (≫1 NFE) |
|---|---|---|---|
| QM9 atom stability (%, higher is better) | 93.2 | 84.3 | 93–97 |
| QM9 molecule stability (%, higher is better) | 93.5 | 79.3 | 90–94 |
| COCO FID (lower is better) | 18.91 | 27.31 | 13.10–19.5 |
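The gaps between 1-NFE RMFlow and the deterministic MeanFlow baseline can be read off the table directly; the worked arithmetic below uses only the reported numbers.

$$
\begin{aligned}
\text{QM9 atom stability gain} &= 93.2 - 84.3 = 8.9 \text{ points} \\
\text{QM9 molecule stability gain} &= 93.5 - 79.3 = 14.2 \text{ points} \\
\text{relative COCO FID reduction} &= \frac{27.31 - 18.91}{27.31} = \frac{8.40}{27.31} \approx 30.8\%
\end{aligned}
$$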

6. Analysis and Ablation

Noise-injection effect: Introducing the noise term $\sqrt{\sigma_{\min}^2 - \sigma^2} \, \epsilon_2$ after the coarse MeanFlow correction restores stochasticity and improves multimodal coverage, counteracting the deterministic path averaging of single-step MeanFlow.

Loss balance ($\lambda_1$ ablation): The empirical optimum requires a nontrivial weight on the NLL term: an overly small $\lambda_1$ produces over-smoothed distributions, while an excessive $\lambda_1$ over-weights likelihood and, as noted in Section 3.2, risks mode dropping.

Single vs multiple refinements: RMFlow currently applies a single noise-injection; extension to multiple alternations of coarse transport and noise injection is posed as a direction for further improvement, to progressively refine multimodal samples.

7. Significance, Limitations, and Future Directions

RMFlow demonstrates that a hybrid scheme, consisting of one deterministic MeanFlow pass followed by a targeted, controlled noise injection, substantially improves multimodal sample quality while preserving the single-evaluation (1-NFE) speed and computational cost of deterministic flows. This sidesteps the mode averaging that purely deterministic one-step transport suffers on sharp, complex data distributions and offers a principled bridge to likelihood-based score methods.

The joint CFM+NLL loss targets both low $W_2$ error (through the CFM upper bound) and high sample likelihood, achieving or approaching SOTA results on diverse tasks with a single forward pass and negligible overhead. The structure also allows direct control of the KL divergence between samples and targets.

Identified limitations include: reliance on the suitability of a single noise-injection step (multi-stage RMFlow is left for future work), fixed noise scale (adaptive scale tuning is proposed), and the need for more general-purpose encoder regularization.

Proposed advancements are: multi-step mean-flow architectures with interleaved noise-injection, learned or adaptive noise schedules, and exploration of encoder regularizations for various guidance scenarios.

RMFlow thus provides a rigorous, efficient approach for multimodal generation across domains and serves as a template for further research in single-pass flow-based generative modeling (Huang et al., 31 Jan 2026).


References

Huang et al. (31 Jan 2026). RMFlow: Refined Mean Flow by a Noise-Injection Step for Multimodal Generation.
