Decoupled MeanFlow: Turning Flow Models into Flow Maps for Accelerated Sampling (2510.24474v1)

Published 28 Oct 2025 in cs.CV

Abstract: Denoising generative models, such as diffusion and flow-based models, produce high-quality samples but require many denoising steps due to discretization error. Flow maps, which estimate the average velocity between timesteps, mitigate this error and enable faster sampling. However, their training typically demands architectural changes that limit compatibility with pretrained flow models. We introduce Decoupled MeanFlow, a simple decoding strategy that converts flow models into flow map models without architectural modifications. Our method conditions the final blocks of diffusion transformers on the subsequent timestep, allowing pretrained flow models to be directly repurposed as flow maps. Combined with enhanced training techniques, this design enables high-quality generation in as few as 1 to 4 steps. Notably, we find that training flow models and subsequently converting them is more efficient and effective than training flow maps from scratch. On ImageNet 256x256 and 512x512, our models attain 1-step FID of 2.16 and 2.12, respectively, surpassing prior art by a large margin. Furthermore, we achieve FID of 1.51 and 1.68 when increasing the steps to 4, which nearly matches the performance of flow models while delivering over 100x faster inference.

Summary

The paper introduces Decoupled MeanFlow, a technique that transforms flow models into flow map models to reduce discretization errors and enhance sampling speed.
The approach decouples the encoder and decoder, allowing pretrained models to be fine-tuned without altering their architecture.
Experimental results show significant improvements, with a 1-step FID score of 2.16 and competitive 4-step performance on ImageNet.

Decoupled MeanFlow: Turning Flow Models into Flow Maps for Accelerated Sampling

Abstract

The paper proposes Decoupled MeanFlow (DMF), an approach for improving sampling efficiency in denoising generative models such as diffusion and flow-based models. These models generate high-quality samples but need many denoising steps, because each step of the numerical solver introduces discretization error. Flow maps address this by estimating the average velocity between two timesteps instead of the instantaneous velocity, which reduces discretization error and enables faster sampling. The paper introduces a decoding strategy that converts flow models into flow map models without architectural changes, so pretrained flow models can be reused directly. The authors report that this outperforms prior few-step methods and delivers over 100x faster inference than multi-step flow models.

Figure 1: Accelerating diffusion transformer via Decoupled MeanFlow.

Introduction

Diffusion models and flow models have rapidly become leading techniques for producing high-quality visual content, with significant impact on image and video generation. However, they require many sequential network evaluations per sample, which remains a bottleneck for real-time applications. Consistency models, and more recently flow maps, offer promising alternatives by reducing the number of sampling steps. MeanFlow has already demonstrated this capability; DMF builds on it by reusing existing flow models as flow maps, which keeps the architecture unchanged while improving few-step performance.
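For reference, the average velocity that a flow map estimates can be written as follows. The notation is an assumption borrowed from the MeanFlow formulation (noisy sample z_t, instantaneous velocity v, timesteps r < t) and may not match the paper's symbols exactly.

```latex
% Average velocity over the interval [r, t] along a trajectory z_\tau
u(z_t, r, t) \;=\; \frac{1}{t - r} \int_{r}^{t} v(z_\tau, \tau)\, d\tau

% A single jump from timestep t to timestep r using the average velocity
z_r \;=\; z_t - (t - r)\, u(z_t, r, t)

% The flow model is recovered as the interval shrinks
\lim_{r \to t} u(z_t, r, t) \;=\; v(z_t, t)
```

Because the jump uses the exact average over the interval, it incurs no discretization error when u is exact; an Euler step that substitutes v(z_t, t) for u is only accurate for small t - r.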

Figure 2: Qualitative examples.

Methodology

DMF splits the blocks of a flow model, such as a diffusion transformer, into an encoder and a decoder without changing the architecture. The earlier blocks (the encoder) remain conditioned on the current timestep, while the final blocks (the decoder) are conditioned on the subsequent timestep, that is, the timestep the sample should be mapped to. This lets a pretrained flow model be repurposed directly as a flow map. Because only the decoder receives the subsequent timestep, the encoder's representation does not need to encode it redundantly, which makes better use of the network's capacity.
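The routing of timesteps described above can be illustrated with a minimal sketch. Everything here is hypothetical: `make_toy_block` is a scalar stand-in for a timestep-conditioned transformer block, and `DecoupledStack`, `flow_model`, and `flow_map` are illustrative names, not the paper's code. The only point is which blocks see which timestep.

```python
import math
from typing import Callable, List, Sequence

# A "block" maps (features, conditioning timestep) -> features.
Block = Callable[[List[float], float], List[float]]


def make_toy_block(weight: float) -> Block:
    """Hypothetical stand-in for one timestep-conditioned transformer block."""

    def block(h: List[float], timestep: float) -> List[float]:
        # Toy modulation: scale and shift the features by the timestep.
        scale = 1.0 + weight * timestep
        shift = weight * math.sin(timestep)
        return [scale * x + shift for x in h]

    return block


class DecoupledStack:
    """Same blocks, two conditioning schemes (assumed layout, not the paper's code)."""

    def __init__(self, blocks: Sequence[Block], num_decoder_blocks: int):
        if not 0 <= num_decoder_blocks <= len(blocks):
            raise ValueError("num_decoder_blocks must be between 0 and len(blocks)")
        split = len(blocks) - num_decoder_blocks
        self.encoder = list(blocks[:split])  # earlier blocks
        self.decoder = list(blocks[split:])  # final blocks

    def flow_model(self, x: List[float], t: float) -> List[float]:
        # Original behaviour: every block is conditioned on the current timestep t.
        h = list(x)
        for block in self.encoder + self.decoder:
            h = block(h, t)
        return h

    def flow_map(self, x: List[float], t: float, r: float) -> List[float]:
        # Decoupled behaviour: encoder sees t, decoder sees the subsequent timestep r.
        h = list(x)
        for block in self.encoder:
            h = block(h, t)
        for block in self.decoder:
            h = block(h, r)
        return h


blocks = [make_toy_block(w) for w in (0.1, 0.2, 0.3, 0.4)]
stack = DecoupledStack(blocks, num_decoder_blocks=1)
same = stack.flow_map([0.5, -1.0], t=0.8, r=0.8) == stack.flow_model([0.5, -1.0], t=0.8)
```

In this sketch, setting `r` equal to `t` reproduces the flow model's output exactly, which is one way to see why pretrained weights are a reasonable starting point for fine-tuning into a flow map.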

Implementation

DMF starts from an existing pretrained flow model and converts it into a flow map by fine-tuning. Training uses techniques such as a Flow Matching warmup and adaptive loss functions. The approach is evaluated at multiple resolutions and supports inference in as few as 1 to 4 steps.

Figure 3: Varying depth in encoder-decoder configurations and their impact on performance.

Results

On ImageNet 256×256, DMF achieves a 1-step FID of 2.16, which the authors report as surpassing prior art by a large margin; on ImageNet 512×512 the 1-step FID is 2.12. With 4 steps, FID improves to 1.51 and 1.68, respectively, which nearly matches the performance of multi-step flow models while delivering over 100x faster inference.
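To see why an average-velocity model can be accurate in 1 to 4 steps when an ordinary ODE solver is not, consider a toy example. This is not the paper's model: the ODE dz/dt = A*z, the constant `A`, and all function names are assumptions chosen because the exact average velocity has a closed form. The sampler integrates from t = 1 down to t = 0.

```python
import math
from typing import Callable, Sequence

A = 2.0  # hypothetical rate of the toy ODE dz/dt = A * z


def velocity(z: float, t: float) -> float:
    """Instantaneous velocity of the toy ODE (plays the role of a flow model)."""
    return A * z


def average_velocity(z: float, r: float, t: float) -> float:
    """Exact average velocity between r and t (plays the role of a flow map)."""
    if t == r:
        return velocity(z, t)
    # Exact solution: z_r = z_t * exp(-A * (t - r)), so u = (z_t - z_r) / (t - r).
    return z * (1.0 - math.exp(-A * (t - r))) / (t - r)


def sample(z: float, timesteps: Sequence[float],
           field: Callable[[float, float, float], float]) -> float:
    """Step through consecutive (t, r) pairs using z_r = z_t - (t - r) * field."""
    for t, r in zip(timesteps[:-1], timesteps[1:]):
        z = z - (t - r) * field(z, r, t)
    return z


def euler_field(z: float, r: float, t: float) -> float:
    # Euler ignores r and uses the instantaneous velocity at t.
    return velocity(z, t)


one_step = [1.0, 0.0]
four_steps = [1.0, 0.75, 0.5, 0.25, 0.0]

exact = math.exp(-A)                                  # true z_0 for z_1 = 1
map_1 = sample(1.0, one_step, average_velocity)       # matches exact
euler_1 = sample(1.0, one_step, euler_field)          # far from exact
euler_4 = sample(1.0, four_steps, euler_field)        # closer, still off
```

Here the one-step flow map jump lands on the exact solution, while the Euler result with the same step count does not. In practice a learned flow map only approximates the average velocity, so extra steps can still help, which is consistent with the improvement from 1 to 4 steps reported above.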

Conclusion

Decoupled MeanFlow shows that pretrained flow models can be converted into few-step flow maps without changing their architecture. The approach is pragmatic: it favors model reuse over training flow maps from scratch, which the authors find to be both more efficient and more effective. This makes it relevant to applications where rapid sample generation matters. As demand for efficient generative models grows, DMF's simplicity may encourage further work on architectures and inference strategies for few-step sampling.