- The paper presents a comparative analysis demonstrating that CFM achieves superior image synthesis with an FID of 24.15 compared to DDPM's 402.98.
- The paper details MeanFlow's one-step generation, delivering near-real-time image synthesis with an FID of 29.15 despite a slight fidelity trade-off.
- The paper applies a fine-tuning strategy to CFM for image inpainting, markedly enhancing PSNR and reducing NMSE for improved boundary harmonization.
Comparative Study of Flow-Based Generative Models
Abstract
This article reviews the paper titled "From Diffusion to One-Step Generation: A Comparative Study of Flow-Based Models with Application to Image Inpainting" (2511.21215), which provides a thorough comparison of three generative modeling paradigms applied to image synthesis and inpainting tasks. The study compares the performance of Denoising Diffusion Probabilistic Models (DDPM), Conditional Flow Matching (CFM), and MeanFlow, each implemented with a streamlined TinyUNet architecture. The paper shows that CFM clearly outperforms DDPM, demonstrates MeanFlow's efficiency in one-step generation, and explores CFM's application to inpainting via mask-guided sampling.
Introduction
The study addresses recent advances in generative modeling, particularly diffusion-based methods such as DDPM and flow-matching techniques such as CFM. These approaches have substantially improved image synthesis, though they typically require many sampling steps at inference time.
CFM achieves significant performance improvements, attaining a Fréchet Inception Distance (FID) of 24.15, a 16.7-fold improvement over DDPM, which only reaches an FID of 402.98. Meanwhile, MeanFlow trades a small amount of fidelity for radical sampling efficiency, producing images in a single step with an FID of 29.15.
Figure 1: Overall FID and KID comparison across three methods. CFM and MeanFlow significantly outperform DDPM, with CFM achieving the best scores.
Generative Paradigms
Denoising Diffusion Probabilistic Models (DDPM)
DDPM learns to reverse a gradual noise-addition process through many iterative denoising steps, which makes generation computationally expensive. In the paper's experiments, DDPM reaches a notably poor FID of 402.98, attributed to the limited number of timesteps and the constrained capacity of the TinyUNet architecture.
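The forward (noising) process that DDPM must learn to reverse can be sketched in a few lines. This is a generic illustration with a commonly used linear beta schedule; the function and variable names are not taken from the paper.

```python
import numpy as np

def ddpm_forward(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I)."""
    alpha_bar = np.cumprod(1.0 - betas)          # abar_t = prod_{s<=t} (1 - beta_s)
    eps = rng.standard_normal(x0.shape)          # Gaussian noise
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps                               # eps is the network's regression target

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)            # common linear schedule
x0 = rng.random((3, 32, 32))                     # dummy CIFAR-10-sized image
xt, eps = ddpm_forward(x0, t=500, betas=betas, rng=rng)
```

Because sampling must invert this chain step by step, generation cost grows with the number of timesteps, which is the efficiency bottleneck the flow-based methods below avoid.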
Figure 2: DDPM samples at epoch 399. Despite extended training (400 epochs), the model fails to generate coherent images, producing only noise-like patterns. This explains the poor FID of 402.98.
Conditional Flow Matching (CFM)
CFM employs a linear transport interpolation strategy, learning the constant velocity field of a straight-line path between noise and data, which yields faster convergence. CFM demonstrates superior image synthesis with an FID of 24.15, considerably outperforming DDPM.
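The linear-transport construction can be written compactly: interpolate between a noise sample and a data sample, and regress the network onto the constant velocity along that straight path. The sketch below is illustrative, not the paper's code.

```python
import numpy as np

def cfm_training_pair(x0, x1, t):
    """Linear interpolant x_t = (1 - t) x0 + t x1; the regression target is
    the constant velocity x1 - x0 along the straight path."""
    xt = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return xt, v_target

rng = np.random.default_rng(0)
x1 = rng.random((3, 32, 32))           # data sample
x0 = rng.standard_normal((3, 32, 32))  # noise sample
xt, v = cfm_training_pair(x0, x1, t=0.25)
# A model v_theta(xt, t) would be trained to minimize ||v_theta(xt, t) - v||^2.
```

Since the target velocity is constant in t, the regression problem is better conditioned than DDPM's noise prediction, which is consistent with the faster convergence reported in the paper.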
MeanFlow
MeanFlow models the average velocity over an interval rather than the instantaneous velocity, enabling single-step generation. Despite a modest fidelity sacrifice, its inference speed is crucial for real-time applications.
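With an average-velocity model, sampling collapses to a single function evaluation. In the sketch below, `mean_velocity` is a hypothetical stand-in for the trained network u_theta(x, r, t); only the one-step update rule is the point.

```python
import numpy as np

def mean_velocity(x, r, t):
    # Hypothetical stand-in for the learned average-velocity network u_theta.
    return -0.5 * x

def one_step_sample(noise):
    """x_1 = x_0 + (t - r) * u_theta(x_0, r=0, t=1): one network call, no loop."""
    return noise + (1.0 - 0.0) * mean_velocity(noise, 0.0, 1.0)

rng = np.random.default_rng(0)
z = rng.standard_normal((3, 32, 32))
sample = one_step_sample(z)
```

Compared with CFM, which still integrates an ODE over many small steps at inference, this single evaluation is what buys the near-real-time synthesis described above.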
Figure 3: MeanFlow: Ship
Image Inpainting with CFM
Mask Types and Inpainting Strategy
The paper introduces mask-guided image inpainting using a fine-tuned CFM model, with strategies aimed at harmonizing the boundary between generated and known regions. The authors employ four mask types: center, random bounding box, irregular brush strokes, and half-image masks.
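A minimal sketch of mask-guided sampling under a linear CFM path, assuming `mask == 1` marks missing pixels: at each Euler step the masked region follows the learned flow, while the known region is re-imposed from the appropriately noised ground truth. All names are illustrative, not the paper's implementation.

```python
import numpy as np

def mask_guided_euler_step(x, v, x_known, z, mask, t, dt):
    """One Euler step of the flow with the known region clamped.

    mask == 1: missing pixels (generated); mask == 0: known pixels (clamped).
    Along the linear path, the known region at time s is (1 - s) * z + s * x_known.
    """
    x_next = x + dt * v                              # flow update everywhere
    s = t + dt
    known_at_s = (1.0 - s) * z + s * x_known         # noised ground truth
    return mask * x_next + (1.0 - mask) * known_at_s

rng = np.random.default_rng(0)
x_known = rng.random((3, 32, 32))                    # image with known pixels
z = rng.standard_normal((3, 32, 32))                 # fixed noise draw
mask = np.zeros((3, 32, 32)); mask[:, 8:24, 8:24] = 1.0   # center mask
v = np.zeros((3, 32, 32))                            # placeholder for v_theta(x, t)
x = mask_guided_euler_step(z, v, x_known, z, mask, t=0.0, dt=0.1)
```

Clamping the known region at every step keeps the generated content consistent with its surroundings, which is the mechanism behind the boundary harmonization the paper targets.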
Fine-Tuning Success
A focused fine-tuning strategy yields marked improvements in inpainting performance, illustrated by an average PSNR increase of 74.2%, substantial NMSE reductions, and SSIM gains, indicating improved boundary harmonization and structural coherence.
Figure 4: Inpainting performance comparison: base CFM (red) vs. fine-tuned (green) across all mask types. Fine-tuning dramatically improves all metrics, reducing NMSE by 52–60% and improving PSNR by 55–86%.
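The reconstruction metrics cited above can be computed as follows. This is a generic sketch, assuming images scaled to [0, 1]; it is not the paper's evaluation code.

```python
import numpy as np

def psnr(ref, rec, peak=1.0):
    """Peak signal-to-noise ratio in dB; higher is better."""
    mse = np.mean((ref - rec) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def nmse(ref, rec):
    """Normalized mean squared error; lower is better."""
    return np.sum((ref - rec) ** 2) / np.sum(ref ** 2)

ref = np.ones((4, 4))
rec = np.full((4, 4), 0.9)
print(round(psnr(ref, rec), 2), round(nmse(ref, rec), 4))  # → 20.0 0.01
```

Note the opposite directions: fine-tuning should raise PSNR and SSIM while lowering NMSE, which is the pattern reported in Figure 4.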
Experimental Findings
The paper systematically evaluates generative performance across different paradigms using CIFAR-10. CFM emerges as the most effective, confirmed by its superior FID and KID metrics and robust performance across varied classes and mask types.
Qualitative Results
Qualitative analyses reaffirm CFM's proficiency in generating class-specific structures and confirm MeanFlow's efficiency in one-step synthesis, albeit with slight compromises in detail.
Discussion
DDPM Limitations
DDPM's failure to produce coherent images in this setting stems from both the constrained capacity of the TinyUNet architecture and the limited number of diffusion timesteps, rather than from the paradigm alone.
Trade-offs Between CFM and MeanFlow
The study highlights a trade-off between fidelity and efficiency, where CFM excels in quality but requires more computational resources compared to MeanFlow's fast but modestly less detailed outputs.
Contributions to Inpainting Tasks
Fine-tuning critically enhances image inpainting, suggesting pathways for more sophisticated model training protocols to handle complex missing data reconstruction tasks.
Conclusion
The paper substantiates the advantages of CFM and MeanFlow over DDPM in generative tasks: CFM in sample quality and inpainting capability, MeanFlow in sampling speed. These results suggest room for further advances in real-time image synthesis and inpainting, encouraging exploration of scaling strategies and hybrid paradigms at higher resolutions.