
From Diffusion to One-Step Generation: A Comparative Study of Flow-Based Models with Application to Image Inpainting

Published 26 Nov 2025 in cs.CV and cs.LG | (2511.21215v1)

Abstract: We present a comprehensive comparative study of three generative modeling paradigms: Denoising Diffusion Probabilistic Models (DDPM), Conditional Flow Matching (CFM), and MeanFlow. While DDPM and CFM require iterative sampling, MeanFlow enables direct one-step generation by modeling the average velocity over time intervals. We implement all three methods using a unified TinyUNet architecture (<1.5M parameters) on CIFAR-10, demonstrating that CFM achieves an FID of 24.15 with 50 steps, significantly outperforming DDPM (FID 402.98). MeanFlow achieves FID 29.15 with single-step sampling -- a 50X reduction in inference time. We further extend CFM to image inpainting, implementing mask-guided sampling with four mask types (center, random bbox, irregular, half). Our fine-tuned inpainting model achieves substantial improvements: PSNR increases from 4.95 to 8.57 dB on center masks (+73%), and SSIM improves from 0.289 to 0.418 (+45%), demonstrating the effectiveness of inpainting-aware training.

Summary

  • The paper presents a comparative analysis demonstrating that CFM achieves superior image synthesis with an FID of 24.15 compared to DDPM's 402.98.
  • The paper details MeanFlow's one-step generation, delivering near-real-time image synthesis with an FID of 29.15 despite a slight fidelity trade-off.
  • The paper applies a fine-tuning strategy to CFM for image inpainting, markedly enhancing PSNR and reducing NMSE for improved boundary harmonization.

Comparative Study of Flow-Based Generative Models

Abstract

This article reviews the paper titled "From Diffusion to One-Step Generation: A Comparative Study of Flow-Based Models with Application to Image Inpainting" (2511.21215), which provides a thorough comparison of three generative modeling paradigms applied to image synthesis and inpainting tasks. The study encompasses the performance of Denoising Diffusion Probabilistic Models (DDPM), Conditional Flow Matching (CFM), and MeanFlow, each implemented using a streamlined TinyUNet architecture. The paper illustrates the superiority of CFM over DDPM and demonstrates MeanFlow's efficiency in one-step generation, with an exploration of CFM's application to inpainting using mask-guided sampling.

Introduction

The study addresses the ongoing enhancements in generative modeling, particularly through diffusion-based methods like DDPM and flow matching techniques such as CFM. These approaches have advanced image synthesis capabilities, albeit typically requiring numerous sampling steps for inference.

CFM achieves significant performance improvements, attaining a Fréchet Inception Distance (FID) of 24.15, roughly a 16.7-fold improvement over DDPM, which reaches only 402.98. MeanFlow, by contrast, prioritizes sampling efficiency, producing images in a single step with an FID of 29.15.

Figure 1: Overall FID and KID comparison across three methods. CFM and MeanFlow significantly outperform DDPM, with CFM achieving the best scores.

Generative Paradigms

Denoising Diffusion Probabilistic Models (DDPM)

DDPM generates images by iteratively reversing a noise-addition process, which makes sampling computationally intensive. In these experiments, DDPM reaches a notably poor FID of 402.98, attributed to the limited number of sampling timesteps and the capacity constraints of the TinyUNet architecture.

Figure 2: DDPM samples at epoch 399. Despite extended training (400 epochs), the model fails to generate coherent images, producing only noise-like patterns. This explains the poor FID of 402.98.
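To make the DDPM setup concrete, the following sketch shows the standard closed-form forward (noising) process with a linear beta schedule. This is the textbook formulation, not the paper's code; the schedule endpoints and `q_sample` helper name are assumptions.

```python
import numpy as np

# Standard DDPM forward process in closed form:
#   x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I)
T = 1000
betas = np.linspace(1e-4, 0.02, T)     # common linear schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def q_sample(x0, t, rng):
    """Draw x_t from q(x_t | x_0) without simulating t individual steps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 32, 32))  # a CIFAR-10-shaped image tensor
x_t, eps = q_sample(x0, T - 1, rng)
# Near t = T, alpha_bar is tiny, so x_t is essentially pure noise; the reverse
# model must undo this over many steps, which is where DDPM's cost comes from.
```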

Conditional Flow Matching (CFM)

CFM uses linear (optimal-transport-style) interpolation paths between noise and data, so the regression target is a constant velocity along each path, which speeds convergence. CFM demonstrates superior image synthesis with an FID of 24.15, considerably outperforming DDPM.
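The linear-path construction can be sketched in a few lines. This is a minimal illustration of the CFM regression target under linear interpolation (the `cfm_pair` helper is hypothetical, not from the paper's code):

```python
import numpy as np

# Conditional Flow Matching with linear paths:
#   x_t = (1 - t) * x0 + t * x1
# The target velocity is d(x_t)/dt = x1 - x0, constant in t, which is what
# makes the regression problem well-conditioned and fast to converge.

def cfm_pair(x0, x1, t):
    """Return the interpolated point x_t and the velocity target the net regresses."""
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0              # independent of t for linear paths
    return x_t, v_target

rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 32, 32))   # noise sample (t = 0 endpoint)
x1 = rng.standard_normal((3, 32, 32))   # data sample  (t = 1 endpoint)
x_t, v = cfm_pair(x0, x1, 0.3)
# Sanity checks: the path recovers its endpoints.
assert np.allclose(cfm_pair(x0, x1, 0.0)[0], x0)
assert np.allclose(cfm_pair(x0, x1, 1.0)[0], x1)
```

Training then amounts to sampling `(x0, x1, t)`, computing `x_t`, and minimizing the squared error between the network's predicted velocity and `v_target`.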

MeanFlow

Capitalizing on accelerated sampling, MeanFlow models the average velocity over a time interval rather than the instantaneous velocity, enabling single-step generation. Despite a minor sacrifice in fidelity, MeanFlow offers the inference speed crucial for real-time applications.

Figure 3: MeanFlow samples (ship class).
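The one-step property follows directly from the quantity MeanFlow regresses. The sketch below uses the ground-truth average velocity (which a trained MeanFlow network approximates) to show why a single step suffices; the `mean_velocity` helper is illustrative, not the paper's code:

```python
import numpy as np

# MeanFlow trains a network u(x, r, t) to predict the AVERAGE velocity over [r, t]:
#   u(x_r, r, t) = (x_t - x_r) / (t - r)
# Given that quantity, one Euler-like step from t = 0 to t = 1 lands on the
# endpoint exactly -- no iterative ODE solve is needed.

def mean_velocity(x_r, x_t, r, t):
    """Ground-truth average velocity over [r, t] along a path from x_r to x_t."""
    return (x_t - x_r) / (t - r)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 32, 32))   # noise at t = 0
x1 = rng.standard_normal((3, 32, 32))   # "data" at t = 1

u = mean_velocity(x0, x1, 0.0, 1.0)     # what a trained MeanFlow net approximates
x1_onestep = x0 + (1.0 - 0.0) * u       # single-step generation
assert np.allclose(x1_onestep, x1)
```

In practice the network only approximates the average velocity, which is the source of the small fidelity gap (FID 29.15 vs CFM's 24.15) traded for the roughly 50x inference speedup.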

Image Inpainting with CFM

Mask Types and Inpainting Strategy

The paper introduces mask-guided image inpainting using a fine-tuned CFM model, leveraging strategies for boundary harmonization. The authors employ four mask types: center, random bounding box, irregular brush strokes, and half image masks.
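A common way to realize mask-guided sampling with a flow model is to take Euler steps over the whole image while re-imposing the known pixels at each step's interpolation level. The sketch below follows that pattern; the helper names and the exact projection schedule are assumptions, not the paper's implementation:

```python
import numpy as np

# Mask-guided CFM-style inpainting: evolve the full image with the velocity
# field, then project known pixels back onto the linear path
# (1 - t) * noise + t * data at the current time, so only the masked region is
# synthesized while boundaries stay consistent with the known context.

def inpaint_sample(velocity_fn, x_known, mask, n_steps=50, rng=None):
    """mask == 1 marks MISSING pixels; known pixels come from x_known."""
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal(x_known.shape)          # start from noise at t = 0
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity_fn(x, i * dt)         # Euler step on the full image
        t_next = (i + 1) * dt
        # Known pixels, re-noised to the current interpolation level (one common choice).
        known_t = (1.0 - t_next) * rng.standard_normal(x.shape) + t_next * x_known
        x = mask * x + (1.0 - mask) * known_t
    return x

# Toy velocity field pulling values toward zero, just to exercise the loop.
x_known = np.ones((1, 8, 8))
mask = np.zeros_like(x_known)
mask[:, 2:6, 2:6] = 1.0                             # center mask
out = inpaint_sample(lambda x, t: -x, x_known, mask)
# At t = 1 the projection restores the known pixels exactly.
assert np.allclose(out[(1.0 - mask).astype(bool)], 1.0)
```

The same loop handles all four mask types (center, random bbox, irregular, half); only the `mask` tensor changes.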

Fine-Tuning Success

A focused fine-tuning strategy yields marked improvements in inpainting performance, illustrated by an average PSNR increase of 74.2% and substantial gains in NMSE and SSIM, suggesting improved boundary harmonization and structural coherence.

Figure 4: Inpainting performance comparison: base CFM (red) vs fine-tuned (green) across all mask types. Fine-tuning dramatically improves all metrics, reducing NMSE by 52-60% and improving PSNR by 55-86%.
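For reference, PSNR here is the standard definition, and the reported center-mask gain (4.95 to 8.57 dB) corresponds to the quoted +73% relative improvement. A minimal sketch, assuming a [0, 1] data range:

```python
import numpy as np

# Standard peak signal-to-noise ratio: PSNR = 10 * log10(MAX^2 / MSE).
def psnr(x, y, data_range=1.0):
    mse = np.mean((x - y) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

# Sanity check of the paper's center-mask claim: 4.95 -> 8.57 dB.
rel_gain = (8.57 - 4.95) / 4.95
print(f"{rel_gain:.1%}")  # 73.1%
```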

Experimental Findings

Performance Metrics

The paper systematically evaluates generative performance across different paradigms using CIFAR-10. CFM emerges as the most effective, confirmed by its superior FID and KID metrics and robust performance across varied classes and mask types.

Qualitative Results

Qualitative analyses reaffirm CFM's proficiency in generating class-specific structures and confirm MeanFlow's efficiency in one-step synthesis, albeit with slight compromises in detail.

Discussion

DDPM Limitations

The technical limitations of DDPM in this setting manifest in its inability to produce coherent images, reflecting both the restricted capacity of TinyUNet and the constrained timestep schedule.

Trade-offs Between CFM and MeanFlow

The study highlights a trade-off between fidelity and efficiency, where CFM excels in quality but requires more computational resources compared to MeanFlow's fast but modestly less detailed outputs.

Contributions to Inpainting Tasks

Fine-tuning critically enhances image inpainting, suggesting pathways for more sophisticated model training protocols to handle complex missing data reconstruction tasks.

Conclusion

The paper substantiates the advantages of CFM and MeanFlow over DDPM in generative tasks, notably in sampling speed and inpainting capability. These results imply potential for further advances in real-time image synthesis and inpainting, encouraging exploration into scaling methodologies and combining paradigms for enhanced resolution capabilities.
