- The paper introduces a unified framework (UCGM) that integrates diffusion, flow-matching, and consistency models using a flexible consistency ratio to transition between paradigms.
- The methodology employs a Unified Trainer (UCGM-T) and Unified Sampler (UCGM-S) to optimize training and sampling efficiency, achieving an FID of 1.30 in 20 steps on ImageNet 256×256.
- The framework’s self-boosting mechanism and reduced computational overhead highlight its potential to advance practical and theoretical generative modeling research.
Unified Continuous Generative Models
Introduction
Diffusion models, flow-matching models, and consistency models represent significant advances in continuous generative modeling. These approaches were developed largely independently, each with distinct training and sampling procedures despite the shared goal of generating high-fidelity data. This paper introduces Unified Continuous Generative Models (UCGM), a framework that bridges these methodologies with a common recipe for training and sampling. UCGM achieves state-of-the-art performance on ImageNet benchmarks, improving multi-step diffusion models and few-step consistency models alike.
Methodology
UCGM comprises a Unified Trainer (UCGM-T) and a Unified Sampler (UCGM-S) that integrate and enhance existing paradigms under a single objective. The trainer exposes a consistency ratio parameter λ ∈ [0, 1] that interpolates between few-step paradigms (consistency models) and multi-step paradigms (diffusion and flow-matching models). The training objective accommodates a variety of noise schedules without model-specific modifications.
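The paper's exact objective is not reproduced here, but the following minimal sketch conveys the idea of a λ-blended training target. It assumes a linear interpolation path, a velocity-prediction network, and a frozen EMA teacher for the consistency-style branch; the name `ucgm_like_training_step`, the 0.01 time offset, and the plain MSE blend are illustrative choices, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def ucgm_like_training_step(model, ema_model, x0, lam=0.0):
    """One illustrative training step: lam = 0.0 reduces to plain flow
    matching, while lam > 0.0 mixes in a consistency-style target from a
    frozen EMA teacher. A hedged reconstruction, not the paper's objective."""
    b = x0.shape[0]
    # Sample one time per example; reshape so it broadcasts over x0.
    t = torch.rand(b, device=x0.device).reshape(-1, *([1] * (x0.ndim - 1)))
    eps = torch.randn_like(x0)                   # Gaussian noise
    xt = (1 - t) * x0 + t * eps                  # linear interpolation path
    v_pred = model(xt, t.flatten())              # predicted velocity at time t

    # Multi-step (flow-matching) target: the exact path velocity eps - x0.
    fm_target = eps - x0

    # Few-step (consistency-style) target: match the EMA teacher's prediction
    # at a slightly earlier time on the same interpolation path.
    with torch.no_grad():
        t_prev = (t - 0.01).clamp(min=0.0)       # small step back (assumed)
        xt_prev = (1 - t_prev) * x0 + t_prev * eps
        cm_target = ema_model(xt_prev, t_prev.flatten())

    # The consistency ratio lam linearly blends the two training targets.
    target = (1.0 - lam) * fm_target + lam * cm_target
    return F.mse_loss(v_pred, target)
```

With λ = 0 this reduces to standard flow matching; pushing λ toward 1 shifts the supervision toward self-consistency, which is one plausible reading of how the ratio trades multi-step fidelity for few-step speed.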
In parallel, UCGM-S optimizes sampling from models trained with UCGM-T and can also be applied directly to models pre-trained under other paradigms. The proposed self-boosting mechanism improves both training and sampling efficiency, reducing computational overhead and enhancing sample quality without classifier-free guidance.
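For intuition, multi-step sampling from a velocity-parameterized model (as assumed in the sketch above) can be as simple as Euler integration from noise to data. UCGM-S's actual procedure, including the self-boosting mechanism, adds refinements beyond this baseline that are omitted here.

```python
import torch

@torch.no_grad()
def ucgm_like_sampler(model, shape, steps=20, device="cpu"):
    """Plain Euler-style sampler for a velocity-prediction model: integrate
    from pure noise at t = 1 back to data at t = 0. An illustrative baseline,
    not the UCGM-S algorithm itself."""
    x = torch.randn(shape, device=device)        # start from Gaussian noise
    ts = torch.linspace(1.0, 0.0, steps + 1, device=device)
    for i in range(steps):
        t, t_next = ts[i], ts[i + 1]
        v = model(x, t.expand(shape[0]))         # velocity at the current time
        x = x + (t_next - t) * v                 # Euler step toward t_next
    return x
```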
Experimental Results
UCGM demonstrates substantial improvements in sampling efficiency and fidelity. Applied to a 675M-parameter diffusion transformer trained on ImageNet 256×256, UCGM-T achieves an FID of $1.30$ with only $20$ sampling steps. Notably, applying UCGM-S to models pre-trained under prior paradigms achieves an FID of $1.06$ in just $40$ steps, a substantial improvement over the original sampling procedures (Figure 1).

Figure 1: NFE ≈ 40, FID ≈ 1.48.
To highlight its versatility, UCGM was evaluated across varied λ settings and sampling-step counts, demonstrating that a single framework spans both few-step and multi-step regimes (Figure 2); a toy sweep mirroring this protocol is sketched after Figure 2.


Figure 2: Results across varying λ values and sampling-step counts.
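To make that evaluation protocol concrete, the toy script below varies λ at training time and the number of steps at sampling time, reusing the two sketches above. The tiny network, synthetic 2-D data, and printed statistic are placeholders; a real sweep would train the paper's transformer and report FID.

```python
import torch
import torch.nn as nn

class TinyVelocityNet(nn.Module):
    """Placeholder velocity model so the sweep below actually runs; a real
    experiment would use the paper's diffusion transformer instead."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 2))

    def forward(self, x, t):
        return self.net(torch.cat([x, t[:, None]], dim=-1))  # condition on t

# Sweep the consistency ratio lam at train time and the step count at
# sampling time, mirroring the protocol behind Figure 2. Reuses the
# ucgm_like_training_step / ucgm_like_sampler sketches defined above;
# FID evaluation is omitted, so only a trivial statistic is printed.
for lam in (0.0, 0.5, 1.0):
    model, ema = TinyVelocityNet(), TinyVelocityNet()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(200):                         # toy training loop
        x0 = torch.randn(64, 2) + 3.0            # stand-in 2-D "data"
        loss = ucgm_like_training_step(model, ema, x0, lam=lam)
        opt.zero_grad(); loss.backward(); opt.step()
    for steps in (2, 20, 40):
        samples = ucgm_like_sampler(model, (256, 2), steps=steps)
        print(f"lam={lam:.1f} steps={steps:2d} sample mean={samples.mean():.3f}")
```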
Implications and Future Directions
The introduction of UCGM marks a significant step toward unifying generative modeling, offering a cohesive approach that reconciles the strengths of the diffusion, flow-matching, and consistency paradigms. Practically, the framework reduces computational cost, making high-quality generative models more accessible. Theoretically, it paves the way for more integrated research across model types, fostering innovations that exploit the fundamental similarities between these seemingly disparate methods.
Future work should explore further enhancements to UCGM's self-boosting techniques, as well as integration with emerging generative models beyond the current scope. Exploring alternative λ schedules could also provide deeper insight into the mechanics of model transitions within UCGM.
Conclusion
Through its unified training and sampling design, UCGM consolidates diverse continuous generative modeling techniques, setting a new standard for efficiency and quality. This unified approach simplifies the training and deployment of generative models and opens avenues for research into adaptive generative systems. UCGM stands as an important development in the ongoing evolution of AI-based data generation.