Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion (2310.02279v3)

Published 1 Oct 2023 in cs.LG, cs.AI, cs.CV, and stat.ML

Abstract: Consistency Models (CM) (Song et al., 2023) accelerate score-based diffusion model sampling at the cost of sample quality but lack a natural way to trade-off quality for speed. To address this limitation, we propose Consistency Trajectory Model (CTM), a generalization encompassing CM and score-based models as special cases. CTM trains a single neural network that can -- in a single forward pass -- output scores (i.e., gradients of log-density) and enables unrestricted traversal between any initial and final time along the Probability Flow Ordinary Differential Equation (ODE) in a diffusion process. CTM enables the efficient combination of adversarial training and denoising score matching loss to enhance performance and achieves new state-of-the-art FIDs for single-step diffusion model sampling on CIFAR-10 (FID 1.73) and ImageNet at 64x64 resolution (FID 1.92). CTM also enables a new family of sampling schemes, both deterministic and stochastic, involving long jumps along the ODE solution trajectories. It consistently improves sample quality as computational budgets increase, avoiding the degradation seen in CM. Furthermore, unlike CM, CTM's access to the score function can streamline the adoption of established controllable/conditional generation methods from the diffusion community. This access also enables the computation of likelihood. The code is available at https://github.com/sony/ctm.

Citations (120)

Summary

  • The paper introduces CTM, a unified framework that integrates score-based and distillation models to capture both fine gradients and broader trajectory shifts in diffusion ODEs.
  • The paper leverages a novel training strategy combining score matching, reconstruction, and adversarial losses to directly learn detailed data trajectories.
  • The paper demonstrates state-of-the-art performance in image generation on CIFAR-10 and ImageNet, establishing CTM's practical impact in generative modeling.

Consistency Trajectory Models: Bridging Score-based and Distillation Models for Efficient Diffusion

Introduction

Generative models, especially Diffusion Models (DMs), have achieved remarkable success in producing high-quality samples, but they remain slow and computationally expensive at sampling time. Consistency Trajectory Models (CTM) address this by unifying score-based and distillation models under a common framework: a single network that retains the precise generative-process control of score-based models while gaining the sampling efficiency of distillation models.

CTM Framework

Unification of Models

CTM introduces a novel parameterization that models both infinitesimal changes (i.e., scores) and long jumps between arbitrary times along the probability flow Ordinary Differential Equation (ODE) of a diffusion process. This dual capability captures both local gradient information and broad trajectory shifts in data generation, so one trained network serves one-step generation, multi-step refinement, and score evaluation. Notably, CTM achieves state-of-the-art (SOTA) performance in image generation on CIFAR-10 and ImageNet 64x64, demonstrating its effectiveness in practice.
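To make the anytime-to-anytime jump concrete, here is a minimal sketch of the paper's reported parameterization, G(x_t, t, s) = (s/t)·x_t + (1 - s/t)·g_θ(x_t, t, s), which satisfies the boundary condition G(x_t, t, t) = x_t by construction. The network `g_theta` below is a hypothetical stand-in for the trained model, not the authors' implementation:

```python
import torch

def ctm_jump(g_theta, x_t, t, s):
    """Jump from time t to an earlier time s along the PF ODE using the
    CTM parameterization G(x_t, t, s) = (s/t) x_t + (1 - s/t) g(x_t, t, s).

    s == t returns x_t exactly (boundary condition), while s -> 0 yields
    the network's prediction of the clean sample, so one forward pass
    covers anything from an infinitesimal step to a full denoising jump.
    """
    ratio = s / t
    return ratio * x_t + (1.0 - ratio) * g_theta(x_t, t, s)

# Hypothetical stand-in for the trained network; a real g_theta would be
# a U-Net conditioned on both the start time t and the end time s.
g_theta = lambda x, t, s: torch.zeros_like(x)

x_T = torch.randn(4, 3, 32, 32)                # pure noise at t = T
x_0 = ctm_jump(g_theta, x_T, t=80.0, s=1e-3)   # one long jump: single-step sample
```

Because the score is the infinitesimal limit of such jumps, the same network also plugs into standard ODE/SDE solvers and the controllable/conditional generation methods developed for diffusion models.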

Training Approach

CTM's training strategy integrates denoising score matching with reconstruction and adversarial losses, so the network learns both accurate local scores and faithful long-range trajectories. This structure enables direct learning of data trajectories, improving sample quality without sacrificing computational efficiency. The framework also yields a new sampling method, termed γ-sampling, which interpolates between deterministic and stochastic generation paths with controlled variance, as sketched below.
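A hedged sketch of γ-sampling, assuming EDM-style noise scaling (x_t = x_0 + t·ε): each step jumps from t_n down to sqrt(1 - γ²)·t_{n+1}, then injects fresh noise of scale γ·t_{n+1} so the marginal variance at t_{n+1} is restored. γ = 0 gives a fully deterministic ODE sampler; γ = 1 re-noises from a full denoise, matching Consistency-Model-style multistep sampling. The time grid and stand-in `G` are illustrative, not the paper's settings:

```python
import torch

def gamma_sampling(G, x, times, gamma):
    """Sketch of CTM gamma-sampling (illustrative names, EDM noise scale).

    Each step partially denoises with a long jump along the PF ODE, then
    re-noises so the sample's marginal variance matches time t_next:
    (1 - gamma^2) * t_next^2 + gamma^2 * t_next^2 = t_next^2.
    """
    for t_cur, t_next in zip(times[:-1], times[1:]):
        t_mid = (1.0 - gamma**2) ** 0.5 * t_next   # partial-denoising target time
        x = G(x, t_cur, t_mid)                     # long jump t_cur -> t_mid
        if t_next > 0:
            x = x + gamma * t_next * torch.randn_like(x)  # re-noise up to t_next
    return x

# Stand-in jump function (hypothetical); a trained CTM supplies the real G.
G = lambda x, t, s: (s / t) * x                    # toy: decay toward zero

times = [80.0, 20.0, 5.0, 1.0, 0.0]                # illustrative time grid
sample = gamma_sampling(G, torch.randn(4, 3, 32, 32), times, gamma=0.5)
```

Intermediate γ values trade determinism for the error-damping effect of stochasticity, which is how CTM keeps improving sample quality as the step budget grows instead of degrading like CM.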

Performance and Implications

The introduction of CTM marks a significant advancement in generative modeling, particularly for DMs. With its dual modeling capability, CTM not only improves sampling efficiency but also provides a unified perspective on score-based and distillation models. Its state-of-the-art single-step FIDs, together with the score access that enables likelihood computation and established controllable-generation techniques, underscore its practical impact. CTM paves the way for future research in generative AI, suggesting paths for further optimization and new model architectures, and its applicability across domains hints at broad implications, from enhancing generative quality in AI art to accelerating simulation in scientific research.

Future Directions

Looking ahead, several areas warrant further exploration. CTM's flexibility suggests applications beyond image and media generation, including LLMs and time-series prediction. The blending of adversarial training within the CTM framework also invites the integration of other deep learning methodologies to enrich and expand the model's capabilities. As the field of AI continues to evolve, CTM offers a robust template for the next generation of generative models.

CTM thus represents a promising avenue in the evolution of generative models, demonstrating both efficiency in learning and flexibility in application. Continued exploration of its boundaries and potential applications will likely yield further insights and breakthroughs in the field.

GitHub

  1. GitHub - sony/ctm (212 stars)