How to build a consistency model: Learning flow maps via self-distillation (2505.18825v1)

Published 24 May 2025 in cs.LG and cs.CV

Abstract: Building on the framework proposed in Boffi et al. (2024), we present a systematic approach for learning flow maps associated with flow and diffusion models. Flow map-based models, commonly known as consistency models, encompass recent efforts to improve the efficiency of generative models based on solutions to differential equations. By exploiting a relationship between the velocity field underlying a continuous-time flow and the instantaneous rate of change of the flow map, we show how to convert existing distillation schemes into direct training algorithms via self-distillation, eliminating the need for pre-trained models. We empirically evaluate several instantiations of our framework, finding that high-dimensional tasks like image synthesis benefit from objective functions that avoid temporal and spatial derivatives of the flow map, while lower-dimensional tasks can benefit from objectives incorporating higher-order derivatives to capture sharp features.

Summary

Overview of Consistency Models and Self-Distillation Framework

The paper presents a systematic approach to learning the flow maps underlying consistency models via a self-distillation framework. The focus is on improving the sampling efficiency of generative models, particularly in high-dimensional settings such as image synthesis, by removing a key limitation of traditional distillation schemes: their dependence on a pre-trained model. The authors propose a direct training mechanism that leverages mathematical characterizations of the flow map to eliminate this dependence, combining the accessibility of flow-based training with the sampling efficiency of distillation.

Background and Motivations

Flow-based and diffusion models have delivered strong generative performance in domains ranging from computer vision to scientific applications. However, drawing samples from these models typically requires numerically solving a differential equation, which is computationally expensive and introduces latency in real-time applications. Consistency models sidestep this by estimating the flow map directly, so that samples can be generated in one or a few network evaluations. Traditional distillation approaches to learning such maps require a pre-trained model, an artificial bottleneck that this paper seeks to remove via self-distillation.

Methodology

The paper converts classical distillation methods into self-distillation algorithms that train flow maps directly. It details three equivalent characterizations of the flow map (Eulerian, Lagrangian, and semigroup conditions) and uses them to build training objectives that require no pre-trained model. Key elements include the following; an illustrative code sketch follows the list.

  1. Self-Distillation: By enforcing the Eulerian, Lagrangian, and semigroup conditions with the model serving as its own teacher, existing distillation schemes become direct training algorithms that need no separate pre-trained network.
  2. Algorithmic Frameworks: Three concrete self-distillation algorithms are proposed, Eulerian (ESD), Lagrangian (LSD), and Progressive Self-Distillation (PSD), each derived from one characterization of the flow map and accompanied by theoretical error bounds on model accuracy in the 2-Wasserstein metric.
  3. Numerical Analysis: Several instantiations of the framework are evaluated empirically, showing that high-dimensional image generation benefits from objectives that avoid temporal and spatial derivatives of the flow map, while lower-dimensional tasks can benefit from objectives incorporating higher-order derivatives to capture sharp features.
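
To make the derivative-free route concrete, the sketch below shows one way a semigroup-style self-distillation objective could look. It is a minimal sketch under stated assumptions, not the paper's implementation: the network interface flow_map(x, s, t), the velocity field v_field(x, t), the sampled times, and the short-step anchor term are illustrative placeholders.

```python
import torch

def semigroup_sd_loss(flow_map, v_field, x, s, u, t, h=1e-2):
    """Illustrative semigroup-style self-distillation loss (not the paper's exact objective).

    flow_map(x, s, t): network estimating the flow map X_{s,t}(x) that transports x from time s to time t.
    v_field(x, t): velocity field of the underlying flow/diffusion model.
    s <= u <= t are sampled times; h is a small step used to anchor the map to the velocity.
    """
    # Semigroup consistency: one big step s -> t should match two smaller steps
    # s -> u -> t, where the two-step "teacher" is the current model itself with
    # gradients detached (self-distillation: no separate pre-trained teacher).
    x_st = flow_map(x, s, t)
    with torch.no_grad():
        x_su = flow_map(x, s, u)
        x_ut = flow_map(x_su, u, t)
    consistency = ((x_st - x_ut) ** 2).mean()

    # Short-step anchor tying the flow map to the velocity field: over [s, s + h]
    # the map should approximate one Euler step of the flow. This prevents the
    # trivial identity map from minimizing the consistency term alone.
    euler_step = x + h * v_field(x, s)
    anchor = ((flow_map(x, s, s + h) - euler_step) ** 2).mean()

    return consistency + anchor
```

In practice the times s <= u <= t would be resampled at every iteration, and the velocity field could either be learned jointly or come from a standard flow-matching objective; both choices here are assumptions rather than the paper's prescription.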

Results and Implications

Empirical evaluations show that the self-distilled models are competitive, especially when samples must be generated in only a few steps. The self-distillation objectives produce reliable high-dimensional outputs while reducing gradient variance, which improves training stability. The results highlight:

  • Efficiency: The training algorithms are computationally efficient, eliminating the separate pre-training phase required by standard distillation and enabling faster sampling (a few-step sampling sketch follows this list).
  • Scalability: Experiments on CIFAR-10 and on low-dimensional benchmarks such as the checkerboard dataset show that the approach handles a wide range of problem dimensions.
  • Flexibility: The framework can be adapted across modalities and tasks, suggesting broader applications in generative synthesis and real-time modeling.
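
To illustrate the efficiency point, the snippet below sketches how a learned flow map could be used for few-step sampling. The interface flow_map(x, s, t) and the convention that time 0 is noise and time 1 is data are assumptions for this sketch; the paper's time parameterization and step schedules may differ.

```python
import torch

def few_step_sample(flow_map, x_noise, n_steps=4):
    """Generate a sample with a handful of flow-map evaluations.

    With n_steps = 1 this reduces to one-step generation (a single call
    from time 0 to time 1); a few extra steps trade compute for fidelity.
    """
    ts = torch.linspace(0.0, 1.0, n_steps + 1)
    x = x_noise
    for s, t in zip(ts[:-1].tolist(), ts[1:].tolist()):
        x = flow_map(x, s, t)
    return x

# Hypothetical usage: sample = few_step_sample(model, torch.randn(64, 3, 32, 32))
```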

Future Directions

The impact of self-distillation on computational efficiency makes it a promising direction for future research in generative AI. Possible next steps include refining the error bounds for progressive self-distillation, adapting network architectures to improve the stability of the other variants, and extending the approach to interactive, real-time systems.

In summary, the paper lays the groundwork for a consistency-model training procedure anchored in self-distillation rather than distillation from a pre-trained teacher. The methods discussed remove the dependence on pre-trained models and reduce the cost of sampling, pointing toward more efficient generative modeling.