
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis (2404.13686v3)

Published 21 Apr 2024 in cs.CV

Abstract: Recently, a series of diffusion-aware distillation algorithms have emerged to alleviate the computational overhead associated with the multi-step inference process of Diffusion Models (DMs). Current distillation techniques often dichotomize into two distinct aspects: i) ODE Trajectory Preservation; and ii) ODE Trajectory Reformulation. However, these approaches suffer from severe performance degradation or domain shifts. To address these limitations, we propose Hyper-SD, a novel framework that synergistically amalgamates the advantages of ODE Trajectory Preservation and Reformulation, while maintaining near-lossless performance during step compression. Firstly, we introduce Trajectory Segmented Consistency Distillation to progressively perform consistent distillation within pre-defined time-step segments, which facilitates the preservation of the original ODE trajectory from a higher-order perspective. Secondly, we incorporate human feedback learning to boost the performance of the model in a low-step regime and mitigate the performance loss incurred by the distillation process. Thirdly, we integrate score distillation to further improve the low-step generation capability of the model and offer the first attempt to leverage a unified LoRA to support the inference process at all steps. Extensive experiments and user studies demonstrate that Hyper-SD achieves SOTA performance from 1 to 8 inference steps for both SDXL and SD1.5. For example, Hyper-SDXL surpasses SDXL-Lightning by +0.68 in CLIP Score and +0.51 in Aes Score in the 1-step inference.

Citations (39)

Summary

  • The paper introduces Hyper-SD, a novel framework that enhances diffusion models via trajectory segmented consistency distillation.
  • It integrates human feedback learning and score distillation to achieve high-quality image generation using only 1 to 8 inference steps.
  • Experimental results show state-of-the-art performance in aesthetic quality and textual fidelity, validated by metrics and user preference studies.

Enhancing Diffusion Model Step Efficiency through Hyper-SD, a Novel Distillation Framework

Overview of Hyper-SD

Hyper-SD introduces a novel approach that combines trajectory-preservation and trajectory-reformulation techniques within diffusion models (DMs). The unified framework leverages Trajectory Segmented Consistency Distillation (TSCD), human feedback learning, and score distillation to achieve state-of-the-art (SOTA) performance on Stable Diffusion models such as SDXL and SD1.5 over a reduced number of inference steps, ranging from 1 to 8.

Methodology

Hyper-SD's methodology centers on three primary enhancements to the diffusion model distillation process:

  1. Trajectory Segmented Consistency Distillation (TSCD):
    • The proposed TSCD divides the diffusion trajectory into smaller segments, facilitating a more granular and effective distillation process.
    • This approach minimizes model fitting complexity, mitigating the degradation in generation quality and preserving the fidelity of the original model's trajectory across various segments.
  2. Human Feedback Learning:
    • Model outputs are adjusted based on human aesthetic preferences and feedback from visual perceptual models to improve generation quality.
    • The implementation uses aesthetic predictors and instance segmentation models to refine structure and aesthetic appeal, guiding the model toward visually pleasing and structurally coherent outputs.
  3. Score Distillation for One-step Generation Enhancement:
    • Incorporates a Distribution Matching Distillation (DMD) technique targeting enhancements specifically for one-step inference, optimizing the estimation of the score function and thus improving generation quality from minimal inference steps.
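The paper does not include code, but the core TSCD idea can be sketched in a few lines: divide the timestep range into segments, and enforce that the student maps adjacent points on the teacher's ODE trajectory to the same prediction at the segment boundary. The `student` and `teacher_step` callables below are hypothetical stand-ins for the distilled network and a one-step ODE solve with the frozen teacher; the segment layout is illustrative, not the paper's exact schedule.

```python
import torch
import torch.nn.functional as F

def segment_bounds(t, num_segments=8, total_steps=1000):
    """Map each timestep to the lower boundary of its segment."""
    seg_len = total_steps // num_segments
    return (t // seg_len) * seg_len

def tscd_loss(student, teacher_step, x_t, t, num_segments=8, total_steps=1000):
    """Toy segment-wise consistency loss.

    student(x, t, s): predicts the state at segment boundary s, starting from (x, t).
    teacher_step(x, t): one frozen-teacher ODE step from t toward t-1.
    """
    with torch.no_grad():
        x_prev, t_prev = teacher_step(x_t, t)   # one step along the teacher ODE
    s = segment_bounds(t, num_segments, total_steps)
    # Consistency within the segment: the student's boundary prediction should
    # agree whether it starts from (x_t, t) or from the teacher's (x_prev, t_prev).
    pred = student(x_t, t, s)
    with torch.no_grad():
        target = student(x_prev, t_prev, s)     # EMA/stop-grad target in practice
    return F.mse_loss(pred, target)
```

As the segments shrink toward single steps this recovers ordinary step distillation, and with one segment it recovers global consistency distillation; the progressive schedule in the paper interpolates between the two.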

Experimental Results

Extensive experiments and a user study were conducted, showing that Hyper-SD achieves superior performance in both aesthetic quality and textual fidelity across different diffusion model architectures:

  • Metrics Utilized: CLIP Score, Aesthetic Score, and specialized metrics such as ImageReward and PickScore were used to quantitatively assess performance.
  • Comparison to Baselines: Hyper-SD displayed noticeable improvements over existing methods like SDXL-Lightning and various adversarial and trajectory-based distillation techniques.
  • User Study Findings: Hyper-SD was preferred significantly more often compared to other methods, reinforcing the effectiveness of the proposed enhancements.
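For reference, CLIP Score (one of the metrics above) is simply the scaled cosine similarity between CLIP image and text embeddings of each generated image and its prompt. A minimal sketch, assuming the embeddings have already been produced by a CLIP encoder:

```python
import torch
import torch.nn.functional as F

def clip_score(image_emb, text_emb, scale=100.0):
    """CLIP Score: scaled cosine similarity between matched image/text embeddings.

    image_emb, text_emb: (N, D) tensors of paired CLIP embeddings.
    Negative similarities are clamped to zero, per the common definition.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    sim = (image_emb * text_emb).sum(dim=-1).clamp(min=0)
    return scale * sim.mean()
```

A perfectly matched pair of embeddings scores 100; unrelated or opposed embeddings score near 0, so the +0.68 CLIP Score gap over SDXL-Lightning reported in the abstract is on this 0-100 scale.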

Implications and Future Work

The practical implications of Hyper-SD are profound for real-world applications requiring efficient and high-quality image generation from textual prompts. The ability to operate effectively across a reduced number of inference steps without compromising output quality can lead to more resource-efficient deployments of generative models.

Looking ahead, future developments might focus on:

  • Maintaining Classifier-Free Guidance (CFG): Ensuring the model can utilize negative prompts effectively while still functioning under accelerated conditions.
  • Custom Feedback Optimization: Tailoring feedback learning mechanisms specifically for accelerated models to enhance performance further.
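The CFG mechanism referenced above combines two noise predictions per step, which is exactly what is hard to retain under aggressive step compression. The standard formula, shown here as a small sketch:

```python
import torch

def cfg_noise(eps_cond, eps_uncond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional prediction
    toward the conditional one,

        eps = eps_uncond + w * (eps_cond - eps_uncond),

    where w is the guidance scale. eps_uncond is typically produced from an
    empty or negative prompt, which is why losing CFG also loses negative-prompt
    support in few-step distilled models.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

At w = 1 this reduces to the plain conditional prediction, and w = 0 ignores the prompt entirely; distilled few-step models often bake a fixed guidance scale into the student, which is the limitation this future-work item targets.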

Conclusion

Hyper-SD marks a significant advance in the field of generative AI, particularly in the optimization of diffusion models for fewer-step inference with high fidelity and aesthetic quality. It sets a new standard for efficiency in model performance, paving the way for both academic exploration and practical applications in AI-driven image generation.
