
Classifier-Free Diffusion Guidance (2207.12598v1)

Published 26 Jul 2022 in cs.LG and cs.AI

Abstract: Classifier guidance is a recently introduced method to trade off mode coverage and sample fidelity in conditional diffusion models post training, in the same spirit as low temperature sampling or truncation in other types of generative models. Classifier guidance combines the score estimate of a diffusion model with the gradient of an image classifier and thereby requires training an image classifier separate from the diffusion model. It also raises the question of whether guidance can be performed without a classifier. We show that guidance can be indeed performed by a pure generative model without such a classifier: in what we call classifier-free guidance, we jointly train a conditional and an unconditional diffusion model, and we combine the resulting conditional and unconditional score estimates to attain a trade-off between sample quality and diversity similar to that obtained using classifier guidance.

Classifier-Free Diffusion Guidance: Enhancements in Generative Models Through Joint Training

The paper "Classifier-Free Diffusion Guidance" by Jonathan Ho and Tim Salimans presents an innovative approach that eliminates the need for a classifier in guiding diffusion models for image generation. This method, referred to as classifier-free guidance, promises to simplify the generative model pipeline while achieving similar trade-offs between sample quality and diversity, as seen with previously known classifier guidance methods.

Diffusion models have gained significant traction in generative modeling due to their capacity to deliver high-quality samples and strong likelihoods in tasks involving image and audio synthesis. These models have shown competitive performance against state-of-the-art alternatives such as BigGAN-deep and VQ-VAE-2, especially in the field of ImageNet image generation. Early works, such as those by Sohl-Dickstein et al. and Kingma et al., laid the groundwork for the evolution of these models.

Background and Motivation

Previous research introduced classifier guidance, a method that augments the sample generation process by incorporating the gradient of an auxiliary trained classifier into the diffusion model's score estimate. This combination allowed researchers to control the trade-off between sample fidelity and mode coverage, effectively producing high Inception Scores (IS) and better Fréchet Inception Distance (FID) scores. However, this approach necessitates training an additional classifier, complicating the model pipeline, and raises the concern that guidance may act adversarially on the very classifier-based metrics used to evaluate it.

In contrast, classifier-free guidance aims to address these limitations by leveraging a pure generative model without relying on auxiliary classifiers. This is achieved by jointly training a conditional and an unconditional diffusion model, enabling the combination of their respective score estimates during sampling. The core idea is to merge the strengths of both models to guide the generation process effectively, thus maintaining high sample quality and improved metrics such as FID and IS.

Technical Approach

The training process for classifier-free guidance involves:

  1. Joint Training of Models:
    • A single network parameterizes both the conditional and unconditional models: during training, the class label c is replaced with a null token with probability p_uncond, so the same parameters produce a conditional score estimate when a label is given and an unconditional one otherwise.
    • This label dropout ensures that the model learns both conditional and unconditional score estimates effectively.
  2. Score Combination:
    • During sampling, the score estimates from the conditional (s_theta(z_lambda, c)) and unconditional (s_theta(z_lambda)) models are combined using a specific formula:

    $\tilde{s}_\theta(z_\lambda, c) = (1+w)\,s_\theta(z_\lambda, c) - w\,s_\theta(z_\lambda)$

    This linear combination achieves the intended guidance without requiring classifier gradients, thus simplifying the training pipeline.
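The two ingredients above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: `drop_labels` performs the training-time label dropout with probability p_uncond (using a hypothetical null token of -1), and `guided_score` applies the linear combination with guidance strength w. The function names and the null-token convention are assumptions for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def drop_labels(labels, p_uncond, null_token=-1):
    """Training-time label dropout: with probability p_uncond, replace a
    class label with a null token so the same network also learns the
    unconditional score (hypothetical null-token convention)."""
    mask = rng.random(labels.shape) < p_uncond
    return np.where(mask, null_token, labels)

def guided_score(s_cond, s_uncond, w):
    """Classifier-free guidance: linearly combine the conditional and
    unconditional score estimates with guidance strength w."""
    return (1 + w) * s_cond - w * s_uncond
```

Note that w = 0 recovers the plain conditional model, and larger w pushes samples further toward the conditional score, trading diversity for fidelity.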

Experimental Validation

The experiments were conducted on class-conditional ImageNet datasets at resolutions of 64x64 and 128x128. The paper varied the guidance strength (w) and evaluated the models based on FID and IS metrics. Key findings include:

  • Trade-Offs Between FID and IS:

    • The results demonstrated a clear trade-off between FID and IS, similar to what is observed in classifier-guided models and GAN-based models.
    • The best FID scores were achieved with a small amount of guidance (w = 0.1 or w = 0.3), whereas the highest IS required stronger guidance (w ≥ 4).
  • Sample Quality and Diversity:
    • Visual inspections of generated samples indicated a decrease in diversity but an increase in individual sample fidelity as guidance strength increased.
  • Efficiency Considerations:
    • Classifier-free guidance requires two forward passes through the diffusion model at each sampling step (one conditional, one unconditional), which can be less efficient than classifier guidance when the auxiliary classifier is much smaller and faster than the diffusion model.
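In practice the two score evaluations per step are often folded into a single batched call, halving the number of kernel launches at the cost of doubled batch size. A minimal sketch, assuming a hypothetical `model(z, c)` interface and a null token for the unconditional branch (neither is specified by the paper):

```python
import numpy as np

def guided_score_batched(model, z, c, null_token, w):
    """Evaluate conditional and unconditional scores in one batched
    forward pass, then combine them with guidance strength w.
    `model(z, c)` is a hypothetical score-network interface."""
    z2 = np.concatenate([z, z], axis=0)                      # duplicate latents
    c2 = np.concatenate([c, np.full_like(c, null_token)])    # labels, then nulls
    s = model(z2, c2)                                        # one forward pass
    s_cond, s_uncond = np.split(s, 2, axis=0)
    return (1 + w) * s_cond - w * s_uncond
```

This trades memory for latency; the total compute is unchanged, which is why the paper still counts classifier-free guidance as two model evaluations per step.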

Implications and Future Work

The classifier-free guidance model offers several practical and theoretical advantages:

  • Simplicity in Implementation:
    • The approach requires minimal changes to the existing model training and sampling procedures, making it accessible for broader applications.
  • Avoidance of Classifier Dependence:
    • By not relying on classifier gradients, the method avoids potential adversarial influences on classifier-based metrics, thereby providing a more genuine measurement of sample quality.
  • Potential Extensions:
    • Future work may explore optimizing model architectures to reduce the inefficiency caused by dual forward passes. Additionally, ensuring sample diversity while enhancing fidelity remains an open challenge, with significant implications for application fairness and representativeness.

In conclusion, classifier-free guidance represents a significant step towards efficient and simplified generative modeling. By relying solely on generative models without auxiliary classifiers, it achieves a balance between sample quality and diversity, demonstrating the robust capabilities of diffusion models in image synthesis. Future research is anticipated to further refine this approach and expand its applicability across various data modalities.

Authors (2)
  1. Jonathan Ho (27 papers)
  2. Tim Salimans (46 papers)
Citations (2,899)