- The paper presents a novel Min-SNR-γ loss weighting strategy that accelerates convergence by balancing task-specific conflicts in diffusion models.
- It reinterprets diffusion training as a multi-task learning problem, achieving a 3.4× speedup and improved performance on the ImageNet 256×256 benchmark.
- The method provides a practical solution for efficient model training and scalability, making advanced generative modeling more accessible in resource-constrained settings.
Analysis of "Efficient Diffusion Training via Min-SNR Weighting Strategy"
The paper "Efficient Diffusion Training via Min-SNR Weighting Strategy" addresses the slow convergence of denoising diffusion models, a family of deep generative models known for modeling complex data distributions across many domains. The authors propose a loss weighting strategy, termed Min-SNR-γ, that speeds up convergence and improves sample quality, culminating in a state-of-the-art FID score on the ImageNet 256×256 benchmark.
Background and Motivation
Denoising diffusion models have emerged as strong alternatives to Generative Adversarial Networks (GANs), handling diverse tasks such as text-to-image generation, video synthesis, and text generation. Despite their success, these models train slowly and demand significant computational resources. The authors attribute this inefficiency in part to conflicting optimization directions across the different timesteps of the diffusion process.
Core Contribution: The Min-SNR-γ Strategy
The authors reframe diffusion training as a multi-task learning problem, treating each timestep of the diffusion process as an individual task. They then set each task's loss weight from its signal-to-noise ratio (SNR), clamped at a threshold γ, so that no single noise level dominates optimization. By assigning weights according to task difficulty, the Min-SNR-γ approach mitigates the conflicts between timesteps, enabling faster convergence at lower training cost.
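To make this concrete, the per-timestep weight can be stated as below; this restates the paper's formulation in standard DDPM notation, where ᾱ_t is the cumulative noise-schedule product and γ is the truncation threshold (set to 5 by default in the paper):

```latex
% Min-SNR-\gamma loss weight for timestep t (x_0-prediction form)
w_t = \min\{\mathrm{SNR}(t),\, \gamma\},
\qquad
\mathrm{SNR}(t) = \frac{\bar{\alpha}_t}{1 - \bar{\alpha}_t}
```

When the network is instead parameterized to predict the noise ε rather than x_0, the same scheme corresponds to an effective weight of min{SNR(t), γ}/SNR(t) on the ε-space mean-squared error.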
Methodological Insights
The paper argues that plain SNR-based loss weighting over-emphasizes low-noise timesteps, which slows optimization. Because Min-SNR-γ is a fixed, non-adaptive global weighting, it also avoids the computational cost and instability of weights computed dynamically from Pareto optimization solutions. Truncating the SNR at the threshold γ keeps the learning focus balanced across noise levels.
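As an illustration, here is a minimal PyTorch sketch of such a truncated-SNR weight computation, assuming a standard DDPM schedule exposed through its cumulative products ᾱ_t; the function name and the training-loop variables in the usage comment are placeholders, not the authors' released code:

```python
import torch

def min_snr_gamma_weights(alphas_cumprod: torch.Tensor,
                          timesteps: torch.Tensor,
                          gamma: float = 5.0,
                          predict_noise: bool = True) -> torch.Tensor:
    """Per-timestep loss weights under a Min-SNR-gamma-style scheme.

    alphas_cumprod: cumulative product of (1 - beta_t) over all steps,
    as produced by a standard DDPM noise schedule.
    timesteps: sampled timestep indices for the current batch.
    """
    abar = alphas_cumprod[timesteps]
    snr = abar / (1.0 - abar)             # SNR(t) = alpha_bar_t / (1 - alpha_bar_t)
    weight = torch.clamp(snr, max=gamma)  # min(SNR(t), gamma)
    if predict_noise:
        # For epsilon-prediction the MSE is implicitly scaled by SNR(t),
        # so divide the clamped weight back out: min(SNR(t), gamma) / SNR(t).
        weight = weight / snr
    return weight

# Usage inside a training step (names such as `model`, `x_t`, `noise` are illustrative):
# t = torch.randint(0, num_steps, (batch_size,))
# w = min_snr_gamma_weights(alphas_cumprod, t)
# per_sample_mse = ((model(x_t, t) - noise) ** 2).mean(dim=[1, 2, 3])
# loss = (w * per_sample_mse).mean()
```

In the ε-prediction case the division by SNR(t) means low-noise timesteps (SNR > γ) receive a weight of γ/SNR(t) < 1, while noisier timesteps keep a weight of 1, which is exactly the rebalancing away from low-noise steps that the paper argues for.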
Experimental Validation
The paper validates the strategy across multiple setups and prediction targets (predicting either the noise or the original input) and across architectures including UNet and Vision Transformers (ViT). The results show a reported 3.4× faster convergence than prior weighting schemes and a new record FID score on ImageNet 256×256. Performance remains stable across a range of settings for the truncation parameter γ, underscoring the strategy's robustness.
Implications and Future Directions
The Min-SNR-γ strategy not only offers theoretical insights into the optimization landscape of diffusion models but also presents a practical solution to one of the key challenges in generative model training. By achieving rapid convergence without compromising on performance, the technique facilitates broader experimentation and application of diffusion models in resource-constrained scenarios. Future work could extend this strategy to other domains where multi-task learning dynamics are prevalent, potentially leading to enhanced model efficiencies in fields beyond image generation.
In summary, the paper makes a significant contribution by addressing diffusion training inefficiency with a carefully designed loss weighting mechanism. The Min-SNR-γ strategy shows that a simple, principled change to the training objective can improve both the efficiency and the scalability of diffusion models across complex generative tasks.