
Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models (2304.12526v2)

Published 25 Apr 2023 in cs.CV and cs.LG

Abstract: Diffusion models are powerful, but they require a lot of time and data to train. We propose Patch Diffusion, a generic patch-wise training framework, to significantly reduce the training time costs while improving data efficiency, which thus helps democratize diffusion model training to broader users. At the core of our innovations is a new conditional score function at the patch level, where the patch location in the original image is included as additional coordinate channels, while the patch size is randomized and diversified throughout training to encode the cross-region dependency at multiple scales. Sampling with our method is as easy as in the original diffusion model. Through Patch Diffusion, we could achieve $\mathbf{\ge 2\times}$ faster training, while maintaining comparable or better generation quality. Patch Diffusion meanwhile improves the performance of diffusion models trained on relatively small datasets, $e.g.$, as few as 5,000 images to train from scratch. We achieve outstanding FID scores in line with state-of-the-art benchmarks: 1.77 on CelebA-64$\times$64, 1.93 on AFHQv2-Wild-64$\times$64, and 2.72 on ImageNet-256$\times$256. We share our code and pre-trained models at https://github.com/Zhendong-Wang/Patch-Diffusion.

Authors (8)
  1. Zhendong Wang
  2. Yifan Jiang
  3. Huangjie Zheng
  4. Peihao Wang
  5. Pengcheng He
  6. Zhangyang Wang
  7. Weizhu Chen
  8. Mingyuan Zhou

Summary

  • The paper introduces a conditional score function at the patch level, significantly accelerating training and boosting data efficiency.
  • It uses randomized and progressive patch sizes with pixel-level coordinates to capture multi-scale dependencies during training.
  • Experimental results demonstrate over twofold faster training and competitive FID scores on datasets such as CelebA, FFHQ, and AFHQv2.

Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models

In the paper "Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models," the authors introduce Patch Diffusion, a generic patch-wise training framework for diffusion models, a class of generative models that are powerful but costly in both compute and data. The central innovation is a patch-based training approach that reduces the computational burden and improves data efficiency, helping to democratize the development of diffusion models across the broader research community.

Core Innovations

The key advance is a conditional score function that operates at the patch level. The patch's location in the original image is supplied as additional coordinate channels, while the patch size is randomized and diversified during training to encode cross-region dependencies at multiple scales. Training on patches rather than full images cuts computational cost substantially: the authors report more than twice the training speed while maintaining, and sometimes improving, generation quality. Models trained with Patch Diffusion also perform well on small datasets, achieving competitive results from scratch with as few as 5,000 images.
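To make the coordinate-channel idea concrete, the sketch below crops a random patch and appends two channels encoding its position in the full image. The normalization to [-1, 1] and the exact channel layout are illustrative assumptions, not the paper's precise implementation.

```python
import numpy as np

def extract_patch_with_coords(image, patch_size, rng):
    """Crop a random patch from `image` (H, W, C) and append two
    coordinate channels giving its location in the original image.
    Normalization range and layout are assumptions for illustration."""
    H, W, C = image.shape
    top = rng.integers(0, H - patch_size + 1)
    left = rng.integers(0, W - patch_size + 1)
    patch = image[top:top + patch_size, left:left + patch_size, :]
    # Pixel-level coordinates of this patch within the full image,
    # mapped to [-1, 1] over the full image extent.
    ys = np.linspace(-1.0, 1.0, H)[top:top + patch_size]
    xs = np.linspace(-1.0, 1.0, W)[left:left + patch_size]
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.concatenate([patch, yy[..., None], xx[..., None]], axis=-1)

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64, 3))
out = extract_patch_with_coords(img, 16, rng)
assert out.shape == (16, 16, 5)  # 3 color channels + 2 coordinate channels
```

Because the coordinate channels carry absolute position, the score network can learn where a patch sits in the image even though it never sees the whole image at once.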

Methodology

Patch Diffusion performs conditional score matching on image patches, with each patch's location and size serving as conditioning information for the model. Pixel-level coordinate channels support this patch-level score matching, and sampling remains as simple as in the original diffusion model: the coordinate channels for the full image are concatenated with sampled noise, and the reverse diffusion chain is traversed as usual.
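A minimal sketch of that sampling path is shown below: a full-image coordinate grid is concatenated with the noise input at every reverse step. The `score_fn` signature and the simplified Euler-style update are hypothetical stand-ins for the trained network and the actual sampler, which the paper inherits from standard diffusion models.

```python
import numpy as np

def full_coord_grid(H, W):
    """Coordinate channels spanning the full image in [-1, 1]; at sampling
    time these are concatenated with noise so whole images are generated."""
    yy, xx = np.meshgrid(np.linspace(-1.0, 1.0, H),
                         np.linspace(-1.0, 1.0, W), indexing="ij")
    return np.stack([yy, xx], axis=-1)

def sample(score_fn, H, W, C, n_steps=50, rng=None):
    """Toy reverse-diffusion loop. `score_fn(inp, t)` stands in for the
    trained patch-conditional score network (hypothetical signature)."""
    rng = rng or np.random.default_rng()
    coords = full_coord_grid(H, W)
    x = rng.standard_normal((H, W, C))  # start from pure noise
    for t in np.linspace(1.0, 1e-3, n_steps):
        inp = np.concatenate([x, coords], axis=-1)  # image + coord channels
        x = x + (t / n_steps) * score_fn(inp, t)    # simplified Euler update
    return x

# With a dummy zero score, the loop just returns the initial noise shape.
x = sample(lambda inp, t: np.zeros(inp.shape[:-1] + (3,)), 32, 32, 3, n_steps=10)
assert x.shape == (32, 32, 3)
```

The point of the sketch is only the input layout: the network always receives image channels plus coordinate channels, so no patch stitching is needed at sampling time.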

During training, patch sizes follow either a stochastic schedule, which randomly samples the patch size at each step, or a progressive schedule, which moves from small patches to larger ones and finally to full-size images. Mixing diverse patch sizes with occasional full-image training lets the model encode global structure while retaining the efficiency benefits of patch-wise training.
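The two schedules can be sketched as follows. The candidate sizes, mixing probability, and stage boundaries here are placeholders for illustration, not the paper's exact hyperparameters.

```python
import numpy as np

def stochastic_patch_size(full_size, rng, p_full=0.5, sizes=(16, 32)):
    """Randomly mix smaller patches with occasional full images.
    `p_full` and `sizes` are illustrative placeholders."""
    if rng.random() < p_full:
        return full_size
    return int(rng.choice(sizes))

def progressive_patch_size(step, total_steps, full_size, sizes=(16, 32)):
    """Move from small patches to full images as training proceeds."""
    stages = list(sizes) + [full_size]
    idx = min(int(step / total_steps * len(stages)), len(stages) - 1)
    return stages[idx]

rng = np.random.default_rng(0)
assert stochastic_patch_size(64, rng) in {16, 32, 64}
assert progressive_patch_size(0, 100, 64) == 16    # early: small patches
assert progressive_patch_size(99, 100, 64) == 64   # late: full images
```

Either way, the schedule trades the cheap gradient steps of small patches against the full-image steps needed to anchor global structure.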

Results

The paper reports strong experimental results across several datasets, showing significant gains in training speed alongside competitive generation quality. On CelebA and FFHQ, Patch Diffusion attains FID scores close to, and sometimes better than, state-of-the-art benchmarks despite substantially reduced training time. When integrated into established models such as ControlNet, it also noticeably improves fine-tuning efficiency.

Additionally, the framework has demonstrated superior performance on limited-sized datasets like AFHQv2, illustrating improved data efficiency—a vital attribute for advancing diffusion models to smaller data environments. The technique also shows potential for image extrapolation tasks, where trained models effectively extrapolate image boundaries and maintain coherence across expanded coordinate manifolds.

Implications and Future Work

The implementation of Patch Diffusion heralds promising implications for both practical and theoretical progress in artificial intelligence. Practically, it enables more researchers to leverage diffusion models without prohibitive resource expenses, creating pathways for broader innovation within Generative AI. Theoretically, it opens up new avenues for research into patch-wise training strategies and the convergence of score functions under data augmentation methodologies.

Future work could explore enhancements to the coordinate systems, such as refined positional embeddings for better integration, as well as theoretical explorations on the convergence of patch-wise score matching in general cases. Given its efficient reduction in resource use and improved generation quality, Patch Diffusion sets a compelling precedent for employing coordinate-conditioned score matching and stochastic scheduling in generative modeling frameworks.
