AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration (2309.10438v2)
Abstract: Diffusion models are emerging expressive generative models, in which a large number of time steps (inference steps) are required for a single image generation. To accelerate such tedious process, reducing steps uniformly is considered as an undisputed principle of diffusion models. We consider that such a uniform assumption is not the optimal solution in practice; i.e., we can find different optimal time steps for different models. Therefore, we propose to search the optimal time steps sequence and compressed model architecture in a unified framework to achieve effective image generation for diffusion models without any further training. Specifically, we first design a unified search space that consists of all possible time steps and various architectures. Then, a two stage evolutionary algorithm is introduced to find the optimal solution in the designed search space. To further accelerate the search process, we employ FID score between generated and real samples to estimate the performance of the sampled examples. As a result, the proposed method is (i).training-free, obtaining the optimal time steps and model architecture without any training process; (ii). orthogonal to most advanced diffusion samplers and can be integrated to gain better sample quality. (iii). generalized, where the searched time steps and architectures can be directly applied on different diffusion models with the same guidance scale. Experimental results show that our method achieves excellent performance by using only a few time steps, e.g. 17.86 FID score on ImageNet 64 $\times$ 64 with only four steps, compared to 138.66 with DDIM. The code is available at https://github.com/lilijiangg/AutoDiffusion.
- Isnas-dip: Image-specific neural architecture search for deep image prior. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1960–1968, 2022.
- Analytic-dpm: an analytic estimate of the optimal reverse variance in diffusion probabilistic models. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022, 2022.
- Tract: Denoising diffusion models with transitive closure time-distillation. arXiv preprint arXiv:2303.04248, 2023.
- Ilvr: Conditioning method for denoising diffusion probabilistic models. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 14347–14356, 2021.
- Perception prioritized training of diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11472–11481, 2022.
- Come-closer-diffuse-faster: Accelerating conditional diffusion models for inverse problems through stochastic contraction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12413–12422, 2022.
- On analyzing generative and denoising capabilities of diffusion-based deep generative models. CoRR, abs/2206.00070, 2022.
- Diffusion models beat gans on image synthesis. Advances in Neural Information Processing Systems, 34:8780–8794, 2021.
- Neural architecture search: A survey. The Journal of Machine Learning Research, 20(1):1997–2017, 2019.
- Vector quantized diffusion model for text-to-image synthesis. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, pages 10686–10696, 2022.
- Inas: integral nas for device-aware salient object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4934–4944, 2021.
- Single path one-shot neural architecture search with uniform sampling. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XVI 16, pages 544–560. Springer, 2020.
- Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems, 30, 2017.
- Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
- Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res., 23:47:1–47:33, 2022.
- Analyzing and improving the image quality of stylegan. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8107–8116. Computer Vision Foundation / IEEE, 2020.
- Maurice G Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81–93, 1938.
- Bossnas: Exploring hybrid cnn-transformers with block-wisely self-supervised neural architecture search. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 12261–12271, 2021.
- Searching lightweight neural network for image signal processing. In Proceedings of the 30th ACM International Conference on Multimedia, pages 2825–2833, 2022.
- Pseudo numerical methods for diffusion models on manifolds. In The Tenth International Conference on Learning Representations, ICLR, 2022.
- DPM-solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022.
- Repaint: Inpainting using denoising diffusion probabilistic models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11461–11471, 2022.
- Knowledge distillation in iterative generative models for improved sampling speed. arXiv preprint arXiv:2101.02388, 2021.
- Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pages 8162–8171. PMLR, 2021.
- GLIDE: towards photorealistic image generation and editing with text-guided diffusion models. In International Conference on Machine Learning, ICML, volume 162, pages 16784–16804, 2022.
- Pi-nas: Improving neural architecture search by reducing supernet training consistency shift. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 12334–12344, 2021.
- Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 2022.
- Regularized evolution for image classifier architecture search. In Proceedings of the aaai conference on artificial intelligence, volume 33, pages 4780–4789, 2019.
- High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
- Imagenet large scale visual recognition challenge. International journal of computer vision, 115:211–252, 2015.
- Palette: Image-to-image diffusion models. In SIGGRAPH ’22: Special Interest Group on Computer Graphics and Interactive Techniques Conference, pages 15:1–15:10. ACM, 2022.
- Photorealistic text-to-image diffusion models with deep language understanding. In Advances in Neural Information Processing Systems, 2022.
- Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
- Progressive distillation for fast sampling of diffusion models. In International Conference on Learning Representations, 2022.
- Very deep convolutional networks for large-scale image recognition. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
- Denoising diffusion implicit models. In International Conference on Learning Representations, 2021.
- Maximum likelihood training of score-based diffusion models. Advances in Neural Information Processing Systems, 34:1415–1428, 2021.
- Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021.
- Zero-shot image restoration using denoising diffusion null-space model. arXiv preprint arXiv:2212.00490, 2022.
- Learning fast samplers for diffusion models by differentiating through sample quality. In International Conference on Learning Representations, 2022.
- Auto-fpn: Automatic network architecture adaptation for object detection beyond classification. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6649–6658, 2019.
- Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015.
- Rethinking performance estimation in neural architecture search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11356–11365, 2020.
- Lijiang Li (9 papers)
- Huixia Li (16 papers)
- Xiawu Zheng (63 papers)
- Jie Wu (230 papers)
- Xuefeng Xiao (51 papers)
- Rui Wang (996 papers)
- Min Zheng (32 papers)
- Xin Pan (24 papers)
- Fei Chao (53 papers)
- Rongrong Ji (315 papers)