Boosting Diffusion Models with Moving Average Sampling in Frequency Domain (2403.17870v1)
Abstract: Diffusion models have recently revolutionized image generation. Despite their impressive generative capabilities, most of these models rely only on the current sample to denoise the next one, which can make the denoising process unstable. In this paper, we reinterpret the iterative denoising process as model optimization and leverage a moving average mechanism to ensemble all the prior samples. Instead of simply applying the moving average to the denoised samples at different timesteps, we first map the denoised samples to data space and then perform the moving average there, which avoids distribution shift across timesteps. Given that diffusion models recover an image progressively from low-frequency components to high-frequency details, we further decompose the samples into different frequency components and execute the moving average separately on each component. We name the complete approach "Moving Average Sampling in Frequency domain (MASF)". MASF can be seamlessly integrated into mainstream pre-trained diffusion models and sampling schedules. Extensive experiments on both unconditional and conditional diffusion models demonstrate that MASF leads to superior performance compared to the baselines, with almost negligible additional complexity cost.
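The three steps named in the abstract (map each denoised sample to data space, split it into frequency components, average each component separately) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the standard DDPM noise-prediction relation for the data-space estimate, a one-level Haar transform as the frequency decomposition, and fixed per-band weights. The function names and the weight values are hypothetical, and the weighting scheme actually used in the paper may differ.

```python
import numpy as np

def predict_x0(x_t, eps, alpha_bar_t):
    """Map a noisy sample to data space using the standard DDPM relation
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)

def haar_split(x):
    """One-level 2D Haar transform of an (H, W) array with even H and W.
    Returns the low-frequency band and three high-frequency bands."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0
    lh = (a + b - c - d) / 2.0
    hl = (a - b + c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, lh, hl, hh

def haar_merge(ll, lh, hl, hh):
    """Inverse of haar_split."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

def frequency_moving_average(x0_estimates, weights=(0.5, 0.8, 0.8, 0.8)):
    """Exponential moving average over data-space estimates, applied
    separately to each Haar band. `weights` holds the hypothetical weight
    given to the newest estimate in each band (LL, LH, HL, HH)."""
    avg = None
    for x0 in x0_estimates:
        bands = haar_split(x0)
        if avg is None:
            avg = list(bands)  # the first estimate initializes the average
        else:
            avg = [(1.0 - w) * m + w * b
                   for w, m, b in zip(weights, avg, bands)]
    return haar_merge(*avg)
```

In a sampler, `predict_x0` would be called at every denoising step, the running per-band average would replace the raw estimate, and the result would be mapped back to the next timestep by whichever schedule is in use. That last step is schedule-specific and is left out of this sketch.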