Compensation Sampling for Improved Convergence in Diffusion Models (2312.06285v1)
Abstract: Diffusion models achieve remarkable quality in image generation, but at a cost. Iterative denoising requires many time steps to produce high-fidelity images. We argue that the denoising process is crucially limited by an accumulation of reconstruction error stemming from an initially inaccurate reconstruction of the target data. This leads to lower-quality outputs and slower convergence. To address this issue, we propose compensation sampling to guide the generation towards the target domain. We introduce a compensation term, implemented as a U-Net, which adds negligible computational overhead during training and, optionally, inference. Our approach is flexible, and we demonstrate its application in unconditional generation, face inpainting, and face de-occlusion on the benchmark datasets CIFAR-10, CelebA, CelebA-HQ, FFHQ-256, and FSG. Our approach consistently yields state-of-the-art results in terms of image quality, while accelerating convergence during training by up to an order of magnitude.
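To make the idea concrete, below is a minimal, hypothetical sketch of how a compensation term could be folded into a single DDPM-style reverse step. The abstract only states that a U-Net compensation term guides generation toward the target domain; the specific update rule, the names `eps_model` and `comp_model`, and the weighting `lam` are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch: one DDPM-style reverse step with an additive
# compensation on the intermediate x0 estimate. All network names and the
# weighting `lam` are assumptions made for illustration only.
import torch

@torch.no_grad()
def reverse_step(x_t, t, eps_model, comp_model, alphas_cumprod, betas, lam=0.1):
    """Single reverse (denoising) step; eps_model and comp_model are callables."""
    a_bar = alphas_cumprod[t]
    a_bar_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t

    # Standard noise prediction and the resulting x0 estimate.
    eps = eps_model(x_t, t)
    x0_hat = (x_t - torch.sqrt(1.0 - a_bar) * eps) / torch.sqrt(a_bar)

    # Assumed compensation: a second U-Net nudges the (initially inaccurate)
    # x0 estimate toward the target data manifold.
    x0_hat = (x0_hat + lam * comp_model(x0_hat, t)).clamp(-1.0, 1.0)

    # DDPM posterior mean and variance given the compensated x0 estimate.
    mean = (
        torch.sqrt(a_bar_prev) * beta_t / (1.0 - a_bar) * x0_hat
        + torch.sqrt(alpha_t) * (1.0 - a_bar_prev) / (1.0 - a_bar) * x_t
    )
    var = beta_t * (1.0 - a_bar_prev) / (1.0 - a_bar)
    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + torch.sqrt(var) * noise
```

In this sketch the compensation network is applied to the estimated clean image rather than to the noise prediction; that placement is one plausible design choice, consistent with the abstract's claim that the method counteracts error accumulation caused by an inaccurate initial reconstruction.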