Fast constrained sampling in pre-trained diffusion models (2410.18804v2)
Abstract: Large denoising diffusion models, such as Stable Diffusion, have been trained on billions of image-caption pairs to perform text-conditioned image generation. As a byproduct of this training, these models have acquired general knowledge about image statistics, which can be useful for other inference tasks. However, when confronted with sampling an image under new constraints, e.g., generating the missing parts of an image, using large pre-trained text-to-image diffusion models is inefficient and often unreliable. Previous approaches either rely on backpropagation, making them significantly slower and more memory-demanding than text-to-image inference, or only enforce the constraint locally, failing to capture critical long-range correlations. In this work, we propose an algorithm that enables fast and high-quality generation under arbitrary constraints. We observe that, during inference, we can interchange between gradient updates computed on the noisy image and updates computed on the final, clean image. This allows us to employ a numerical approximation to expensive gradient computations, yielding significant speed-ups in inference. Our approach produces results that rival or surpass the state-of-the-art training-free inference approaches while requiring a fraction of the time. We demonstrate the effectiveness of our algorithm under both linear and non-linear constraints. An implementation is provided at https://github.com/cvlab-stonybrook/fast-constrained-sampling.
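The abstract only sketches the mechanism, so the following is a minimal illustration of the general idea it describes: at each sampling step, estimate the clean image from the noisy one (Tweedie's formula), and take the constraint gradient with respect to that clean estimate rather than backpropagating through the denoiser. This is a hedged sketch, not the authors' exact algorithm (see their repository for that); `denoiser`, `A`, `y`, `abar_t`, `abar_s`, and `step_size` are all placeholder names introduced here.

```python
import math
import torch

def ddim_constrained_step(x_t, abar_t, abar_s, denoiser, A, y, step_size=1.0):
    """One DDIM-style step with a cheap constraint update on the clean estimate.

    x_t: current noisy sample; abar_t / abar_s: cumulative alpha-bar values at
    the current and next timestep; denoiser: eps-prediction network; A: the
    constraint operator (e.g., an inpainting mask); y: the observation.
    """
    # Predict the noise without tracking gradients through the network.
    with torch.no_grad():
        eps = denoiser(x_t)
    # Tweedie / DDIM estimate of the clean image from the noisy sample.
    x0_hat = (x_t - math.sqrt(1 - abar_t) * eps) / math.sqrt(abar_t)
    # Gradient of ||A(x0_hat) - y||^2 taken w.r.t. the clean estimate only,
    # skipping the expensive Jacobian of the denoiser. Approximating that
    # Jacobian (rather than backpropagating through the network) is the source
    # of the speed-up the abstract describes.
    with torch.enable_grad():
        x0_req = x0_hat.detach().requires_grad_(True)
        loss = ((A(x0_req) - y) ** 2).sum()
        grad = torch.autograd.grad(loss, x0_req)[0]
    x0_hat = x0_hat - step_size * grad
    # Deterministic DDIM transition to the next, less noisy timestep.
    return math.sqrt(abar_s) * x0_hat + math.sqrt(1 - abar_s) * eps
```

The key design choice this sketch highlights is where the gradient is taken: differentiating through the denoiser (as in backpropagation-based posterior sampling) couples the constraint to the full network Jacobian, while updating the clean estimate directly costs only one extra elementwise pass, at the price of an approximation to that Jacobian.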