Image Inpainting via Tractable Steering of Diffusion Models (2401.03349v2)
Abstract: Diffusion models are the current state of the art for generating photorealistic images. Controlling the sampling process for constrained image generation tasks such as inpainting, however, remains challenging since exact conditioning on such constraints is intractable. While existing methods use various techniques to approximate the constrained posterior, this paper proposes to exploit the ability of Tractable Probabilistic Models (TPMs) to exactly and efficiently compute the constrained posterior, and to leverage this signal to steer the denoising process of diffusion models. Specifically, this paper adopts a class of expressive TPMs termed Probabilistic Circuits (PCs). Building upon prior advances, we further scale up PCs and make them capable of guiding the image generation process of diffusion models. Empirical results suggest that our approach can consistently improve the overall quality and semantic coherence of inpainted images across three natural image datasets (i.e., CelebA-HQ, ImageNet, and LSUN) with only $\sim\!10\%$ additional computational overhead brought by the TPM. Further, with the help of an image encoder and decoder, our method can readily accept semantic constraints on specific regions of the image, which opens up the potential for more controlled image generation tasks. In addition to proposing a new framework for constrained image generation, this paper highlights the benefit of more tractable models and motivates the development of expressive TPMs.
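As a rough illustration of what "steering" the denoising process with a TPM signal could look like, the toy sketch below combines two categorical distributions over the value of a single masked pixel: one standing in for the diffusion model's prediction and one standing in for the TPM's exact constrained posterior. The abstract does not state the combination rule, so the weighted geometric mean used here, the function name `steer`, and the weight `alpha` are all assumptions made for illustration, not the paper's confirmed procedure.

```python
import math

def steer(p_dm, p_tpm, alpha=0.5):
    """Hypothetical steering rule: weighted geometric mean of two
    categorical distributions, renormalized to sum to 1.

    p_dm  : stand-in for the diffusion model's distribution over a pixel value
    p_tpm : stand-in for the TPM's constrained posterior over the same value
    alpha : weight on the diffusion model (assumed hyperparameter)
    """
    if len(p_dm) != len(p_tpm):
        raise ValueError("distributions must have the same support size")
    eps = 1e-12  # avoids log(0) for zero-probability entries
    log_mix = [
        alpha * math.log(a + eps) + (1.0 - alpha) * math.log(b + eps)
        for a, b in zip(p_dm, p_tpm)
    ]
    # Subtract the max before exponentiating for numerical stability.
    m = max(log_mix)
    weights = [math.exp(v - m) for v in log_mix]
    z = sum(weights)
    return [w / z for w in weights]

# Toy example: one masked pixel with three possible values.
p_dm = [0.6, 0.3, 0.1]   # diffusion model favors value 0
p_tpm = [0.1, 0.3, 0.6]  # constraint-aware posterior favors value 2
mixed = steer(p_dm, p_tpm, alpha=0.5)
```

With equal weights the two models' disagreements cancel, so the middle value ends up most probable. Setting `alpha=1.0` recovers the unsteered diffusion prediction, which makes the weight a simple knob for how strongly the tractable model influences each denoising step.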