Consistency Model is an Effective Posterior Sample Approximation for Diffusion Inverse Solvers (2403.12063v2)
Abstract: Diffusion Inverse Solvers (DIS) are designed to sample from the conditional distribution $p_{\theta}(X_0|y)$, given a predefined diffusion model $p_{\theta}(X_0)$, an operator $f(\cdot)$, and a measurement $y=f(x'_0)$ derived from an unknown image $x'_0$. Existing DIS estimate the conditional score function by evaluating $f(\cdot)$ on an approximated posterior sample drawn from $p_{\theta}(X_0|X_t)$. However, most prior approximations rely on the posterior mean, which may not lie in the support of the image distribution and may therefore diverge from the appearance of genuine images. Such out-of-support samples can significantly degrade the performance of the operator $f(\cdot)$, particularly when it is a neural network. In this paper, we introduce a novel approach to posterior approximation that is guaranteed to generate valid samples within the support of the image distribution, and that also enhances compatibility with neural-network-based operators $f(\cdot)$. We first demonstrate that the solution of the Probability Flow Ordinary Differential Equation (PF-ODE) with initial value $x_t$ yields an effective sample from the posterior $p_{\theta}(X_0|X_t=x_t)$. Based on this observation, we adopt the Consistency Model (CM), which is distilled from the PF-ODE, for posterior sampling. Furthermore, we design a novel family of DIS using only CM. Through extensive experiments, we show that our proposed method for posterior sample approximation substantially enhances the effectiveness of DIS for neural network operators $f(\cdot)$ (e.g., in semantic segmentation). Additionally, our experiments demonstrate the effectiveness of the new CM-based inversion techniques. The source code is provided in the supplementary material.
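The core idea in the abstract — replacing the posterior-mean approximation with a consistency-model posterior sample inside a DIS guidance step — can be sketched in a toy form. This is a minimal illustration, not the paper's implementation: `consistency_model`, `f`, and `dis_guidance_step` are stand-ins (a shrinkage map, a fixed linear operator, and a DPS-style gradient update on a 2-D vector) chosen only to make the control flow concrete.

```python
import numpy as np

# Measurement operator f(x) = A @ x; a fixed toy linear "blur" matrix.
A = np.array([[1.0, 0.5],
              [0.0, 1.0]])

def consistency_model(x_t, t):
    """Stand-in for a distilled CM: maps a noisy iterate x_t directly to a
    clean sample x0_hat (here a simple shrinkage toward the origin)."""
    return x_t / (1.0 + t)

def f(x):
    """Toy measurement operator y = f(x)."""
    return A @ x

def dis_guidance_step(x_t, t, y, step_size=0.1):
    """One DPS-style guidance update: nudge x_t so that the CM's posterior
    sample x0_hat becomes more consistent with the measurement y."""
    x0_hat = consistency_model(x_t, t)        # posterior sample, not posterior mean
    residual = f(x0_hat) - y                  # data-fidelity error
    # Gradient of 0.5 * ||f(x0_hat) - y||^2 w.r.t. x_t, via the chain rule
    # through the linear operator A and the CM's Jacobian (1/(1+t)) * I.
    grad = (A.T @ residual) / (1.0 + t)
    return x_t - step_size * grad

x_true = np.array([1.0, -1.0])                # unknown image x'_0
y = f(x_true)                                 # observed measurement
x_t = np.array([2.0, 0.5])                    # pretend noisy iterate at t = 1
x_next = dis_guidance_step(x_t, 1.0, y)

# The measurement residual of the CM's sample shrinks after one step.
before = np.linalg.norm(f(consistency_model(x_t, 1.0)) - y)
after = np.linalg.norm(f(consistency_model(x_next, 1.0)) - y)
print(after < before)
```

In the actual method the CM is a trained network whose output lies on the image manifold, which is what keeps the operator $f(\cdot)$ (e.g., a segmentation network) operating in-distribution; the toy shrinkage map above only mimics the one-step $x_t \mapsto \hat{x}_0$ interface.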