Consistency Model is an Effective Posterior Sample Approximation for Diffusion Inverse Solvers (2403.12063v2)

Published 9 Feb 2024 in cs.CV and cs.LG

Abstract: Diffusion Inverse Solvers (DIS) are designed to sample from the conditional distribution $p_{\theta}(X_0|y)$, given a predefined diffusion model $p_{\theta}(X_0)$, an operator $f(\cdot)$, and a measurement $y=f(x'_0)$ derived from an unknown image $x'_0$. Existing DIS estimate the conditional score function by evaluating $f(\cdot)$ with an approximated posterior sample drawn from $p_{\theta}(X_0|X_t)$. However, most prior approximations rely on the posterior mean, which may not lie in the support of the image distribution and can therefore diverge from the appearance of genuine images. Such out-of-support samples may significantly degrade the performance of the operator $f(\cdot)$, particularly when it is a neural network. In this paper, we introduce a novel approach to posterior approximation that is guaranteed to generate valid samples within the support of the image distribution and enhances compatibility with neural-network-based operators $f(\cdot)$. We first demonstrate that the solution of the Probability Flow Ordinary Differential Equation (PF-ODE) with initial value $x_t$ yields an effective posterior sample from $p_{\theta}(X_0|X_t=x_t)$. Based on this observation, we adopt the Consistency Model (CM), which is distilled from the PF-ODE, for posterior sampling. Furthermore, we design a novel family of DIS using only the CM. Through extensive experiments, we show that our proposed method for posterior sample approximation substantially enhances the effectiveness of DIS with neural network operators $f(\cdot)$ (e.g., in semantic segmentation). Additionally, our experiments demonstrate the effectiveness of the new CM-based inversion techniques. The source code is provided in the supplementary material.
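The abstract's central idea can be read as a one-line change to a DPS-style guidance step: instead of plugging the Tweedie posterior mean into $f(\cdot)$, evaluate $f(\cdot)$ on a one-step Consistency Model sample, which maps $x_t$ back onto the image manifold along the PF-ODE trajectory. Below is a minimal PyTorch sketch of such a guided reverse step, not the paper's actual implementation; `score_model`, `consistency_model`, and `operator` are hypothetical callables standing in for the unconditional score network, the distilled CM, and the measurement operator $f(\cdot)$, and the update rule is schematic.

```python
import torch

def guided_step_with_cm(x_t, t, y, score_model, consistency_model, operator,
                        sigma_t, step_size, guidance_scale):
    """One DPS-style guided reverse-diffusion step, with the Tweedie
    posterior mean replaced by a Consistency Model sample (a sketch
    under the assumptions stated above)."""
    x_t = x_t.detach().requires_grad_(True)

    # Posterior sample x0 ~ p(X0 | X_t = x_t): a single CM evaluation
    # returns a point on the image manifold, unlike the posterior mean.
    x0_hat = consistency_model(x_t, t)

    # Data-fidelity loss through the (possibly neural) operator f(.)
    loss = (y - operator(x0_hat)).pow(2).sum()

    # Gradient w.r.t. x_t, backpropagating through the CM
    grad = torch.autograd.grad(loss, x_t)[0]

    with torch.no_grad():
        # Unconditional score step plus measurement guidance (schematic)
        score = score_model(x_t, t)
        x_next = x_t + step_size * (sigma_t ** 2) * score - guidance_scale * grad
    return x_next.detach()
```

Because `x0_hat` lies in the support of the image distribution, a downstream neural operator (e.g., a segmentation network) sees an input resembling a genuine image, which is the compatibility benefit the abstract claims over posterior-mean approximations.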
