Bidirectional Consistency Models (2403.18035v3)
Abstract: Diffusion models (DMs) are capable of generating remarkably high-quality samples by iteratively denoising a random vector, a process that corresponds to moving along the probability flow ordinary differential equation (PF ODE). Interestingly, DMs can also invert an input image to noise by moving backward along the PF ODE, a key operation for downstream tasks such as interpolation and image editing. However, the iterative nature of this process restricts its speed, hindering its broader application. Recently, Consistency Models (CMs) have emerged to address this challenge by approximating the integral of the PF ODE, largely reducing the number of iterations. Yet, the absence of an explicit ODE solver complicates the inversion process. To resolve this, we introduce the Bidirectional Consistency Model (BCM), which learns a single neural network that enables both forward and backward traversal along the PF ODE, efficiently unifying generation and inversion tasks within one framework. We can train BCM from scratch or tune it from a pretrained consistency model, which reduces the training cost and increases scalability. We demonstrate that BCM enables one-step generation and inversion while also allowing the use of additional steps to enhance generation quality or reduce reconstruction error. We further showcase BCM's capability in downstream tasks, such as interpolation, inpainting, and blind restoration of compressed images. Notably, when the number of function evaluations (NFE) is constrained, BCM surpasses domain-specific restoration methods, such as I$^2$SB and Palette, in a fully zero-shot manner, offering an efficient alternative for inverse problems. Our code and weights are available at https://github.com/Mosasaur5526/BCM-iCT-torch.
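The core idea — one network `f(x, t, u)` that maps a point on the PF ODE trajectory at time `t` to the point at time `u`, in either direction — can be illustrated with a minimal toy sketch. This is not the paper's actual model: here the "network" is the exact flow map of a trivial linear ODE (`dx/dt = x/t`), so the round trip is lossless by construction; `bcm`, `T`, and `EPS` are illustrative names, with `T`/`EPS` chosen to mimic EDM-style max/min noise levels.

```python
def bcm(x, t, u):
    """Toy stand-in for BCM's bidirectional consistency function f(x_t, t, u).

    For the linear toy ODE dx/dt = x/t, the exact flow map from time t
    to time u is x * (u / t), so this 'network' traverses the trajectory
    exactly in both directions.
    """
    return x * (u / t)

T, EPS = 80.0, 0.002   # assumed max/min times, EDM-style

x_T = 1.37             # a "noise" sample at t = T
x0 = bcm(x_T, T, EPS)  # one-step generation: noise -> data (T -> EPS)
z = bcm(x0, EPS, T)    # one-step inversion:  data -> noise (EPS -> T)

assert abs(z - x_T) < 1e-9  # round trip recovers the input
```

A real BCM would only approximate this property, which is why the paper also allows extra steps to reduce reconstruction error; the sketch just makes the forward/backward interface concrete.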