DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing (2312.07409v1)
Abstract: Diffusion models have achieved remarkable image generation quality surpassing previous generative models. However, a notable limitation of diffusion models, in comparison to GANs, is their difficulty in smoothly interpolating between two image samples, due to their highly unstructured latent space. Such a smooth interpolation is intriguing as it naturally serves as a solution for the image morphing task with many applications. In this work, we present DiffMorpher, the first approach enabling smooth and natural image interpolation using diffusion models. Our key idea is to capture the semantics of the two images by fitting two LoRAs to them respectively, and interpolate between both the LoRA parameters and the latent noises to ensure a smooth semantic transition, where correspondence automatically emerges without the need for annotation. In addition, we propose an attention interpolation and injection technique and a new sampling schedule to further enhance the smoothness between consecutive images. Extensive experiments demonstrate that DiffMorpher achieves starkly better image morphing effects than previous methods across a variety of object categories, bridging a critical functional gap that distinguished diffusion models from GANs.
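The core idea in the abstract, blending two per-image LoRAs together with the two latent noises, can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the abstract does not say which interpolation rule is used, so the sketch assumes a linear blend for LoRA parameters and spherical linear interpolation (slerp) for the noises. The names `lerp`, `slerp`, `interpolate_lora`, and the parameter key `attn.lora_down` are hypothetical. The attention interpolation and the new sampling schedule are not covered.

```python
import numpy as np


def lerp(a, b, alpha):
    """Linear blend: returns a at alpha=0 and b at alpha=1."""
    return (1.0 - alpha) * a + alpha * b


def slerp(z0, z1, alpha, eps=1e-7):
    """Spherical linear interpolation between two noise arrays.

    Assumption: slerp is used here because it keeps the norm of
    Gaussian-like noise roughly constant along the path.
    """
    v0, v1 = z0.ravel(), z1.ravel()
    cos_theta = np.dot(v0, v1) / (np.linalg.norm(v0) * np.linalg.norm(v1))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    sin_theta = np.sin(theta)
    if sin_theta < eps:
        # Nearly parallel (or antiparallel) inputs: fall back to a linear blend.
        return lerp(z0, z1, alpha)
    w0 = np.sin((1.0 - alpha) * theta) / sin_theta
    w1 = np.sin(alpha * theta) / sin_theta
    return w0 * z0 + w1 * z1


def interpolate_lora(lora_a, lora_b, alpha):
    """Blend two LoRA parameter dicts (hypothetical layout) key by key."""
    return {name: lerp(lora_a[name], lora_b[name], alpha) for name in lora_a}


# Toy stand-ins for two fitted LoRAs and two inverted latent noises.
rng = np.random.default_rng(0)
lora_a = {"attn.lora_down": rng.normal(size=(4, 16))}
lora_b = {"attn.lora_down": rng.normal(size=(4, 16))}
z_a = rng.normal(size=(4, 8, 8))
z_b = rng.normal(size=(4, 8, 8))

# One (LoRA, noise) pair per intermediate frame of the morph.
frames = [
    (interpolate_lora(lora_a, lora_b, alpha), slerp(z_a, z_b, alpha))
    for alpha in np.linspace(0.0, 1.0, 5)
]
```

In a real pipeline each pair in `frames` would parameterize one denoising run, so the first frame reproduces the first image's LoRA and noise and the last frame the second image's.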
Authors: Kaiwen Zhang, Yifan Zhou, Xudong Xu, Xingang Pan, Bo Dai