Diffusion Model Compression for Image-to-Image Translation (2401.17547v2)

Published 31 Jan 2024 in cs.CV

Abstract: As recent advances in large-scale Text-to-Image (T2I) diffusion models have yielded remarkably high-quality image generation, diverse downstream Image-to-Image (I2I) applications have emerged. Despite the impressive results achieved by these I2I models, their practical utility is hampered by their large model size and the computational burden of the iterative denoising process. In this paper, we propose a novel compression method tailored for diffusion-based I2I models. Based on the observations that the image conditions of I2I models already provide rich information on image structures, and that the time steps with a larger impact tend to be biased, we develop surprisingly simple yet effective approaches for reducing the model size and latency. We validate the effectiveness of our method on three representative I2I tasks: InstructPix2Pix for image editing, StableSR for image restoration, and ControlNet for image-conditional image generation. Our approach achieves satisfactory output quality with 39.2%, 56.4%, and 39.2% reductions in model footprint, as well as 81.4%, 68.7%, and 31.1% reductions in latency, for InstructPix2Pix, StableSR, and ControlNet, respectively.
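The abstract's second observation — that high-impact denoising time steps tend to be biased toward one region of the schedule — suggests sampling a non-uniform subset of steps rather than the full uniform schedule. The sketch below is a hypothetical illustration of that idea, not the authors' method: `biased_timesteps`, its power-law spacing, and the `bias` parameter are all assumptions introduced here for illustration.

```python
def biased_timesteps(total_steps: int, keep: int, bias: float = 2.0) -> list[int]:
    """Select `keep` time steps from [0, total_steps), spaced by a power law.

    With bias > 1 the selected steps cluster toward the high-index end of
    the schedule; bias = 1 recovers uniform spacing.
    """
    steps = [
        round((i / (keep - 1)) ** (1.0 / bias) * (total_steps - 1))
        for i in range(keep)
    ]
    # Deduplicate while preserving order (rounding can produce collisions).
    seen, out = set(), []
    for s in steps:
        if s not in seen:
            seen.add(s)
            out.append(s)
    return out

# The gaps shrink toward the high end, i.e. steps are denser where (by
# assumption) they matter more.
print(biased_timesteps(1000, 10))
```

A schedule like this would replace the uniform step list fed to a sampler; the actual selection criterion used in the paper (how "impact" is measured per step) is not spelled out in the abstract.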
