
LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights (2404.11936v1)

Published 18 Apr 2024 in cs.LG, cs.AI, and cs.CV

Abstract: Latent Diffusion Models (LDMs) have emerged as powerful generative models, known for delivering remarkable results under constrained computational resources. However, deploying LDMs on resource-limited devices remains complex, presenting challenges such as memory consumption and inference speed. To address this, we introduce LD-Pruner, a novel performance-preserving structured pruning method for compressing LDMs. Traditional pruning methods for deep neural networks are not tailored to the unique characteristics of LDMs, such as the high computational cost of training and the absence of a fast, straightforward, and task-agnostic way to evaluate model performance. Our method tackles these challenges by leveraging the latent space during pruning, enabling us to quantify the impact of pruning on model performance independently of the task at hand. This targeted pruning of components with minimal impact on the output allows for faster convergence during training, since the model has less information to re-learn, thereby addressing the high training cost. Consequently, our approach yields a compressed model with improved inference speed and reduced parameter count, while incurring minimal performance degradation. We demonstrate the effectiveness of our approach on three tasks: text-to-image (T2I) generation, unconditional image generation (UIG), and unconditional audio generation (UAG). Notably, we reduce the inference time of Stable Diffusion (SD) by 34.9% while simultaneously improving its FID by 5.2% on the MS-COCO T2I benchmark. This work paves the way for more efficient pruning methods for LDMs, enhancing their applicability.

