LAPTOP-Diff: Layer Pruning and Normalized Distillation for Compressing Diffusion Models (2404.11098v4)
Abstract: In the era of AIGC, demand has emerged for low-budget and even on-device applications of diffusion models. Several approaches have been proposed for compressing Stable Diffusion models (SDMs), most of which rely on handcrafted layer removal to obtain smaller U-Nets, combined with knowledge distillation to recover network performance. However, handcrafted layer removal is inefficient and lacks scalability and generalization, and the feature distillation used in the retraining phase suffers from an imbalance issue: a few numerically large feature loss terms dominate the others throughout retraining. To this end, we propose LAPTOP-Diff, layer pruning and normalized distillation for compressing diffusion models. We 1) introduce a layer pruning method that compresses the SDM's U-Net automatically, with an effective one-shot pruning criterion whose one-shot performance is guaranteed by its good additivity property, surpassing other layer pruning and handcrafted layer removal methods, and 2) propose normalized feature distillation for retraining, which alleviates the imbalance issue. Using the proposed LAPTOP-Diff, we compressed the U-Nets of SDXL and SDM-v1.5 to the most advanced performance, achieving a minimal 4.0% decline in PickScore at a pruning ratio of 50%, whereas the comparative methods' minimal PickScore decline is 8.2%.
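The abstract names two mechanisms whose mechanics are easy to miss on a first read: an additivity-based one-shot pruning criterion (the output distortion from removing a set of layers is approximately the sum of the distortions from removing each layer alone, so every layer can be scored once and a pruning set selected directly) and a normalized feature distillation loss (each per-layer feature loss is rescaled so that no single numerically large term dominates). The sketch below illustrates both ideas under stated assumptions; the `skip_layers` hook, the greedy budgeted selection, and normalization by each term's detached magnitude are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F

def layer_scores(model, layer_ids, calib_batch):
    """Score each candidate layer by the output distortion its removal
    causes on a small calibration batch. Additivity means the distortion
    of removing several layers is roughly the sum of their individual
    scores, so one forward pass per layer suffices (one-shot).

    Assumes a hypothetical `skip_layers` argument that runs the U-Net
    with the listed layers bypassed via their skip connections.
    """
    with torch.no_grad():
        ref = model(calib_batch)
        return {
            lid: F.mse_loss(model(calib_batch, skip_layers={lid}), ref).item()
            for lid in layer_ids
        }

def select_layers_to_prune(scores, params_per_layer, param_budget):
    """Greedy budgeted selection under the additivity assumption: drop
    the layers with the least distortion per parameter until the
    target number of parameters has been freed."""
    pruned, freed = [], 0
    for lid in sorted(scores, key=lambda l: scores[l] / params_per_layer[l]):
        if freed >= param_budget:
            break
        pruned.append(lid)
        freed += params_per_layer[lid]
    return pruned

def normalized_distill_loss(student_feats, teacher_feats, eps=1e-8):
    """One plausible reading of 'normalized feature distillation':
    divide each per-layer feature loss by its own detached magnitude,
    so every term contributes a comparably scaled gradient and no
    numerically large term dominates retraining."""
    total = 0.0
    for fs, ft in zip(student_feats, teacher_feats):
        term = F.mse_loss(fs, ft)
        total = total + term / (term.detach() + eps)
    return total
```

Normalizing by the detached loss value leaves each term's value near one while scaling its gradient by the inverse of its magnitude, a common trick for balancing multi-term objectives; the paper's exact normalization may differ from this sketch.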