IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models (2410.21759v3)

Published 29 Oct 2024 in cs.CV

Abstract: Fine-tuning pre-trained diffusion models under limited budgets has achieved great success. In particular, recent advances that directly fine-tune quantized weights using Low-rank Adaptation (LoRA) further reduce training costs. Despite this progress, we point out that existing adaptation recipes are not inference-efficient. Specifically, additional post-training quantization (PTQ) of the tuned weights is needed during deployment, which results in a noticeable performance drop when the bit-width is low. Based on this observation, we introduce IntLoRA, which adapts quantized diffusion models with integer-type low-rank parameters, incorporating inference efficiency into the tuning stage. Specifically, IntLoRA allows the pre-trained weights to remain quantized during training, facilitating fine-tuning on consumer-level GPUs. During inference, IntLoRA weights can be seamlessly merged into the pre-trained weights to directly obtain quantized downstream weights without PTQ. Extensive experiments show that IntLoRA achieves significant speedups in both training and inference without losing performance.
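
To make the deployment advantage concrete, below is a minimal, hypothetical PyTorch sketch, not the paper's actual formulation. It assumes per-tensor symmetric int8 quantization and an illustrative `IntLoRALinear` class; the paper itself trains integer-type low-rank parameters, which this sketch simplifies. The point it illustrates: if the low-rank update is rounded onto the same integer grid as the frozen quantized base weight, the merge stays in integer arithmetic, so the downstream weights come out already quantized and no extra PTQ pass is needed.

```python
# Illustrative sketch only (simplified relative to IntLoRA): shows why expressing
# the low-rank update on the base weight's integer grid removes the need for a
# second PTQ pass at deployment. Names and the int8 scheme are assumptions.
import torch


def quantize_sym_int8(w: torch.Tensor):
    """Per-tensor symmetric int8 quantization: w ~= w_q * scale."""
    scale = (w.abs().max() / 127.0).item()
    w_q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return w_q, scale


class IntLoRALinear(torch.nn.Module):
    """Hypothetical layer: frozen int8 base weight plus a low-rank update that is
    later rounded onto the same grid, so merging never leaves the integer domain."""

    def __init__(self, weight: torch.Tensor, rank: int = 4):
        super().__init__()
        w_q, scale = quantize_sym_int8(weight)
        self.register_buffer("w_q", w_q)  # frozen quantized base weight
        self.scale = scale
        out_f, in_f = weight.shape
        # Trainable low-rank factors (float here for simplicity; the paper
        # instead keeps them integer-typed throughout training).
        self.A = torch.nn.Parameter(torch.zeros(out_f, rank))
        self.B = torch.nn.Parameter(torch.randn(rank, in_f) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Training-time path: dequantize on the fly and add the low-rank update.
        w = self.w_q.float() * self.scale + self.A @ self.B
        return x @ w.t()

    def merge_to_int8(self):
        # Deployment: round the update onto the base grid and merge in integers.
        # The result is already a quantized downstream weight -- no extra PTQ.
        delta_q = torch.round((self.A @ self.B) / self.scale).to(torch.int32)
        merged = torch.clamp(self.w_q.to(torch.int32) + delta_q, -128, 127)
        return merged.to(torch.int8), self.scale


# Usage: fine-tune A and B, then export a ready-to-deploy int8 weight.
layer = IntLoRALinear(torch.randn(64, 32), rank=4)
w_int8, s = layer.merge_to_int8()
```

In contrast, a standard LoRA on dequantized weights produces a floating-point merged weight at deployment time, which must be re-quantized (PTQ) and, at low bit-widths, is where the abstract's reported performance drop appears.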
