IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models (2410.21759v3)
Abstract: Fine-tuning pre-trained diffusion models under limited budgets has achieved great success. In particular, recent advances that directly fine-tune quantized weights using Low-rank Adaptation (LoRA) further reduce training costs. Despite this progress, we point out that existing adaptation recipes are not inference-efficient: additional post-training quantization (PTQ) of the tuned weights is needed at deployment, which causes a noticeable performance drop when the bit-width is low. Based on this observation, we introduce IntLoRA, which adapts quantized diffusion models with integer-type low-rank parameters, bringing inference efficiency into the tuning stage. Specifically, IntLoRA keeps the pre-trained weights quantized during training, enabling fine-tuning on consumer-level GPUs. During inference, the IntLoRA weights can be seamlessly merged into the pre-trained weights to directly obtain quantized downstream weights without PTQ. Extensive experiments show that IntLoRA achieves significant speedups in both training and inference without losing performance.
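To make the idea concrete, below is a minimal PyTorch sketch of the general recipe the abstract describes: keep the pre-trained weight as a frozen integer tensor during fine-tuning, learn a low-rank update, and fold that update back onto the same integer grid at merge time so no separate PTQ pass is required. The class name `IntLoRALinear`, the symmetric per-tensor quantizer, and the rounding of the update onto the base scale are illustrative assumptions, not the authors' exact algorithm.

```python
# Conceptual sketch only -- not the authors' exact IntLoRA algorithm.
# Shows: (1) a frozen INT8 base weight, (2) a trainable low-rank update,
# (3) merging the update back onto the integer grid so the deployed weight
# is quantized without a second post-training quantization (PTQ) pass.
import torch
import torch.nn as nn
import torch.nn.functional as F


def quantize(w: torch.Tensor, n_bits: int = 8):
    """Symmetric per-tensor uniform quantization (illustrative)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax
    w_int = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return w_int.to(torch.int8), scale


class IntLoRALinear(nn.Module):
    """Linear layer with a frozen integer base weight and a low-rank adapter."""

    def __init__(self, w_fp: torch.Tensor, rank: int = 4, n_bits: int = 8):
        super().__init__()
        w_int, scale = quantize(w_fp, n_bits)
        self.n_bits = n_bits
        # Frozen quantized base: stored as INT8 plus a scale, dequantized on the fly.
        self.register_buffer("w_int", w_int)
        self.register_buffer("scale", scale)
        out_f, in_f = w_fp.shape
        # Standard LoRA init: B = 0 so the adapter starts as a zero update.
        self.lora_A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_f, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = F.linear(x, self.w_int.float() * self.scale)  # dequantize for the matmul
        return base + F.linear(F.linear(x, self.lora_A), self.lora_B)

    @torch.no_grad()
    def merge(self):
        """Fold the low-rank update into the base weight on the same integer grid."""
        qmax = 2 ** (self.n_bits - 1) - 1
        delta_int = torch.round((self.lora_B @ self.lora_A) / self.scale)
        merged = torch.clamp(self.w_int.float() + delta_int, -qmax - 1, qmax)
        return merged.to(torch.int8), self.scale  # quantized downstream weight, no PTQ
```

A deployed layer would then use `merged` together with `scale` in an integer kernel directly; the property mirrored from the abstract is that merging never leaves the quantized representation, so no post-merge calibration or PTQ step is needed.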