Block-wise LoRA: Revisiting Fine-grained LoRA for Effective Personalization and Stylization in Text-to-Image Generation (2403.07500v1)
Abstract: The objective of personalization and stylization in text-to-image (T2I) generation is to instruct a pre-trained diffusion model to analyze new concepts introduced by users and incorporate them into expected styles. Recently, parameter-efficient fine-tuning (PEFT) approaches have been widely adopted for this task and have greatly propelled the development of the field. Despite their popularity, existing efficient fine-tuning methods still struggle to achieve effective personalization and stylization in T2I generation. To address this issue, we propose block-wise Low-Rank Adaptation (LoRA), which performs fine-grained fine-tuning on different blocks of Stable Diffusion (SD) and can generate images that are faithful to the input prompt and the target identity while also exhibiting the desired style. Extensive experiments demonstrate the effectiveness of the proposed method. A minimal code sketch of the block-wise idea follows.
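The sketch below is a minimal PyTorch illustration of block-wise LoRA, not the paper's implementation: `LoRALinear`, `inject_blockwise_lora`, the toy U-Net stand-in, and the per-block rank choices are all hypothetical. It shows the core mechanism the abstract describes: freezing the pre-trained weights and attaching low-rank updates whose capacity differs per U-Net block.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, rank: int, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # keep the pre-trained weight frozen
        # Standard LoRA init: A is small random, B is zero, so training starts at the base model.
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

def inject_blockwise_lora(unet: nn.Module, block_ranks: dict):
    """Attach LoRA adapters to linear layers, picking the rank from the block the
    layer belongs to; blocks mapped to rank 0 are left unadapted."""
    for name, module in list(unet.named_modules()):
        for child_name, child in list(module.named_children()):
            if not isinstance(child, nn.Linear):
                continue
            full_name = f"{name}.{child_name}" if name else child_name
            for block_key, rank in block_ranks.items():
                if full_name.startswith(block_key) and rank > 0:
                    setattr(module, child_name, LoRALinear(child, rank))
                    break

# Toy stand-in for a diffusion U-Net: named blocks holding linear (attention-like) layers.
unet = nn.ModuleDict({
    "down_blocks": nn.Sequential(nn.Linear(64, 64), nn.Linear(64, 64)),
    "mid_block":   nn.Sequential(nn.Linear(64, 64)),
    "up_blocks":   nn.Sequential(nn.Linear(64, 64), nn.Linear(64, 64)),
})
for p in unet.parameters():
    p.requires_grad_(False)  # freeze everything before injecting adapters

# Hypothetical per-block configuration: larger rank in the up-blocks,
# smaller in the down-blocks, and no adaptation in the mid block.
block_ranks = {"down_blocks": 4, "mid_block": 0, "up_blocks": 16}
inject_blockwise_lora(unet, block_ranks)

trainable = [n for n, p in unet.named_parameters() if p.requires_grad]
print(trainable)  # only the lora_A / lora_B matrices are optimized
```

In this sketch, fine-grained control comes entirely from `block_ranks`: changing which blocks receive adapters, and at what rank, trades off identity fidelity against style transfer without touching the frozen SD weights.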