TriLoRA: Integrating SVD for Advanced Style Personalization in Text-to-Image Generation (2405.11236v2)

Published 18 May 2024 in cs.CV

Abstract: As deep learning technology continues to advance, image generation models, especially models like Stable Diffusion, are finding increasingly widespread application in visual arts creation. However, these models often face challenges such as overfitting, lack of stability in generated results, and difficulties in accurately capturing the features desired by creators during the fine-tuning process. In response to these challenges, we propose an innovative method that integrates Singular Value Decomposition (SVD) into the Low-Rank Adaptation (LoRA) parameter update strategy, aimed at enhancing the fine-tuning efficiency and output quality of image generation models. By incorporating SVD within the LoRA framework, our method not only effectively reduces the risk of overfitting but also enhances the stability of model outputs and captures subtle, creator-desired feature adjustments more accurately. We evaluated our method on multiple datasets, and the results show that, compared to traditional fine-tuning methods, our approach significantly improves the model's generalization ability and creative flexibility while maintaining generation quality. Moreover, the method retains LoRA's strong performance under resource-constrained conditions, allowing for significant improvements in image generation quality without sacrificing LoRA's original efficiency and resource advantages.
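
To make the core idea concrete, here is a minimal PyTorch sketch of a LoRA-style linear adapter whose low-rank update is parameterized in an SVD-like form, ΔW = U · diag(s) · Vᵀ, applied on top of a frozen base weight. This is an illustrative assumption of the general approach, not the paper's exact TriLoRA formulation; the class name `SVDLoRALinear`, the rank and alpha hyperparameters, and the initialization scheme are all hypothetical choices.

```python
# Minimal sketch (assumption, not the paper's exact TriLoRA method):
# a LoRA-style adapter whose low-rank update dW = U @ diag(s) @ V^T
# is trained while the pretrained base weight stays frozen.
import torch
import torch.nn as nn


class SVDLoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        # Frozen pretrained weight (stands in for e.g. a Stable Diffusion projection layer).
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)

        # Trainable SVD-style factors: U (out x r), singular values s (r), V (in x r).
        self.U = nn.Parameter(torch.randn(out_features, rank) * 0.01)
        self.s = nn.Parameter(torch.ones(rank))
        self.V = nn.Parameter(torch.randn(in_features, rank) * 0.01)
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Low-rank update combined with the frozen base mapping.
        delta = (self.U * self.s) @ self.V.t()   # shape: (out_features, in_features)
        return self.base(x) + self.scale * (x @ delta.t())


if __name__ == "__main__":
    layer = SVDLoRALinear(768, 768, rank=8)
    x = torch.randn(2, 77, 768)   # e.g. a text-encoder hidden state
    print(layer(x).shape)         # torch.Size([2, 77, 768])
```

Keeping the singular values s as a separate trainable vector is one simple way to expose the spectrum of the update for regularization or rank control; how TriLoRA constrains or trains these factors is detailed in the paper itself.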
