Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning (2312.11420v1)

Published 18 Dec 2023 in cs.CL, cs.AI, and cs.CV

Abstract: This paper introduces an efficient strategy to transform LLMs into Multi-Modal LLMs (MLLMs). By conceptualizing this transformation as a domain adaptation process, i.e., transitioning from text understanding to embracing multiple modalities, we intriguingly note that, within each attention block, tuning LayerNorm suffices to yield strong performance. Moreover, when benchmarked against other tuning approaches such as full-parameter finetuning or LoRA, its efficiency benefits are substantial. For example, compared to LoRA at the 13B model scale, performance improves by an average of over 20% across five multi-modal tasks, while trainable parameters are reduced by 41.9% and GPU memory usage by 17.6%. On top of this LayerNorm strategy, we showcase that tuning with only conversational data can improve efficiency further. Beyond these empirical outcomes, we provide a comprehensive analysis exploring the role of LayerNorm in adapting LLMs to the multi-modal domain and improving the expressive power of the model.
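The core recipe is straightforward to sketch: freeze the LLM backbone and update only the LayerNorm parameters inside each transformer block. Below is a minimal PyTorch illustration on a toy pre-norm transformer stack, not the paper's actual LLaMA-based MLLM setup; the module names (`ln_attn`, `ln_mlp`), dimensions, and the name-matching rule are illustrative assumptions.

```python
import torch.nn as nn

class Block(nn.Module):
    """Minimal pre-norm transformer block, used only to illustrate the idea."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.ln_attn = nn.LayerNorm(dim)  # LayerNorm inside the attention block
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln_mlp = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.ln_attn(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual + self-attention
        x = x + self.mlp(self.ln_mlp(x))                    # residual + MLP
        return x

model = nn.Sequential(*[Block() for _ in range(4)])

# Freeze everything, then re-enable gradients only for LayerNorm parameters.
for p in model.parameters():
    p.requires_grad = False
for name, p in model.named_parameters():
    if "ln_" in name:  # matches ln_attn / ln_mlp weights and biases (toy naming convention)
        p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total} params ({100 * trainable / total:.3f}%)")
```

In a realistic setup, the same name-matching idea would be applied to a pretrained backbone's norm layers (filtering `named_parameters()` by whatever naming convention that model uses), and only the parameters left with `requires_grad=True` would be handed to the optimizer.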

