MLAE: Masked LoRA Experts for Visual Parameter-Efficient Fine-Tuning (2405.18897v2)

Published 29 May 2024 in cs.CV

Abstract: In response to the challenges posed by the extensive parameter updates required for full fine-tuning of large-scale pre-trained models, parameter-efficient fine-tuning (PEFT) methods, exemplified by Low-Rank Adaptation (LoRA), have emerged. LoRA simplifies the fine-tuning process but may still suffer from redundancy in its low-rank matrices and gain only limited benefit from merely increasing their rank. To address these issues, a natural idea is to enhance the independence and diversity of the learning process for the low-rank matrices. We therefore propose Masked LoRA Experts (MLAE), an approach that applies the concept of masking to visual PEFT. Our method incorporates a cellular decomposition strategy that transforms a low-rank matrix into independent rank-1 submatrices, or "experts", thus enhancing independence. Additionally, we introduce a binary mask matrix that selectively activates these experts during training, based on expert-level dropout strategies, to promote more diverse and anisotropic learning. Our investigations reveal that this selective activation not only enhances performance but also fosters more diverse knowledge acquisition, with a marked decrease in parameter similarity among the experts, significantly improving model quality. Remarkably, MLAE achieves new state-of-the-art (SOTA) performance with an average accuracy of 78.8% on the VTAB-1k benchmark and 90.9% on the FGVC benchmark, surpassing the previous SOTA results by an average of 0.8% on both benchmarks with approximately half the parameters. Our code is available at https://github.com/jie040109/MLAE.
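
The masking mechanism described in the abstract can be made concrete with a short sketch. The code below is an illustrative PyTorch reconstruction of the idea, not the authors' implementation (the official code is at https://github.com/jie040109/MLAE); the class name MaskedLoRAExperts and the rank, keep_prob, and scale parameters are assumptions made for this example. It decomposes a rank-r LoRA update into r rank-1 "experts" and randomly masks whole experts during training.

# Minimal sketch of the masked LoRA-experts idea, assuming a PyTorch setting.
# Illustrative only; names and hyperparameters are not taken from the paper's code.
import torch
import torch.nn as nn


class MaskedLoRAExperts(nn.Module):
    """A frozen linear layer plus r rank-1 LoRA 'experts' with expert-level dropout."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8,
                 keep_prob: float = 0.9, scale: float = 1.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # pre-trained weight stays frozen

        # Cellular decomposition: the rank-r update is a sum of r rank-1 experts,
        # parameterized by the rows of A (input side) and the columns of B (output side).
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))  # zero init: no update at start
        self.rank = rank
        self.keep_prob = keep_prob
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.base(x)
        if self.training:
            # Binary mask over experts: each rank-1 expert is independently kept
            # or dropped, i.e. expert-level dropout rather than element-wise dropout.
            mask = (torch.rand(self.rank, device=x.device) < self.keep_prob).float()
            mask = mask / self.keep_prob  # rescale so the expected update is unchanged
        else:
            mask = torch.ones(self.rank, device=x.device)
        # Masked low-rank update: project to rank space, gate each expert, project back.
        update = ((x @ self.A.t()) * mask) @ self.B.t()
        return out + self.scale * update


if __name__ == "__main__":
    layer = MaskedLoRAExperts(768, 768, rank=8)
    layer.train()
    y = layer(torch.randn(4, 197, 768))  # e.g. ViT-B token embeddings
    print(y.shape)  # torch.Size([4, 197, 768])

At evaluation time all experts are active and the layer reduces to a standard rank-r LoRA update; the per-expert Bernoulli mask only acts during training, which is what encourages the independent, low-similarity experts described in the abstract.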

Authors (7)
  1. Junjie Wang
  2. Guangjing Yang
  3. Wentao Chen
  4. Huahui Yi
  5. Xiaohu Wu
  6. Qicheng Lao
  7. Zhouchen Lin