Does Combining Parameter-efficient Modules Improve Few-shot Transfer Accuracy? (2402.15414v1)

Published 23 Feb 2024 in cs.LG and cs.CV

Abstract: Parameter-efficient fine-tuning stands as the standard for efficiently fine-tuning large language and vision models on downstream tasks. Specifically, the efficiency of low-rank adaptation has facilitated the creation and sharing of hundreds of custom LoRA modules, each trained on distinct data from various downstream tasks. In this paper, we explore the composability of LoRA modules, examining whether combining these pre-trained modules enhances generalization to unseen downstream tasks. Our investigation evaluates two approaches: (a) uniform composition, which averages upstream LoRA modules with equal weights, and (b) learned composition, where we learn a weight for each upstream module and perform weighted averaging. Our experimental results on both vision and language models reveal that in few-shot settings, where only a limited number of samples are available for the downstream task, both uniform and learned composition yield better transfer accuracy, outperforming full fine-tuning and training a LoRA from scratch. Moreover, in full-shot settings, learned composition performs comparably to regular LoRA training while using significantly fewer trainable parameters. Our research unveils the potential of uniform composition for enhancing transferability in low-shot settings, without introducing additional learnable parameters.
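
As a rough illustration of the two composition schemes described in the abstract, here is a minimal PyTorch sketch (not the authors' implementation). It assumes each upstream LoRA module i contributes a low-rank update Delta_W_i = B_i A_i on top of a frozen linear layer; uniform composition fixes equal mixing weights, while learned composition trains only the K mixing coefficients. The class name ComposedLoRALinear, the softmax parameterization of the weights, and the tensor shapes are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of composing pre-trained LoRA modules (illustrative, not the paper's code).
import torch
import torch.nn as nn

class ComposedLoRALinear(nn.Module):
    """Frozen base linear layer plus a weighted average of K upstream LoRA updates."""
    def __init__(self, base: nn.Linear, loras, learn_weights: bool = False):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                      # base weights stay frozen
        # loras: list of (A, B) with A: (r, in_features), B: (out_features, r)
        self.As = nn.ParameterList([nn.Parameter(A, requires_grad=False) for A, _ in loras])
        self.Bs = nn.ParameterList([nn.Parameter(B, requires_grad=False) for _, B in loras])
        # Uniform composition: fixed, equal weights (zero logits -> softmax gives 1/K each).
        # Learned composition: the K mixing logits are the only trainable parameters.
        self.logits = nn.Parameter(torch.zeros(len(loras)), requires_grad=learn_weights)

    def forward(self, x):
        w = torch.softmax(self.logits, dim=0)            # mixing weights over upstream modules
        delta = sum(w[i] * (x @ self.As[i].T @ self.Bs[i].T) for i in range(len(self.As)))
        return self.base(x) + delta

# Toy usage: three upstream rank-4 LoRA modules composed for a 16 -> 8 linear layer.
base = nn.Linear(16, 8)
loras = [(torch.randn(4, 16) * 0.01, torch.randn(8, 4) * 0.01) for _ in range(3)]
uniform = ComposedLoRALinear(base, loras, learn_weights=False)   # (a) uniform composition
learned = ComposedLoRALinear(base, loras, learn_weights=True)    # (b) learned composition
out = uniform(torch.randn(2, 16))
```

With the mixing logits initialized to zero and kept frozen, the forward pass reduces to an equal-weight average of the upstream updates; enabling gradients on just those K scalars gives the learned variant, which has far fewer trainable parameters than training a fresh LoRA from scratch.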

Authors (6)
  1. Nader Asadi (10 papers)
  2. Mahdi Beitollahi (6 papers)
  3. Yasser Khalil (1 paper)
  4. Yinchuan Li (54 papers)
  5. Guojun Zhang (43 papers)
  6. Xi Chen (1036 papers)
Citations (5)
