Batched Low-Rank Adaptation of Foundation Models (2312.05677v3)

Published 9 Dec 2023 in cs.LG, cs.AI, and cs.CL

Abstract: Low-Rank Adaptation (LoRA) has recently gained attention for fine-tuning foundation models by incorporating trainable low-rank matrices, thereby reducing the number of trainable parameters. While LoRA offers numerous advantages, its applicability for real-time serving to a diverse and global user base is constrained by its inability to handle multiple task-specific adapters efficiently. This imposes a performance bottleneck in scenarios requiring personalized, task-specific adaptations for each incoming request. To mitigate this constraint, we introduce Fast LoRA (FLoRA), a framework in which each input example in a minibatch can be associated with its unique low-rank adaptation weights, allowing for efficient batching of heterogeneous requests. We empirically demonstrate that FLoRA retains the performance merits of LoRA, showcasing competitive results on the MultiPL-E code generation benchmark spanning 8 languages and a multilingual speech recognition task across 6 languages.
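The core idea in the abstract is that every example in a minibatch can carry its own low-rank adapter, so heterogeneous requests can be served in a single batched forward pass instead of one pass per adapter. The sketch below illustrates that idea in PyTorch under stated assumptions: the function name, tensor shapes, and the additive einsum formulation are illustrative choices, not the paper's exact fLoRA computation.

```python
# Illustrative sketch (assumed formulation, not the authors' exact method):
# a frozen linear layer plus a per-example low-rank update (B_i A_i) x_i,
# computed for the whole minibatch with batched contractions rather than a
# Python loop over adapters.
import torch


def batched_lora_linear(x, W, A, B, scale=1.0):
    """x : (bsz, seq, d_in)   minibatch of token activations
    W : (d_out, d_in)         frozen base weight, shared by all examples
    A : (bsz, r, d_in)        per-example down-projection (adapter of example i)
    B : (bsz, d_out, r)       per-example up-projection
    """
    base = x @ W.T                                # shared path: (bsz, seq, d_out)
    down = torch.einsum("bsd,brd->bsr", x, A)     # per-example A_i x_i
    up = torch.einsum("bsr,bor->bso", down, B)    # per-example B_i (A_i x_i)
    return base + scale * up


if __name__ == "__main__":
    bsz, seq, d_in, d_out, r = 4, 16, 64, 64, 8
    x = torch.randn(bsz, seq, d_in)
    W = torch.randn(d_out, d_in)
    A = torch.randn(bsz, r, d_in)
    B = torch.randn(bsz, d_out, r)
    print(batched_lora_linear(x, W, A, B, scale=0.5).shape)  # torch.Size([4, 16, 64])
```

Because the rank r is small, the per-example contractions add little cost on top of the shared base matmul, which is what makes serving many adapters in one batch attractive.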

Authors (2)
  1. Yeming Wen (14 papers)
  2. Swarat Chaudhuri (61 papers)
Citations (16)