
Derivative-Free Optimization for Low-Rank Adaptation in Large Language Models (2403.01754v1)

Published 4 Mar 2024 in cs.CL

Abstract: Parameter-efficient tuning methods such as LoRA can match the performance of full model tuning while updating only a small fraction of the parameters. However, they still demand substantial computational resources, since gradients must be computed and back-propagated through the entire model. Recent work has therefore turned to derivative-free optimization, which avoids gradient computation and shows greater robustness in few-shot settings. In this paper, we prepend low-rank modules to each self-attention layer of the model and employ two derivative-free optimization methods to optimize these low-rank modules layer by layer, in an alternating fashion. Extensive results on various tasks and LLMs demonstrate that the proposed method achieves substantial improvements and offers clear advantages in memory usage and convergence speed over existing gradient-based parameter-efficient tuning and derivative-free optimization methods in few-shot settings.
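The sketch below is a minimal illustration of the layer-alternating, derivative-free scheme the abstract describes, not the authors' released code. The black-box objective `evaluate_few_shot_loss`, the toy dimensions, and the simple (mu, lambda)-style evolution strategy are assumptions made for illustration; the paper's actual choice of the two derivative-free optimizers and of the few-shot loss may differ.

```python
# Minimal sketch (assumed, not the authors' implementation):
# derivative-free, layer-alternating optimization of low-rank (LoRA-style)
# modules prepended to each self-attention layer.
import numpy as np

rng = np.random.default_rng(0)

HIDDEN, RANK, NUM_LAYERS = 64, 4, 6   # toy sizes, chosen for illustration

# One pair of low-rank factors (A, B) per self-attention layer, flattened.
params = [np.zeros(2 * HIDDEN * RANK) for _ in range(NUM_LAYERS)]

def evaluate_few_shot_loss(all_params):
    """Placeholder for the true black-box objective: inject each layer's
    low-rank update into the frozen model and score the few-shot loss.
    A toy quadratic stands in here so the sketch runs end to end."""
    return sum(np.sum((p - 0.1) ** 2) for p in all_params)

def es_step(layer_idx, sigma=0.05, pop=16, elite=4):
    """One (mu, lambda)-style evolution-strategy update on a single layer,
    keeping every other layer fixed (the alternating scheme in the abstract)."""
    base = params[layer_idx]
    noise = rng.standard_normal((pop, base.size))
    candidates = base + sigma * noise          # sample perturbed low-rank modules
    losses = []
    for cand in candidates:
        trial = list(params)
        trial[layer_idx] = cand
        losses.append(evaluate_few_shot_loss(trial))
    best = np.argsort(losses)[:elite]
    params[layer_idx] = candidates[best].mean(axis=0)  # recombine the elites
    return min(losses)

for epoch in range(20):
    for layer in range(NUM_LAYERS):            # alternate over layers
        loss = es_step(layer)
    print(f"epoch {epoch}: loss {loss:.4f}")
```

Optimizing one layer's low-rank module at a time keeps each derivative-free search in a relatively low-dimensional space, which is presumably what makes the alternating scheme tractable compared with searching over all layers' parameters jointly.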
