Matrix-Transformation Based Low-Rank Adaptation (MTLoRA): A Brain-Inspired Method for Parameter-Efficient Fine-Tuning (2403.07440v3)

Published 12 Mar 2024 in cs.CL and cs.AI

Abstract: Fine-tuning techniques based on Large Pretrained Language Models (LPLMs) have been shown to significantly improve model performance on a variety of downstream tasks and to effectively control the output behavior of LPLMs. Recent studies have proposed numerous methods for fine-tuning only a small number of parameters of open-source LPLMs, reducing the demand for computational and storage resources. Among these, reparameterization fine-tuning methods represented by LoRA (Low-Rank Adaptation) have gained popularity. We find that although these methods perform well in many respects, there is still considerable room for improvement in complex-task adaptability, performance, stability, and algorithmic complexity. Inspired by the idea that the functions of the brain are shaped by its geometric structure, this paper integrates that idea into LoRA and proposes a new matrix-transformation-based reparameterization method for efficient fine-tuning, named Matrix-Transformation based Low-Rank Adaptation (MTLoRA). MTLoRA dynamically alters the spatial geometric structure of the task-specific parameter matrix by applying a transformation matrix T that performs linear transformations such as rotation, scaling, and translation, generating new feature patterns (eigenvectors) that mimic how the brain's complex geometric structure shapes its function, thereby enhancing the model's performance on downstream tasks. On Natural Language Understanding (NLU) tasks, evaluated with the GLUE benchmark, MTLoRA achieves an overall performance increase of about 1.0% across eight tasks; on Natural Language Generation (NLG) tasks, MTLoRA improves performance by an average of 0.95% and 0.56% on the DART and WebNLG tasks, respectively.
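
As a rough illustration of the idea described in the abstract, the sketch below augments a standard LoRA linear layer with a learnable transformation matrix T acting on the low-rank update. The class name MTLoRALinear, the placement of T in the rank-r subspace, and its identity initialization are assumptions made for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class MTLoRALinear(nn.Module):
    """Sketch of a LoRA-style layer whose low-rank update is routed through a
    learnable transformation matrix T (hypothetical layout, for illustration)."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        # Frozen pretrained weight W0 (random here; in practice copied from the base model)
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02,
                                   requires_grad=False)

        # Standard LoRA factors: delta_W ~= B @ A with rank r << min(d, k)
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.02)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))  # zero init => no update at start

        # Learnable transformation matrix T in the rank-r subspace,
        # initialized to the identity so training begins from plain LoRA (assumption)
        self.T = nn.Parameter(torch.eye(rank))

        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = x @ self.weight.t()
        # Low-rank path x -> A -> T -> B, i.e. delta_W = B @ T @ A
        update = x @ self.lora_A.t() @ self.T.t() @ self.lora_B.t()
        return base + self.scaling * update


# Minimal usage check
layer = MTLoRALinear(in_features=768, out_features=768, rank=8)
out = layer(torch.randn(4, 768))
print(out.shape)  # torch.Size([4, 768])
```

Because T multiplies the low-rank factors, it adds only r x r parameters per adapted weight matrix, so this kind of transformation keeps the method parameter-efficient while letting the adapter reshape the geometry of the update.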

Authors (4)
  1. Yao Liang
  2. Yuwei Wang
  3. Yi Zeng
  4. Yang Li