Matrix-Transformation Based Low-Rank Adaptation (MTLoRA): A Brain-Inspired Method for Parameter-Efficient Fine-Tuning (2403.07440v3)
Abstract: Fine-tuning techniques based on Large Pretrained Language Models (LPLMs) have been shown to significantly enhance model performance on a variety of downstream tasks and to effectively control the output behavior of LPLMs. Recent studies have proposed numerous methods for fine-tuning a small number of parameters of open-source LPLMs, reducing the demand for computational and storage resources. Among these, reparameterization-based methods represented by LoRA (Low-Rank Adaptation) have gained popularity. We find that although these methods perform well in many respects, there remains considerable room for improvement in complex-task adaptability, performance, stability, and algorithmic complexity. Inspired by the idea that the functions of the brain are shaped by its geometric structure, this paper integrates that idea into LoRA and proposes a new matrix-transformation-based reparameterization method for efficient fine-tuning, named Matrix-Transformation based Low-Rank Adaptation (MTLoRA). MTLoRA applies a transformation matrix T to perform linear transformations, such as rotation, scaling, and translation, on the task-specific parameter matrix, dynamically altering its spatial geometric structure and generating new matrix feature patterns (eigenvectors). This mimics the fundamental influence that the brain's complex geometric structure exerts on its function, thereby enhancing the model's performance on downstream tasks. On Natural Language Understanding (NLU) tasks, evaluated with the GLUE benchmark, MTLoRA achieves an overall performance increase of about 1.0% across eight tasks; on Natural Language Generation (NLG) tasks, MTLoRA improves performance by an average of 0.95% and 0.56% on the DART and WebNLG tasks, respectively.
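The abstract does not spell out exactly where the transformation matrix T enters the LoRA reparameterization, so the following PyTorch sketch is only illustrative: it assumes the task-specific update takes the form ΔW = B·T·A, with T a trainable r×r matrix initialized to the identity, and the class and argument names (MTLoRALinear, rank, alpha) are hypothetical rather than taken from the paper's code.

```python
# Minimal sketch of an MTLoRA-style linear layer (assumed form: Delta_W = B @ T @ A).
# The exact placement and parameterization of T in MTLoRA may differ; names are illustrative.
import torch
import torch.nn as nn


class MTLoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        # Frozen pretrained weight W0 (not updated during fine-tuning).
        self.weight = nn.Parameter(torch.empty(out_features, in_features), requires_grad=False)
        nn.init.normal_(self.weight, std=0.02)

        # LoRA-style low-rank factors: A projects down to rank r, B projects back up.
        self.lora_A = nn.Parameter(torch.zeros(rank, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        nn.init.normal_(self.lora_A, std=0.02)  # B stays zero so the update starts at 0

        # Transformation matrix T (identity at init) that linearly transforms
        # the rank-r subspace before it is projected back up.
        self.transform_T = nn.Parameter(torch.eye(rank))

        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = x @ self.weight.t()
        # Low-rank, transformed update: x -> A^T -> T^T -> B^T.
        update = (x @ self.lora_A.t()) @ self.transform_T.t() @ self.lora_B.t()
        return base + self.scaling * update


# Quick shape check.
layer = MTLoRALinear(in_features=768, out_features=768, rank=8)
y = layer(torch.randn(2, 16, 768))
print(y.shape)  # torch.Size([2, 16, 768])
```

Initializing T to the identity and B to zero keeps the adapted layer identical to the frozen pretrained layer at the start of fine-tuning, matching standard LoRA practice.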
- A general language assistant as a laboratory for alignment. arXiv preprint arXiv:2112.00861
- The fifth pascal recognizing textual entailment challenge. TAC 7, 8
- Language models are few-shot learners. Advances in neural information processing systems 33, 1877–1901
- Semeval-2017 task 1: Semantic textual similarity-multilingual and cross-lingual focused evaluation. arXiv preprint arXiv:1708.00055
- Chavel, I. (1984). Eigenvalues in Riemannian geometry (Academic Press)
- Longlora: Efficient fine-tuning of long-context large language models. arXiv preprint arXiv:2309.12307
- The pascal recognising textual entailment challenge. In Machine learning challenges workshop (Springer), 177–190
- Qlora: Efficient finetuning of quantized llms. Advances in Neural Information Processing Systems 36
- Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
- Automatically constructing a corpus of sentential paraphrases. In Third International Workshop on Paraphrasing (IWP2005)
- Glm: General language model pretraining with autoregressive blank infilling. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 320–335
- High-resolution intersubject averaging and a coordinate system for the cortical surface. Human brain mapping 8, 272–284
- The webnlg challenge: Generating text from rdf data. In Proceedings of the 10th International Conference on Natural Language Generation. 124–133
- The third pascal recognizing textual entailment challenge. In Proceedings of the ACL-PASCAL workshop on textual entailment and paraphrasing. 1–9
- The second pascal recognising textual entailment challenge. In Proceedings of the Second PASCAL Challenges Workshop on Recognising Textual Entailment. vol. 7, 785–794
- Towards a unified view of parameter-efficient transfer learning. In International Conference on Learning Representations
- Deberta: Decoding-enhanced bert with disentangled attention. arXiv preprint arXiv:2006.03654
- Parameter-efficient transfer learning for nlp. In International Conference on Machine Learning (PMLR), 2790–2799
- LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations
- Lévy, B. (2006). Laplace-beltrami eigenfunctions towards an algorithm that "understands" geometry. In IEEE International Conference on Shape Modeling and Applications 2006 (SMI’06) (IEEE), 13–13
- Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. Advances in Neural Information Processing Systems 35, 1950–1965
- Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692
- Matthews, B. W. (1975). Comparison of the predicted and observed secondary structure of t4 phage lysozyme. Biochimica et Biophysica Acta (BBA)-Protein Structure 405, 442–451
- Electromagnetic processes in dispersive media (Cambridge University Press)
- Metaicl: Learning to learn in context. arXiv preprint arXiv:2110.15943
- Dart: Open-domain structured data record to text generation. arXiv preprint arXiv:2007.02871
- The e2e dataset: New challenges for end-to-end generation. arXiv preprint arXiv:1706.09254
- Nowack, W. J. (1995). Neocortical dynamics and human eeg rhythms. Neurology 45, 1793–1793
- Is the brain macroscopically linear? A system identification of resting state dynamics. arXiv preprint arXiv:2012.12351
- Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems 35, 27730–27744
- Geometric constraints on human brain function. Nature, 1–9
- Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), eds. M. Walker, H. Ji, and A. Stent (New Orleans, Louisiana: Association for Computational Linguistics), 2227–2237. doi:10.18653/v1/N18-1202
- Pre-trained models for natural language processing: A survey. Science China Technological Sciences 63, 1872–1897
- Language models are unsupervised multitask learners. OpenAI blog 1, 9
- Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research 21, 5485–5551
- Squad: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250
- Learning multiple visual domains with residual adapters. Advances in neural information processing systems 30
- Eigenmodes of brain activity: Neural field theory predictions and comparison with experiment. NeuroImage 142, 79–98
- Laplace-beltrami eigenfunction expansion of cortical manifolds. In 2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro (IEEE), 372–375
- Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 conference on empirical methods in natural language processing. 1631–1642
- Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971
- Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288
- Attention is all you need. Advances in neural information processing systems 30
- Brainprint: A discriminative characterization of brain morphology. NeuroImage 109, 232–248
- Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461
- Self-instruct: Aligning language models with self-generated instructions. arXiv preprint arXiv:2212.10560
- Neural network acceptability judgments. Transactions of the Association for Computational Linguistics 7, 625–641
- Finetuned language models are zero-shot learners. arXiv preprint arXiv:2109.01652
- A broad-coverage challenge corpus for sentence understanding through inference. arXiv preprint arXiv:1704.05426
- Principal component analysis. Chemometrics and intelligent laboratory systems 2, 37–52
- Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. arXiv preprint arXiv:2106.10199
- Glm-130b: An open bilingual pre-trained model. arXiv preprint arXiv:2210.02414
- Increlora: Incremental parameter allocation method for parameter-efficient fine-tuning. arXiv preprint arXiv:2308.12043
- Adaptive budget allocation for parameter-efficient fine-tuning. arXiv preprint arXiv:2303.10512
- Delta-lora: Fine-tuning high-rank parameters with the delta of low-rank matrices. arXiv preprint arXiv:2309.02411
Authors: Yao Liang, Yuwei Wang, Yi Zeng, Yang Li