A Framework to Implement 1+N Multi-task Fine-tuning Pattern in LLMs Using the CGC-LORA Algorithm (2402.01684v1)
Abstract: With the rapid evolution of LLMs in the field of NLP, considerable effort has been devoted to fine-tuning common pre-trained LLMs to fulfill a variety of tasks in one or more specific domains. In practice, there are two prevailing ways in which this adaptation is achieved: (i) Multiple Independent Models: the pre-trained LLM is fine-tuned several times independently, using the training samples of each task. (ii) An Integrated Model: samples from all tasks are employed to fine-tune a single pre-trained LLM jointly. The first approach incurs high computing and storage costs, while the second suffers from the seesawing issue, where gains on some tasks come at the expense of others. To address the high computing cost and the seesawing issue simultaneously, we propose a unified framework that implements a 1 + N multi-task fine-tuning pattern in LLMs using a novel Customized Gate Control (CGC) Low-rank Adaptation (LoRA) algorithm. Our work aims to combine the advantages of the MTL (i.e., CGC) and PEFT (i.e., LoRA) schemes. For a given cluster of tasks, we design an innovative layer that contains two types of experts as additional trainable parameters to make LoRA compatible with MTL. To comprehensively evaluate the proposed framework, we conduct well-designed experiments on two public datasets. The experimental results demonstrate that the unified framework with CGC-LoRA modules achieves higher evaluation scores than all benchmarks on both datasets.
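The abstract describes a CGC-style LoRA layer that mixes task-common and task-specific low-rank experts through a per-task gate on top of a frozen base projection. The PyTorch sketch below illustrates one plausible reading of such a layer; the class name `CGCLoRALinear`, the expert counts, the rank, and the gating details are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CGCLoRALinear(nn.Module):
    """Sketch of a CGC-style LoRA layer: a frozen base linear projection plus
    task-common and task-specific low-rank experts combined by a per-task gate.
    Shapes and hyperparameters are assumptions for illustration only."""

    def __init__(self, base_linear: nn.Linear, num_tasks: int,
                 num_shared: int = 2, num_specific: int = 1,
                 rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad_(False)          # base weights stay frozen (PEFT)
        d_in, d_out = base_linear.in_features, base_linear.out_features
        self.scaling = alpha / rank

        def expert():
            # one LoRA expert = a low-rank pair A (d_in x r), B (r x d_out)
            return nn.ParameterDict({
                "A": nn.Parameter(torch.randn(d_in, rank) * 0.01),
                "B": nn.Parameter(torch.zeros(rank, d_out)),
            })

        # task-common experts, shared across all tasks
        self.shared = nn.ModuleList(expert() for _ in range(num_shared))
        # task-specific experts, one group per task
        self.specific = nn.ModuleList(
            nn.ModuleList(expert() for _ in range(num_specific))
            for _ in range(num_tasks)
        )
        # per-task gate over (shared + own specific) experts
        self.gate = nn.Parameter(torch.zeros(num_tasks, num_shared + num_specific))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        experts = list(self.shared) + list(self.specific[task_id])
        weights = F.softmax(self.gate[task_id], dim=-1)
        # weighted sum of low-rank updates, added to the frozen base output
        delta = sum(w * (x @ e["A"] @ e["B"]) for w, e in zip(weights, experts))
        return self.base(x) + self.scaling * delta


# Usage: route a batch belonging to task 1 through the layer.
layer = CGCLoRALinear(nn.Linear(768, 768), num_tasks=3)
out = layer(torch.randn(4, 16, 768), task_id=1)
```

Only the expert parameters and the gate are trainable, so the layer keeps LoRA's parameter efficiency while letting each task draw on both shared and task-specific capacity.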
Authors: Chao Song, Zhihao Ye, Qiqiang Lin, Qiuying Peng, Jun Wang