Are LLMs Effective Backbones for Fine-tuning? An Experimental Investigation of Supervised LLMs on Chinese Short Text Matching (2403.19930v1)
Published 29 Mar 2024 in cs.CL
Abstract: The recent success of LLMs has garnered significant attention in both academia and industry. Prior research on LLMs has primarily focused on enhancing or leveraging their generalization capabilities in zero- and few-shot settings. However, there has been limited investigation into effectively fine-tuning LLMs for a specific natural language understanding task in supervised settings. In this study, we conduct an experimental analysis by fine-tuning LLMs for the task of Chinese short text matching. We explore various factors that influence performance when fine-tuning LLMs, including task modeling methods, prompt formats, and output formats.
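The abstract's notions of "task modeling method" and "prompt/output format" can be made concrete with a minimal sketch of one way to cast supervised short text matching as a generation task: each sentence pair is wrapped in an instruction-style prompt and the model is trained to emit a short label string. The prompt wording, label vocabulary, and helper name below are illustrative assumptions, not the paper's actual formats.

```python
# Hypothetical sketch: formatting one supervised example for generative
# fine-tuning on Chinese sentence-pair matching. The prompt text and the
# "yes"/"no" output format are assumptions for illustration; the paper
# compares several such choices experimentally.
def build_matching_example(sent1: str, sent2: str, label: int) -> dict:
    """Return a prompt/target pair for instruction-style fine-tuning.

    label: 1 if the two sentences are semantically equivalent, else 0.
    """
    prompt = (
        "Determine whether the following two sentences have the same meaning. "
        "Answer 'yes' or 'no'.\n"
        f"Sentence 1: {sent1}\n"
        f"Sentence 2: {sent2}\n"
        "Answer:"
    )
    # Output format choice: train the model to generate a single label word.
    target = " yes" if label == 1 else " no"
    return {"prompt": prompt, "target": target}


# Example pair (paraphrased credit-card questions, matching label = 1).
example = build_matching_example("怎么办理信用卡", "如何申请信用卡", 1)
```

An alternative task modeling method the abstract alludes to would replace the generated label with a classification head over the final hidden state; the prompt-plus-label-word form above is just one of the design axes under study.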