Multi-Task Instruction Tuning of LLaMa for Specific Scenarios: A Preliminary Study on Writing Assistance (2305.13225v2)
Abstract: Proprietary LLMs such as ChatGPT have garnered significant attention for their exceptional capabilities across a diverse range of tasks. Recent studies demonstrate that smaller open-source foundation models, such as the 7B-parameter LLaMA, can also display remarkable proficiency on diverse tasks when fine-tuned with instruction-driven data. In this work, we investigate a practical problem setting where the primary focus is on one or a few particular tasks rather than general-purpose instruction following, and explore whether LLMs can be beneficial and further improved in such targeted scenarios. We choose the writing-assistant scenario as the testbed, which comprises seven writing tasks. We collect training data for these tasks, reframe them in an instruction-following format, and then refine the LLM, specifically LLaMA, via instruction tuning. Experimental results show that fine-tuning LLaMA on writing instruction data significantly improves its performance on writing tasks. We also conduct further experiments and analyses to offer insights for future work on effectively fine-tuning LLaMA for specific scenarios. Finally, we initiate a discussion on whether employing an LLM for only one targeted task is warranted, considering the effort required for tuning and the resources consumed during deployment.
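The abstract describes reframing each writing task's training pairs into an instruction-following format before tuning. As an illustration, the sketch below converts a grammatical-error-correction pair into an Alpaca-style prompt/response record; the template wording and function names are assumptions for illustration, not the paper's exact format.

```python
# Hypothetical sketch: wrapping a (source, target) pair from a writing task
# into an Alpaca-style instruction-tuning record. The prompt template here
# is an assumption modeled on Stanford Alpaca, not the paper's exact format.

def to_instruction_example(task_instruction: str, source_text: str, target_text: str) -> dict:
    """Build an instruction-tuning record with a prompt/response split."""
    prompt = (
        "Below is an instruction that describes a task, paired with an input.\n"
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{task_instruction}\n\n"
        f"### Input:\n{source_text}\n\n"
        "### Response:\n"
    )
    return {"prompt": prompt, "response": target_text}

# Example: a grammatical-error-correction pair reframed as instruction data.
example = to_instruction_example(
    "Fix the grammatical errors in the given sentence.",
    "She go to school every days.",
    "She goes to school every day.",
)
```

During fine-tuning, the model would be trained to generate `response` conditioned on `prompt`, with the loss typically masked over the prompt tokens.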
- Yue Zhang
- Leyang Cui
- Deng Cai
- Xinting Huang
- Tao Fang
- Wei Bi