
Multi-Task Instruction Tuning of LLaMa for Specific Scenarios: A Preliminary Study on Writing Assistance (2305.13225v2)

Published 22 May 2023 in cs.CL

Abstract: Proprietary LLMs, such as ChatGPT, have garnered significant attention due to their exceptional capabilities in handling a diverse range of tasks. Recent studies demonstrate that open-sourced smaller foundational models, such as 7B-size LLaMA, can also display remarkable proficiency in tackling diverse tasks when fine-tuned using instruction-driven data. In this work, we investigate a practical problem setting where the primary focus is on one or a few particular tasks rather than general-purpose instruction following, and explore whether LLMs can be beneficial and further improved for such targeted scenarios. We choose the writing-assistant scenario as the testbed, which includes seven writing tasks. We collect training data for these tasks, reframe them in an instruction-following format, and subsequently refine the LLM, specifically LLaMA, via instruction tuning. Experimental results show that fine-tuning LLaMA on writing instruction data significantly improves its ability on writing tasks. We also conduct more experiments and analyses to offer insights for future work on effectively fine-tuning LLaMA for specific scenarios. Finally, we initiate a discussion regarding the necessity of employing LLMs for only one targeted task, taking into account the efforts required for tuning and the resources consumed during deployment.
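The data-reframing step the abstract describes, collecting supervised writing-task pairs and recasting them in an instruction-following format before tuning, can be sketched as below. This is a minimal illustration in the Alpaca-style record format; the task names and instruction wordings are assumptions for illustration, not the paper's actual templates.

```python
# Illustrative sketch: wrap supervised (source, target) pairs from writing
# tasks as instruction-following records for tuning. Task names and
# instruction wordings below are assumed, not the paper's exact templates.

WRITING_TASK_INSTRUCTIONS = {
    "gec": "Fix the grammatical errors in the following text.",
    "simplification": "Rewrite the following sentence so it is easier to read.",
    "paraphrasing": "Paraphrase the following sentence.",
    "neutralization": "Rewrite the following text to remove subjective bias.",
}

def to_instruction_record(task: str, source: str, target: str) -> dict:
    """Convert one supervised example into an Alpaca-style training record."""
    return {
        "instruction": WRITING_TASK_INSTRUCTIONS[task],
        "input": source,
        "output": target,
    }

# Example: one grammatical-error-correction pair becomes one tuning example.
record = to_instruction_record(
    "gec",
    "She go to school every day.",
    "She goes to school every day.",
)
```

Records in this shape can then be serialized to JSON and fed to a standard causal-LM fine-tuning loop over LLaMA.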

Authors (6)
  1. Yue Zhang (618 papers)
  2. Leyang Cui (50 papers)
  3. Deng Cai (181 papers)
  4. Xinting Huang (36 papers)
  5. Tao Fang (19 papers)
  6. Wei Bi (62 papers)
Citations (32)