An Examination of "LongForm: Effective Instruction Tuning with Reverse Instructions"
This paper introduces an innovative strategy for instruction tuning of language models (LMs), centered on the construction of the LongForm-C dataset. It addresses a central challenge in building high-quality instruction datasets: traditional approaches depend on costly human annotation and often yield data that is limited in scale or poorly suited to instruction tuning. The authors propose a technique called "reverse instructions" that leverages existing human-written corpora to automatically generate high-quality instruction-output pairs. Concretely, an LLM is prompted to produce the instruction that a selected human-written text could answer, and the resulting pairs are used to improve the instruction-following capabilities of fine-tuned LMs.
Key Methodological Innovations
The proposed reverse instructions methodology is designed to create diverse, cost-effective instruction tuning data. The process begins by extracting varied human-authored texts from large corpora such as C4 and English Wikipedia. For each extracted sample, an LLM is then prompted zero-shot to generate a corresponding instruction, a setup chosen to minimize cost while preserving quality. By pairing these generated instructions with the original human-written texts, the dataset captures realistic target outputs well suited to long-form text generation.
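To make the pipeline concrete, a minimal sketch of the generation loop is given below. The prompt is an illustrative paraphrase rather than the paper's exact template, and `llm_generate` is a hypothetical stand-in for whichever LLM API is used; both are assumptions for illustration.

```python
import random

def build_reverse_instruction_pairs(corpus, n_examples, llm_generate):
    """Sketch of the reverse-instructions data generation loop.

    `corpus` is a list of human-written documents (e.g., preprocessed
    C4 or Wikipedia passages); `llm_generate` is a caller-supplied
    function that sends a prompt to an LLM and returns its completion.
    """
    pairs = []
    for text in random.sample(corpus, n_examples):
        # Zero-shot prompt asking the LLM which instruction the
        # human-written text could be the answer to (an illustrative
        # paraphrase, not the paper's exact template).
        prompt = (
            "Instruction: X\n"
            f"Output: {text}\n\n"
            "What kind of instruction could the output above be the "
            "answer to? Reply with the instruction only.\n"
            "X:"
        )
        instruction = llm_generate(prompt).strip()
        # The human-written text, not model output, is the training target.
        pairs.append({"instruction": instruction, "output": text})
    return pairs
```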
Empirical Results and Model Evaluation
The paper provides compelling empirical evidence for the effectiveness of its method. LMs fine-tuned on the LongForm-C dataset outperform much larger models that lack instruction tuning on tasks such as story generation, recipe generation, and long-form question answering. Particularly noteworthy are results in which LongForm models such as LongForm-OPT-2.7B outperform strong baselines like OPT-30B as well as the instruction-tuned competitors FLAN-T5 and Alpaca. Evaluation relies primarily on METEOR scores, which show substantial improvements across text generation tasks. The models also demonstrate an improved ability to understand and follow instructions in languages other than English, marking notable progress on multilingual instruction following.
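For readers who want to reproduce this style of evaluation, METEOR is available in standard tooling. Below is a minimal sketch using NLTK's implementation; the example strings are invented for illustration, not drawn from the paper or from LongForm-C.

```python
import nltk
from nltk.translate.meteor_score import meteor_score

# METEOR's matching stage relies on WordNet; one-time downloads.
nltk.download("wordnet")
nltk.download("omw-1.4")

# Invented example pair, not taken from LongForm-C.
reference = "Preheat the oven to 180C and bake the loaf for 40 minutes."
hypothesis = "Preheat your oven to 180C, then bake the loaf for roughly 40 minutes."

# Recent NLTK releases expect pre-tokenized token lists; simple
# whitespace splitting is enough for a quick check.
score = meteor_score([reference.split()], hypothesis.split())
print(f"METEOR: {score:.3f}")
```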
Implications and Future Prospects
The results in this paper underscore the potential impact of reverse instructions for building instruction datasets. The efficiency and lowered cost barriers of the approach could democratize fine-tuning by enabling more researchers and practitioners to develop capable LMs. The release of the LongForm-C dataset and models also facilitates further research into instruction-tuned LMs and offers a potential path to high-performing models that require fewer computational resources. Future work could refine the reverse instructions methodology and extend it to other languages and domains. Addressing the hallucination tendencies and structured prediction shortcomings acknowledged in the paper's limitations section is another promising direction for improving model reliability and applicability.
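Because the models are publicly released, experimenting with them should take only a few lines with Hugging Face transformers. The sketch below assumes a checkpoint identifier of the form used in the authors' release; the exact name should be verified against their repository, and the example instruction is invented.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint identifier assumed from the public release; verify the
# exact name against the authors' repository before use.
model_name = "akoksal/LongForm-OPT-2.7B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Invented example instruction for a long-form generation task.
prompt = "Write a short story about a lighthouse keeper who befriends a whale."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```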
In summary, this paper offers valuable insights into instruction tuning, demonstrating that strategic data creation practices can substantially enhance the performance of LLMs. The reverse instructions method represents a promising avenue for optimizing resource use in LM training while maintaining, or even enhancing, model effectiveness across diverse application domains.