
Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following (2302.14691v2)

Published 28 Feb 2023 in cs.CL and cs.AI

Abstract: In this paper, we present our finding that prepending a Task-Agnostic Prefix Prompt (TAPP) to the input improves the instruction-following ability of various LLMs during inference. TAPP is different from canonical prompts for LLMs in that it is a fixed prompt prepended to the beginning of every input regardless of the target task for zero-shot generalization. We observe that both base LLMs (i.e. not fine-tuned to follow instructions) and instruction-tuned models benefit from TAPP, resulting in 34.58% and 12.26% improvement on average, respectively. This implies that the instruction-following ability of LLMs can be improved during inference time with a fixed prompt constructed with simple heuristics. We hypothesize that TAPP assists LLMs to better estimate the output distribution by focusing more on the instruction of the target task during inference. In other words, such ability does not seem to be sufficiently activated in not only base LLMs but also many instruction-fine-tuned LLMs. All experiments are reproducible from https://github.com/seonghyeonye/TAPP.


Summary

  • The paper demonstrates that using fixed, task-agnostic prefix prompts improves LLM performance by 34.58% for base models and 12.26% for instruction-tuned models.
  • It outlines a methodology featuring classification-task demonstrations to effectively align instructions with output distributions during inference.
  • The findings suggest that TAPP provides a scalable, inference-time enhancement, enabling smaller models to outperform much larger ones such as GPT-3 Davinci.

Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following

The paper examines the role and efficacy of a Task-Agnostic Prefix Prompt (TAPP) in enhancing the instruction-following ability of LLMs during inference. A TAPP is a fixed prompt prepended to the input of an LLM irrespective of the target task, enabling zero-shot task generalization. This departs from the conventional approach of tailoring prompts to specific tasks, aiming instead to improve the LLM's performance across diverse tasks without task-specific tuning.
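
Mechanically, applying a TAPP reduces to string concatenation at inference time. The minimal sketch below illustrates the idea; the prefix contents and prompt format are illustrative placeholders rather than the authors' released TAPP:

```python
# A minimal sketch of TAPP at inference time. The prefix below is a
# placeholder; the paper builds it from classification-task demonstrations
# (the released prompt is at https://github.com/seonghyeonye/TAPP).
TAPP = (
    "Instruction: Classify the sentiment of the sentence as positive or negative.\n"
    "Input: I loved this movie.\n"
    "Output: positive\n\n"
)

def apply_tapp(instruction: str, task_input: str) -> str:
    """Prepend the same fixed prefix to every query, regardless of task."""
    return f"{TAPP}Instruction: {instruction}\nInput: {task_input}\nOutput:"

# The resulting string is fed to any LLM as-is; no task-specific prompt
# engineering is needed for new target tasks.
prompt = apply_tapp("Translate the sentence to French.", "Good morning!")
```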

Key Findings

The paper identifies that both base LLMs and models fine-tuned for instruction following demonstrate substantial performance improvements when employing TAPP. Specifically, base LLMs exhibited an average improvement of 34.58%, while instruction-tuned models showed a 12.26% enhancement. This indicates that TAPP provides a significant performance boost during inference, possibly by helping LLMs better estimate the output distribution through sharper focus on the task instructions.

The research also highlights that TAPP is complementary to instruction fine-tuning: it improves the baseline performance of base LLMs and still adds value to models that have already been instruction-tuned. Notably, the 6B-parameter GPT-J with TAPP outperformed the 175B-parameter GPT-3 Davinci, showcasing TAPP's utility as a lightweight alternative to resource-intensive instruction tuning.

Demonstration Composition and Strategies

The paper describes how a TAPP is constructed through a series of simple heuristics (a sketch of the procedure follows the list):

  • Demonstrations consist of classification tasks with explicit answer choices mentioned in the instruction.
  • Answer choices are not shared across demonstrations, preventing LLMs from simply copying outputs from the prefix.
  • Demonstration length is restricted to manage the input size and maintain computational efficiency during inference.
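
The sketch below shows how such a prefix might be assembled while enforcing these heuristics. The demonstration schema and the character cap are illustrative assumptions, not the paper's exact procedure:

```python
def build_tapp(demo_pool: list[dict], max_chars: int = 2000) -> str:
    """Assemble a task-agnostic prefix from classification demonstrations.

    Each demo is assumed (for illustration) to be a dict with keys
    "instruction", "input", "output", "choices" (the answer options named
    in the instruction), and "is_classification".
    """
    prefix_parts: list[str] = []
    used_choices: set[str] = set()
    total_len = 0
    for demo in demo_pool:
        # Heuristic 1: keep only classification tasks whose instruction
        # states the answer choices explicitly.
        if not demo["is_classification"] or not demo["choices"]:
            continue
        # Heuristic 2: skip demos whose answer choices overlap with ones
        # already used, so the model cannot simply copy an earlier output.
        choices = set(demo["choices"])
        if choices & used_choices:
            continue
        part = (f"Instruction: {demo['instruction']}\n"
                f"Input: {demo['input']}\n"
                f"Output: {demo['output']}\n\n")
        # Heuristic 3: cap the total prefix length to keep inference cheap.
        if total_len + len(part) > max_chars:
            break
        prefix_parts.append(part)
        used_choices |= choices
        total_len += len(part)
    return "".join(prefix_parts)
```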

Through ablations, the paper demonstrates that building the demonstrations from classification tasks is critical even when the evaluation tasks are generative. The authors posit that classification tasks provide explicit output clues, helping LLMs tie task instructions to the expected responses. Furthermore, they experimented with demonstrations created by ChatGPT and found the machine-generated prompts comparable in performance to human-crafted demonstration sets.

Implications and Future Directions

The findings position TAPP as a promising approach for proprietary models that third parties cannot fine-tune because the model weights are inaccessible. Importantly, the paper proposes that TAPP helps LLMs concentrate on the task directives and thereby predict output distributions more effectively.
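
Because TAPP only changes the input text, it can also be applied to black-box models served behind an API, with no access to weights. A minimal sketch using the OpenAI Python client follows; the model name and prefix are placeholders, and the chat-style call is purely illustrative (the paper evaluated completion-style models such as GPT-3):

```python
from openai import OpenAI

# Placeholder prefix; in practice, use the demonstration string produced
# by a builder like build_tapp above or the authors' released TAPP.
TAPP = (
    "Instruction: Classify the sentiment of the sentence as positive or negative.\n"
    "Input: I loved this movie.\n"
    "Output: positive\n\n"
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
completion = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name, not one from the paper
    messages=[{
        "role": "user",
        # The fixed prefix is simply concatenated in front of the query.
        "content": TAPP + "Instruction: Summarize the input in one sentence.\n"
                          "Input: TAPP is a fixed, task-agnostic prefix prompt.\n"
                          "Output:",
    }],
)
print(completion.choices[0].message.content)
```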

The results suggest potential areas for future exploration, such as:

  • Investigating the inner workings of LLMs when utilizing TAPP, to provide deeper insight into how the fixed prefix shifts model behavior during inference.
  • Exploring the efficacy of TAPP across diverse instruction-fine-tuned models, thereby understanding its broader applicability across various LLM architectures.
  • Developing more nuanced interpretations of the correspondence between prompt content and model response patterns.

In conclusion, by reducing the need for extensive instruction fine-tuning, TAPP offers a scalable and flexible method for enhancing LLM performance across a wide array of tasks. This underscores the value of exploring task-agnostic approaches within the evolving landscape of artificial intelligence.