
Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following (2302.14691v2)

Published 28 Feb 2023 in cs.CL and cs.AI

Abstract: In this paper, we present our finding that prepending a Task-Agnostic Prefix Prompt (TAPP) to the input improves the instruction-following ability of various LLMs during inference. TAPP is different from canonical prompts for LLMs in that it is a fixed prompt prepended to the beginning of every input regardless of the target task for zero-shot generalization. We observe that both base LLMs (i.e. not fine-tuned to follow instructions) and instruction-tuned models benefit from TAPP, resulting in 34.58% and 12.26% improvement on average, respectively. This implies that the instruction-following ability of LLMs can be improved during inference time with a fixed prompt constructed with simple heuristics. We hypothesize that TAPP assists LLMs to better estimate the output distribution by focusing more on the instruction of the target task during inference. In other words, such ability does not seem to be sufficiently activated in not only base LLMs but also many instruction-fine-tuned LLMs. All experiments are reproducible from https://github.com/seonghyeonye/TAPP.


Summary

  • The paper demonstrates that using fixed, task-agnostic prefix prompts improves LLM performance by 34.58% for base models and 12.26% for instruction-tuned models.
  • It outlines a methodology featuring classification-task demonstrations to effectively align instructions with output distributions during inference.
  • The findings suggest that TAPP provides a scalable, inference-time enhancement, enabling smaller models to outperform much larger ones such as GPT-3 Davinci.

Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following

The paper examines the role and efficacy of a Task-Agnostic Prefix Prompt (TAPP) in enhancing the instruction-following ability of LLMs during inference. A TAPP is a fixed prompt prepended to the input of an LLM irrespective of the target task, enabling zero-shot task generalization. This departs from the conventional approach of tailoring prompts to specific tasks, aiming instead to improve the LLM's performance across diverse tasks without task-specific tuning.
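
Mechanically, applying a TAPP reduces to string concatenation at inference time. The minimal sketch below illustrates the idea; the prefix contents and prompt format are illustrative placeholders rather than the authors' released TAPP:

```python
# A minimal sketch of TAPP at inference time. The prefix below is a
# placeholder; the paper builds it from classification-task demonstrations
# (the released prompt is at https://github.com/seonghyeonye/TAPP).
TAPP = (
    "Instruction: Classify the sentiment of the sentence as positive or negative.\n"
    "Input: I loved this movie.\n"
    "Output: positive\n\n"
)

def apply_tapp(instruction: str, task_input: str) -> str:
    """Prepend the same fixed prefix to every query, regardless of task."""
    return f"{TAPP}Instruction: {instruction}\nInput: {task_input}\nOutput:"

# The resulting string is fed to any LLM as-is; no task-specific prompt
# engineering is needed for new target tasks.
prompt = apply_tapp("Translate the sentence to French.", "Good morning!")
```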

Key Findings

The paper identifies that both base LLMs and models fine-tuned for instruction following demonstrate substantial performance improvements when employing TAPP. Specifically, base LLMs exhibited an average improvement of 34.58%, while instruction-tuned models showed a 12.26% enhancement. This indicates that TAPP provides a significant performance boost during inference, possibly by helping LLMs better estimate the output distribution through sharper focus on the task instructions.

The research also highlights that TAPP is complementary to instruction fine-tuning: it improves the baseline performance of base LLMs and still adds value to models that have already been instruction-tuned. Notably, the 6B-parameter GPT-J with TAPP outperformed the 175B-parameter GPT-3 Davinci, showcasing TAPP's utility as a lightweight alternative to resource-intensive instruction tuning.

Demonstration Composition and Strategies

The paper describes how a TAPP is constructed through a series of simple heuristics (a sketch of the procedure follows the list):

  • Demonstrations consist of classification tasks with explicit answer choices mentioned in the instruction.
  • Answer choices are not shared across demonstrations, preventing LLMs from simply copying outputs from the prefix.
  • Demonstration length is restricted to manage the input size and maintain computational efficiency during inference.
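
The sketch below shows how such a prefix might be assembled while enforcing these heuristics. The demonstration schema and the character cap are illustrative assumptions, not the paper's exact procedure:

```python
def build_tapp(demo_pool: list[dict], max_chars: int = 2000) -> str:
    """Assemble a task-agnostic prefix from classification demonstrations.

    Each demo is assumed (for illustration) to be a dict with keys
    "instruction", "input", "output", "choices" (the answer options named
    in the instruction), and "is_classification".
    """
    prefix_parts: list[str] = []
    used_choices: set[str] = set()
    total_len = 0
    for demo in demo_pool:
        # Heuristic 1: keep only classification tasks whose instruction
        # states the answer choices explicitly.
        if not demo["is_classification"] or not demo["choices"]:
            continue
        # Heuristic 2: skip demos whose answer choices overlap with ones
        # already used, so the model cannot simply copy an earlier output.
        choices = set(demo["choices"])
        if choices & used_choices:
            continue
        part = (f"Instruction: {demo['instruction']}\n"
                f"Input: {demo['input']}\n"
                f"Output: {demo['output']}\n\n")
        # Heuristic 3: cap the total prefix length to keep inference cheap.
        if total_len + len(part) > max_chars:
            break
        prefix_parts.append(part)
        used_choices |= choices
        total_len += len(part)
    return "".join(prefix_parts)
```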

Through ablations, the paper demonstrates that building the demonstrations from classification tasks is critical even when the evaluation tasks are generative. The authors posit that classification tasks provide explicit output clues, helping LLMs tie task instructions to the expected responses. Furthermore, they experimented with demonstrations created by ChatGPT and found the machine-generated prompts comparable in performance to human-crafted demonstration sets.

Implications and Future Directions

The findings position TAPP as a promising approach for proprietary models that third parties cannot fine-tune because the model weights are inaccessible. Importantly, the paper proposes that TAPP helps LLMs concentrate on the task directives and thereby predict output distributions more effectively.
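
Because TAPP only changes the input text, it can also be applied to black-box models served behind an API, with no access to weights. A minimal sketch using the OpenAI Python client follows; the model name and prefix are placeholders, and the chat-style call is purely illustrative (the paper evaluated completion-style models such as GPT-3):

```python
from openai import OpenAI

# Placeholder prefix; in practice, use the demonstration string produced
# by a builder like build_tapp above or the authors' released TAPP.
TAPP = (
    "Instruction: Classify the sentiment of the sentence as positive or negative.\n"
    "Input: I loved this movie.\n"
    "Output: positive\n\n"
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
completion = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name, not one from the paper
    messages=[{
        "role": "user",
        # The fixed prefix is simply concatenated in front of the query.
        "content": TAPP + "Instruction: Summarize the input in one sentence.\n"
                          "Input: TAPP is a fixed, task-agnostic prefix prompt.\n"
                          "Output:",
    }],
)
print(completion.choices[0].message.content)
```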

The results suggest potential areas for future exploration, such as:

  • Investigating the inner workings of LLMs when utilizing TAPP, to provide deeper insight into how the fixed prefix shifts model behavior during inference.
  • Exploring the efficacy of TAPP across diverse instruction-fine-tuned models, thereby understanding its broader applicability across various LLM architectures.
  • Developing more nuanced interpretations of the correspondence between prompt content and model response patterns.

In conclusion, by reducing the need for extensive instruction fine-tuning, TAPP offers a scalable and flexible method for enhancing LLM performance across a wide array of tasks. This underscores the value of exploring task-agnostic approaches within the evolving landscape of artificial intelligence.