Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following (2302.14691v2)
Abstract: In this paper, we present our finding that prepending a Task-Agnostic Prefix Prompt (TAPP) to the input improves the instruction-following ability of various LLMs during inference. TAPP differs from canonical prompts for LLMs in that it is a fixed prompt prepended to the beginning of every input regardless of the target task, enabling zero-shot generalization. We observe that both base LLMs (i.e., not fine-tuned to follow instructions) and instruction-tuned models benefit from TAPP, with average improvements of 34.58% and 12.26%, respectively. This implies that the instruction-following ability of LLMs can be improved at inference time with a fixed prompt constructed with simple heuristics. We hypothesize that TAPP helps LLMs better estimate the output distribution by focusing more on the instruction of the target task during inference. In other words, this ability does not seem to be sufficiently activated in base LLMs, nor in many instruction-fine-tuned LLMs. All experiments are reproducible from https://github.com/seonghyeonye/TAPP.
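The core mechanic described in the abstract is simple: a single, fixed prefix is prepended verbatim to every query, whatever the target task. Below is a minimal sketch of that idea, assuming the prefix is serialized from a handful of fixed instruction/input/output demonstrations; the demonstration texts, field names, and formatting here are illustrative placeholders, not the actual prompt released by the authors (see the linked repository for that).

```python
# Minimal sketch of applying a Task-Agnostic Prefix Prompt (TAPP) at inference time.
# The demonstrations below are placeholders for illustration only; the real TAPP
# prefix is available at https://github.com/seonghyeonye/TAPP.

# A fixed set of cross-task demonstrations, reused verbatim for every query.
TAPP_DEMONSTRATIONS = [
    {
        "instruction": "Classify the sentiment of the sentence as positive or negative.",
        "input": "The movie was a delightful surprise from start to finish.",
        "output": "positive",
    },
    {
        "instruction": "Answer the question with yes or no.",
        "input": "Is the Pacific Ocean larger than the Atlantic Ocean?",
        "output": "yes",
    },
]


def build_tapp_prefix(demonstrations: list[dict]) -> str:
    """Serialize the fixed demonstrations into a single prefix string."""
    blocks = [
        f"Instruction: {d['instruction']}\nInput: {d['input']}\nOutput: {d['output']}"
        for d in demonstrations
    ]
    return "\n\n".join(blocks) + "\n\n"


def apply_tapp(instruction: str, task_input: str) -> str:
    """Prepend the same prefix to any target task, regardless of what the task is."""
    prefix = build_tapp_prefix(TAPP_DEMONSTRATIONS)
    return f"{prefix}Instruction: {instruction}\nInput: {task_input}\nOutput:"


if __name__ == "__main__":
    # The resulting string is fed unchanged to a base or instruction-tuned LLM.
    prompt = apply_tapp(
        "Translate the sentence into French.",
        "The weather is nice today.",
    )
    print(prompt)
```

Because the prefix never changes, it can be constructed once with simple heuristics and cached, which is what makes the approach task-agnostic: no per-task prompt engineering or retrieval is involved at inference time.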