Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars (2405.16122v2)
Abstract: LLMs have shown impressive capabilities in real-world applications. In-context learning (ICL) allows us to adapt an LLM to downstream tasks by including input-label exemplars in the prompt, without model fine-tuning. However, the quality of these exemplars greatly impacts performance, highlighting the need for an effective automated exemplar selection method. Recent studies have explored retrieval-based approaches that select exemplars tailored to individual test queries, which can be undesirable due to extra test-time computation and an increased risk of data exposure. Moreover, existing methods fail to adequately account for the impact of exemplar ordering on performance. Meanwhile, the instruction, another essential component of the prompt given to the LLM, is often overlooked by existing exemplar selection methods. To address these challenges, we propose a novel method named EASE, which leverages the hidden embedding from a pre-trained LLM to represent ordered sets of exemplars and uses a neural bandit algorithm to optimize the sets of exemplars while accounting for exemplar ordering. EASE efficiently finds an ordered set of exemplars that performs well for all test queries of a given task, thereby eliminating test-time computation. Importantly, EASE can be readily extended to jointly optimize both the exemplars and the instruction. Through extensive empirical evaluations (including on novel tasks), we demonstrate the superiority of EASE over existing methods and reveal practical insights about the impact of exemplar selection on ICL, which may be of independent interest. Our code is available at https://github.com/ZhaoxuanWu/EASE-Prompt-Optimization.
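To make the abstract's pipeline concrete, below is a minimal, self-contained sketch of the idea: represent each *ordered* set of exemplars as an embedding, then run a bandit loop that trades off exploiting sets with high predicted validation score against exploring uncertain ones. Everything here is a stand-in, not the authors' implementation: `embed()` replaces the hidden embedding of a pre-trained LLM, `validation_score()` replaces querying the LLM and measuring accuracy on a validation set, and plain LinUCB replaces the paper's neural bandit (e.g. NeuralUCB). The exemplar pool and all names are hypothetical.

```python
import hashlib
import itertools

import numpy as np

rng = np.random.default_rng(0)

def embed(ordered_exemplars, dim=64):
    """Order-sensitive embedding of an exemplar list (placeholder for the
    hidden embedding of a pre-trained LLM). Each (position, text) pair is
    hashed, so the same exemplars in a different order embed differently."""
    vec = np.zeros(dim)
    for pos, text in enumerate(ordered_exemplars):
        seed = int.from_bytes(hashlib.md5(f"{pos}|{text}".encode()).digest()[:8], "big")
        vec += np.random.default_rng(seed).standard_normal(dim)
    return vec / np.linalg.norm(vec)

def ucb_select(X_cand, X_obs, y_obs, pulled, alpha=1.0, lam=1.0):
    """LinUCB acquisition over candidate embeddings: a linear stand-in for
    the neural bandit used by EASE."""
    d = X_cand.shape[1]
    A = lam * np.eye(d) + X_obs.T @ X_obs           # regularized design matrix
    A_inv = np.linalg.inv(A)
    theta = A_inv @ X_obs.T @ y_obs                 # ridge estimate of rewards
    means = X_cand @ theta
    bonus = alpha * np.sqrt(np.einsum("ij,jk,ik->i", X_cand, A_inv, X_cand))
    ucb = means + bonus
    if pulled:                                      # don't re-query scored sets
        ucb[pulled] = -np.inf
    return int(np.argmax(ucb))

# Toy pool of exemplars; candidates are ordered size-2 subsets (permutations),
# so the same pair in a different order is a distinct candidate.
pool = [f"Q{i} -> A{i}" for i in range(6)]
candidates = list(itertools.permutations(pool, 2))
X_cand = np.stack([embed(list(c)) for c in candidates])

# Synthetic black-box reward standing in for the LLM's validation accuracy
# when prompted with a given exemplar ordering.
hidden = np.random.default_rng(1).standard_normal(X_cand.shape[1])
def validation_score(x):
    return float(x @ hidden + 0.05 * rng.standard_normal())

X_obs = np.empty((0, X_cand.shape[1]))
y_obs = np.empty(0)
pulled = []
for _ in range(15):                                 # bandit query budget
    i = ucb_select(X_cand, X_obs, y_obs, pulled)
    pulled.append(i)
    X_obs = np.vstack([X_obs, X_cand[i]])
    y_obs = np.append(y_obs, validation_score(X_cand[i]))

best = candidates[pulled[int(np.argmax(y_obs))]]
print("best ordered exemplar set found:", best)
```

Two caveats on this sketch: enumerating all permutations is only feasible for a toy pool, so a real implementation would sample candidate ordered sets each round rather than scoring the full combinatorial space; and because the selected set is fixed per task rather than per query, all the optimization cost is paid once up front, which is what lets the method avoid test-time retrieval.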