In-Context Demonstration Selection with Cross Entropy Difference
The paper "In-Context Demonstration Selection with Cross Entropy Difference," authored by Dan Iter et al., presents a method for improving the performance of LLMs through the strategic selection of in-context demonstrations (ICDs). The approach is most valuable when adapting LLMs to new tasks where traditional finetuning is infeasible due to limited data or computational resources.
Methodology Overview
The central contribution of this work is a novel selection method that uses Cross Entropy Difference (CED) to identify effective ICDs. The authors observe that the perplexity a finetuned model assigns to a test example correlates negatively with the effectiveness of the demonstration it was finetuned on: demonstrations whose finetuned models assign lower perplexity to the test input tend to be more helpful in context. Parameter-efficient finetuning (PEFT) is used to train small models on individual training examples, making it cheap to compute the CED between each test input and every candidate demonstration.
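The selection rule implied by this observation can be sketched as follows. This is a hypothetical helper over precomputed cross-entropy values, not the paper's reference implementation; the function and parameter names are illustrative:

```python
def ced_score(ce_finetuned: float, ce_base: float) -> float:
    """Cross entropy difference for one candidate demonstration:
    how much finetuning a small model on that single demonstration
    lowered the cross entropy of the test input, relative to the
    base model. Larger is better."""
    return ce_base - ce_finetuned

def select_demonstration(ce_by_demo: dict, ce_base: float) -> str:
    """Pick the candidate demonstration with the highest CED score.
    ce_by_demo maps a demo id to the cross entropy of the test input
    under the model finetuned on that demo (assumed precomputed
    via PEFT)."""
    return max(ce_by_demo, key=lambda d: ced_score(ce_by_demo[d], ce_base))
```

Since the base cross entropy is constant across candidates for a given test input, maximizing CED reduces to picking the finetuned model with the lowest cross entropy on that input; keeping the base term makes scores comparable across different test inputs.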
Empirical Evaluation
The methodology is evaluated on a mixed-domain suite comprising eight benchmarks across four text generation task types: binary classification, multiple choice, extractive question answering, and abstractive question answering. The results show that CED-based selection outperforms baselines that rely on random selection or nearest-neighbor strategies, particularly on models such as GPT-3.5.
Key Contributions
- Cross Entropy Difference Methodology: By adapting CED, borrowed from domain adaptation literature, the authors offer a quantifiable metric for selecting in-context demonstrations, leveraging small model finetuning to efficiently approximate in-domain gradients.
- Transferability Across Models: The findings suggest that CED-selected demonstrations transfer across model scales, proving effective not only on compact models like T-Few (3B) but also significantly boosting performance on much larger LLMs, such as various sizes of GPT-3.
- Insights into Demonstration Selection: The paper provides theoretical insights into the efficacy of CED, positing that its alignment with gradient similarities serves as an effective heuristic for demonstration selection.
- Scalability Techniques: For larger datasets, the authors employ clustering to reduce computational overhead while maintaining selection efficacy, suggesting practicality in real-world scenarios.
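The clustering trick from the last point can be sketched with a toy k-means over example embeddings. This is a minimal, self-contained illustration assuming training examples are already embedded as numeric vectors; the paper's exact clustering procedure may differ:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Toy k-means over training-example embeddings. In the scalable
    variant, one small model is finetuned per cluster instead of per
    example, cutting the number of PEFT runs from N down to k."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each example to its nearest cluster center
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        # recompute each center as its cluster mean (keep old center if empty)
        centers = [
            [sum(dim) / len(cl) for dim in zip(*cl)] if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters
```

With k clusters, CED scoring needs only k finetuned models: each test input is scored against the per-cluster models rather than one model per training example, which is what makes the method practical on larger datasets.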
Implications and Future Directions
This research holds both theoretical and practical implications in the field of AI and NLP. Theoretically, it extends the understanding of in-context learning by framing ICD selection as a gradient alignment problem. Practically, it offers a viable approach for dynamically improving the adaptability and performance of LLMs in few-shot and zero-shot contexts without extensive finetuning overheads.
Future research may explore integrating this selection methodology with a broader range of LLM architectures, particularly open-source variants like LLaMa, to assess comparative effectiveness. Additionally, investigating the potential of integrating finetuning phases or leveraging LLM activations as selection signals could further optimize ICD selection, thereby enhancing the flexibility and robustness of LLMs across diverse applications.
Overall, the authors provide a compelling contribution to the growing toolkit for optimizing LLM performance in resource-efficient ways, embodying a nuanced understanding of the interplay between finetuned models and the selection of in-context demonstrations.