
Towards Informative Few-Shot Prompt with Maximum Information Gain for In-Context Learning (2310.08923v1)

Published 13 Oct 2023 in cs.CL

Abstract: LLMs possess the capability to engage in In-Context Learning (ICL) by leveraging a few demonstrations pertaining to a new downstream task as conditions. However, this particular learning paradigm suffers from high instability stemming from substantial variances induced by factors such as the input distribution of selected examples, their ordering, and prompt formats. In this work, we demonstrate that even when all these factors are held constant, the random selection of examples still results in high variance. Consequently, we aim to explore the informative ability of data examples by quantifying the Information Gain (IG) obtained in prediction after observing a given example candidate. We then propose to sample those with maximum IG. Additionally, we identify the presence of template bias, which can lead to unfair evaluations of IG during the sampling process. To mitigate this bias, we introduce a Calibration Before Sampling strategy. The experimental results illustrate that our proposed method can yield an average relative improvement of 14.3% across six classification tasks using three LLMs.


Summary

Enhancing Few-Shot Prompting with Information Gain Maximization

This paper addresses the instability of In-Context Learning (ICL) in LLMs arising from variances induced by factors such as input distribution, demonstration ordering, and prompt formats. The authors demonstrate that even when these factors are controlled, the random selection of examples still leads to high variance in performance. To mitigate this, the paper introduces a novel method to quantify the informative ability of data examples by measuring the Information Gain (IG) obtained in prediction after observing a candidate example, and proposes sampling examples with maximum IG. The authors also identify and address the presence of template bias through a Calibration Before Sampling strategy. Experimental results on six classification tasks using three LLMs show an average relative improvement of 14.3%.

Problem Formulation and Information Gain

The paper focuses on retrieving prompts from an unlabeled text dataset $\mathcal{D}_{unlab} = \{x_i\}_{i=1}^N$ for a specific task, aligning with true few-shot learning. The approach involves using a pre-trained LLM to predict all candidate examples in $\mathcal{D}_{unlab}$, resulting in a prediction set $\mathcal{Y} = \{\mathbf{y}_i\}_{i=1}^N$, where $\mathbf{y}_i$ represents the normalized predicted label distribution given input $x_i$. The objective is to select a subset $\{x_j\}_{j=1}^K$ from $\mathcal{D}_{unlab}$, where $K \ll N$, to facilitate $K$-shot learning.

The core concept is to measure the informative ability of data examples by quantifying the Information Gain (IG) of prediction. IG is defined as the information obtained about the predicted label distribution $Y$ when observing one example candidate $X = x_{ob}$ in $\mathcal{D}_{unlab}$:

$$IG(Y, x_{ob}) = H(Y) - H(Y \mid x_{ob})$$

where $H(Y)$ is the information entropy of $Y$ and $H(Y \mid x_{ob})$ is the conditional entropy of $Y$ given the observation $x_{ob}$. Since $H(Y)$ remains constant for a given task, the problem is reframed as selecting examples with minimum conditional entropy $H(Y \mid x_{ob})$.

Figure 1: An overview of the proposed method, detailing the sampling time and test time processes.
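The selection rule above, picking the $K$ candidates with minimum conditional entropy (equivalently, maximum IG), can be sketched as follows. The entropy computation follows the standard definition; the toy predictions are illustrative numbers, not data from the paper:

```python
import numpy as np

def conditional_entropy(label_probs):
    """Entropy H(Y|x_ob) of one normalized predicted label distribution."""
    p = np.clip(label_probs, 1e-12, 1.0)  # guard against log(0)
    return float(-np.sum(p * np.log(p)))

def select_max_ig(predictions, k):
    """Select the k candidates with maximum IG. Since H(Y) is constant
    for a fixed task, this means minimum conditional entropy."""
    entropies = [conditional_entropy(y) for y in predictions]
    return sorted(np.argsort(entropies)[:k].tolist())

# Toy predictions for N=4 candidates over 3 labels.
preds = np.array([
    [0.98, 0.01, 0.01],  # confident -> low H(Y|x_ob), high IG
    [0.34, 0.33, 0.33],  # near-uniform -> high H(Y|x_ob), low IG
    [0.70, 0.20, 0.10],
    [0.90, 0.05, 0.05],
])
print(select_max_ig(preds, k=2))  # indices of the two most informative candidates
```

Running this selects the two most confidently predicted candidates (indices 0 and 3), matching the intuition that low conditional entropy signals high information gain.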

Template Bias and Calibration Before Sampling

The paper identifies the presence of template bias, whereby the LLM tends to favor specific answers on the basis of the template alone, even in the absence of demonstrations. This bias can lead to unfair evaluations of IG during the sampling process. To address it, the authors introduce a Calibration Before Sampling (CBS) strategy, which applies a normalization function $\sigma$ and a weight matrix $\mathbf{W}$ to calibrate each predicted label distribution before its conditional entropy is computed, so that the template's inherent preference does not distort the IG comparison.
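The exact form of $\mathbf{W}$ is not specified in this excerpt. A plausible sketch, borrowing the diagonal content-free calibration of "Calibrate Before Use" (Zhao et al., 2021), rescales each prediction by the inverse of the model's output on a content-free input (e.g. "N/A") filled into the same template; the numbers below are hypothetical:

```python
import numpy as np

def calibrate(pred, content_free_pred):
    """Counteract template bias by rescaling a prediction with the model's
    output on a content-free input: W = diag(p_cf)^(-1), then renormalize
    (the normalization plays the role of sigma)."""
    W = np.diag(1.0 / np.clip(content_free_pred, 1e-12, None))
    q = W @ pred
    return q / q.sum()

# Hypothetical numbers: the template alone pushes the model toward label 0.
p_cf = np.array([0.7, 0.2, 0.1])    # prediction on a content-free input
p    = np.array([0.6, 0.25, 0.15])  # prediction on a real candidate
print(calibrate(p, p_cf))
```

After calibration the spurious preference for label 0 is removed (the argmax shifts away from the template-favored label), so conditional entropies computed on calibrated distributions compare candidates fairly.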


Authors (2)
