Contextual Max-Value Expected Information Gain
- CMV-EIG is a method for selecting informative examples in few-shot in-context learning using entropy minimization to quantify example utility.
- It employs content-free calibration to counteract template-induced bias, achieving 12%-19% accuracy improvements across multiple classification benchmarks.
- The approach integrates active learning principles with black-box LLM evaluations, making it applicable to diverse models and prompt engineering scenarios.
Contextual Max-Value Expected Information Gain (CMV-EIG) is a principled criterion for selecting informative examples in few-shot in-context learning (ICL) with large language models (LLMs). CMV-EIG quantifies the informativeness of candidate demonstration examples by estimating how much they reduce predictive uncertainty. It uses an entropy-minimization approach and explicitly mitigates template-induced bias through a calibration procedure. The resulting selection mechanism for constructing few-shot prompts yields significant improvements in ICL performance across diverse classification benchmarks (Liu et al., 2023).
1. Formal Definition and Min-Entropy Reduction
Expected Information Gain (EIG) in ICL measures, for a candidate input $x$, the expected reduction in model uncertainty about the output labels $Y$ given a context $C$. In the paradigm instance considered, $C$ comprises only an empty or templated prompt $T$. The general EIG expression is
$$\mathrm{EIG}(x \mid C) = H(Y \mid C) - \mathbb{E}\big[H(Y \mid x, C)\big],$$
where $H$ denotes Shannon entropy. In practice, $H(Y \mid C)$ is context-constant, and the expectation over unknown true labels is intractable for black-box LLMs. Employing black-box access yields an operational “min-entropy” criterion where utility is given by $-H(Y \mid x, C)$, with $p(y \mid x, C)$ the model’s zero-shot output conditional on $x$ and $C$. Thus, the practical utility of an example $x$ is defined as
$$U(x) = -H(Y \mid x, C) = \sum_{y} p(y \mid x, C) \log p(y \mid x, C).$$
The top-$K$ examples minimizing conditional entropy $H(Y \mid x, C)$ are selected for prompt assembly (Liu et al., 2023).
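The min-entropy selection step can be illustrated with a minimal Python sketch. It assumes the black-box model is wrapped in a function that returns a list of label probabilities for a candidate text. The names `shannon_entropy`, `select_min_entropy`, and `predict_proba`, as well as the toy probabilities, are hypothetical and not taken from the source.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_min_entropy(candidates, predict_proba, k):
    """Return the k candidates with the lowest zero-shot predictive entropy.

    predict_proba is a hypothetical stand-in for a black-box LLM call that
    maps a candidate text to a list of label probabilities.
    """
    scored = [(shannon_entropy(predict_proba(x)), x) for x in candidates]
    scored.sort(key=lambda pair: pair[0])  # lowest entropy first
    return [x for _, x in scored[:k]]


# Toy stand-in for model outputs (illustrative numbers only).
toy_outputs = {
    "great movie": [0.95, 0.05],
    "it was fine": [0.55, 0.45],
    "terrible plot": [0.10, 0.90],
}
chosen = select_min_entropy(list(toy_outputs), toy_outputs.get, k=2)
print(chosen)
```

In this toy pool the near-uniform candidate is dropped, because its predictive entropy is the highest of the three.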
2. Contextual Max-Value EIG Sampling Criterion
The CMV-EIG criterion formalizes informativeness-driven selection via
$$x^{*} = \arg\min_{x \in \mathcal{D}} H(Y \mid x, C),$$
where $\mathcal{D}$ denotes the candidate pool and $H(Y \mid x, C)$ is the entropy of the model's zero-shot label distribution for candidate $x$ under context $C$. The reduction from “expected” to “observed” entropy is justified by invariance of the baseline entropy term and infeasibility of true label marginals when only black-box evaluations are possible. The information-theoretic interpretation aligns with principles of active learning, though the candidate pool and selection dynamics are adapted specifically to few-shot prompt engineering.
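The reduction can be written out in one line. This is a sketch under two stated assumptions: the baseline entropy $H(Y \mid C)$ does not depend on the candidate $x$, and the expectation over true labels is replaced by the observed zero-shot entropy. The notation ($\mathcal{D}$ for the candidate pool, $C$ for the context) is chosen here for illustration.

$$\arg\max_{x \in \mathcal{D}} \big[ H(Y \mid C) - H(Y \mid x, C) \big] = \arg\min_{x \in \mathcal{D}} H(Y \mid x, C), \qquad H(Y \mid x, C) = -\sum_{y} p(y \mid x, C) \log p(y \mid x, C).$$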
3. Template Bias and Calibration Before Sampling
Raw application of the min-entropy criterion is sensitive to template bias: non-uniform prior label distributions $p(y \mid T)$ arising from the prompt template $T$ alone. This bias causes certain candidate examples to “correct” for systemic over-prediction of specific labels, yielding low entropy but trivial informativeness (e.g., contentless samples appearing more informative than they are). Empirically, even empty templates induce pronounced skew.
To remove this bias, Calibration Before Sampling (CBS), a content-free calibration step, is employed (Liu et al., 2023). The procedure consists of:
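- Content-free prompt pool: Assemble a set of content-free strings $\mathcal{X}_{\mathrm{cf}} = \{x_{\mathrm{cf}}^{(1)}, \dots, x_{\mathrm{cf}}^{(M)}\}$.
- Template bias vector computation: Average zero-shot outputs on content-free strings under the prompt context $C$,
$$\mathbf{b} = \frac{1}{M} \sum_{m=1}^{M} p\big(y \mid x_{\mathrm{cf}}^{(m)}, C\big).$$
- Vector-scale calibration: Given model output $\mathbf{p} = p(y \mid x, C)$, scale by $\mathbf{W} = \mathrm{diag}(\mathbf{b})^{-1}$ and apply softmax:
$$\tilde{\mathbf{p}} = \mathrm{softmax}(\mathbf{W}\mathbf{p}).$$
- Calibrated entropy criterion: Compute $H(\tilde{\mathbf{p}}) = -\sum_{y} \tilde{p}_{y} \log \tilde{p}_{y}$, and select examples minimizing $H(\tilde{\mathbf{p}})$.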
This adjustment ensures that selection reflects example informativeness rather than template-induced label preferences.
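The calibration steps above can be sketched in Python as follows. The sketch assumes label distributions are plain lists of floats and that no component of the bias vector is zero. The function names and the numbers are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def bias_vector(content_free_outputs):
    """Average the label distributions obtained on content-free inputs."""
    n = len(content_free_outputs)
    return [sum(column) / n for column in zip(*content_free_outputs)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def calibrate(probs, bias):
    """Vector-scale calibration: divide element-wise by the bias, then softmax."""
    return softmax([p / b for p, b in zip(probs, bias)])


def entropy(probs):
    """Shannon entropy (in nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Illustrative outputs on three content-free strings (hypothetical numbers).
cf_outputs = [[0.8, 0.2], [0.7, 0.3], [0.75, 0.25]]
bias = bias_vector(cf_outputs)

# A candidate whose raw output merely mirrors the template bias.
raw = [0.75, 0.25]
calibrated = calibrate(raw, bias)
print(entropy(raw), entropy(calibrated))
```

Here the raw output looks fairly confident, but after calibration it is close to uniform. Its calibrated entropy is therefore near the maximum, and the candidate no longer ranks as informative.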
4. Algorithmic Implementation
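The full CMV-EIG procedure with calibration is as follows:
- Draw a candidate pool from the training set.
- Query the model on each content-free string and average the outputs to obtain the template bias vector.
- For each candidate in the pool, obtain the zero-shot label distribution, apply vector-scale calibration, and compute the entropy of the calibrated distribution.
- Select the K candidates with the lowest calibrated entropy.
- Assemble the K-shot prompt from the selected candidates and their gold labels.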
Practical recommendations include drawing a pool of candidate samples from the training set, using three content-free calibration strings, defaulting to greedy decoding (temperature 0), and constructing the K-shot prompt with true gold labels for downstream evaluation (Liu et al., 2023).
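An end-to-end version of the procedure might look like the sketch below. It is not the authors' implementation. The model is replaced by a lookup table (`mock_predict`), and the content-free strings, the prompt template, and all probabilities are assumptions made for illustration. Label order in each distribution is taken to be [positive, negative].

```python
import math


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cbs_maxig_select(pool, content_free, predict_proba, k):
    """Select k (text, gold_label) pairs with the lowest calibrated entropy."""
    cf_outputs = [predict_proba(s) for s in content_free]
    bias = [sum(col) / len(cf_outputs) for col in zip(*cf_outputs)]
    scored = []
    for text, gold in pool:
        probs = predict_proba(text)
        calibrated = softmax([p / b for p, b in zip(probs, bias)])
        scored.append((entropy(calibrated), text, gold))
    scored.sort(key=lambda item: item[0])  # lowest calibrated entropy first
    return [(text, gold) for _, text, gold in scored[:k]]


def build_prompt(demos, query, template="Review: {x}\nSentiment: {y}"):
    """Assemble a K-shot prompt from selected demos and their gold labels."""
    blocks = [template.format(x=text, y=gold) for text, gold in demos]
    blocks.append(template.format(x=query, y="").rstrip())
    return "\n\n".join(blocks)


# Hypothetical stand-in for a black-box LLM (illustrative numbers only).
MOCK = {
    "N/A": [0.7, 0.3], "": [0.7, 0.3], "[MASK]": [0.7, 0.3],
    "a joyless slog": [0.2, 0.8],
    "pleasant enough": [0.7, 0.3],
    "an instant classic": [0.9, 0.1],
}


def mock_predict(text):
    return MOCK[text]


pool = [
    ("a joyless slog", "negative"),
    ("pleasant enough", "positive"),
    ("an instant classic", "positive"),
]
demos = cbs_maxig_select(pool, ["N/A", "", "[MASK]"], mock_predict, k=2)
prompt = build_prompt(demos, "surprisingly moving")
print(prompt)
```

In this toy pool the raw entropy ranking would put "an instant classic" first. After calibration "a joyless slog" ranks first, because its prediction runs against the template's preference for the first label.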
5. Experimental Context and Benchmarking
Evaluations were conducted on six classification tasks: SST-2, AGNews, TREC, CB, RTE, and DBPedia. Three LLMs were considered (GPT-2 XL, GPT-J, GPT-3 davinci), each with a randomly subsampled pool of training candidates per task. Each LLM was assessed over five random seeds (two for GPT-3) and evaluated on 300 test samples per task.
Comparative baselines included:
- Random selection,
- MaxEntropy (highest raw entropy $H(Y \mid x, C)$),
- MaxIG (lowest raw entropy $H(Y \mid x, C)$, no calibration),
- CBS MaxIG (calibrated information gain).
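The four selection strategies differ only in how candidates are ranked, which the following sketch makes explicit. The strategy names, the `select` function, and the toy distributions are hypothetical. The candidate outputs and the bias vector are assumed to have been computed already.

```python
import math
import random


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def calibrate(probs, bias):
    """Divide element-wise by the bias vector, then apply softmax."""
    scaled = [p / b for p, b in zip(probs, bias)]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def select(pool_probs, k, strategy, bias=None, seed=0):
    """Pick k candidate names from a dict mapping name -> raw label distribution."""
    names = list(pool_probs)
    if strategy == "random":
        return random.Random(seed).sample(names, k)
    if strategy == "max_entropy":
        key = lambda n: -entropy(pool_probs[n])  # highest raw entropy first
    elif strategy == "max_ig":
        key = lambda n: entropy(pool_probs[n])  # lowest raw entropy first
    elif strategy == "cbs_max_ig":
        key = lambda n: entropy(calibrate(pool_probs[n], bias))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return sorted(names, key=key)[:k]


# Illustrative raw outputs and template bias (hypothetical numbers).
pool_probs = {"a": [0.9, 0.1], "b": [0.2, 0.8], "c": [0.7, 0.3]}
bias = [0.7, 0.3]
for name in ("max_entropy", "max_ig", "cbs_max_ig"):
    print(name, select(pool_probs, 1, name, bias=bias))
```

With these toy numbers each non-random strategy picks a different candidate, which shows how calibration can change the ranking that uncalibrated MaxIG would produce.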
Empirical results demonstrate that CBS MaxIG achieves a 12–19% relative accuracy gain on average over random selection and consistently outperforms both MaxEntropy and uncalibrated MaxIG. This supports IG as a robust informativeness proxy in ICL and indicates the necessity of pre-sampling calibration to neutralize template bias (Liu et al., 2023).
6. Significance and Related Methodologies
The CMV-EIG approach generalizes conventional entropy-based active learning techniques to LLM in-context learning with black-box access. By centering on entropy minimization and incorporating a formal calibration step, CMV-EIG addresses both the variance induced by demonstration selection and the systemic pitfalls of template- and prompt-based evaluation. The technique is directly extensible to other LLMs and is robust in settings where gold labels are not initially accessible for all candidates, a scenario typical in prompt selection for ICL. A plausible implication is that further improvements in in-context demonstration selection may be achievable by refining calibration strategies or entropy estimation techniques.