Contextual Max-Value Expected Information Gain

Updated 22 February 2026
  • CMV-EIG is a method for selecting informative examples in few-shot in-context learning using entropy minimization to quantify example utility.
  • It employs content-free calibration to counteract template-induced bias, achieving 12–19% relative accuracy improvements over random selection across multiple classification benchmarks.
  • The approach integrates active learning principles with black-box LLM evaluations, making it applicable to diverse models and prompt engineering scenarios.

Contextual Max-Value Expected Information Gain (CMV-EIG) refers to a principled criterion for selecting informative examples in few-shot in-context learning (ICL) with LLMs. CMV-EIG quantifies the informativeness of candidate demonstration examples by estimating their effect on reducing predictive uncertainty, employing an entropy-minimization approach, and explicitly mitigating adverse effects of template-induced bias through calibration procedures. The approach introduces robust selection mechanisms for constructing few-shot prompts, yielding significant improvements in ICL performance across diverse classification benchmarks (Liu et al., 2023).

1. Formal Definition and Min-Entropy Reduction

Expected Information Gain (EIG) in ICL measures, for a candidate input $x$, the expected reduction in model uncertainty about the output labels $Y$ given a context $C$. In the paradigm instance considered, $C$ comprises only an empty or templated prompt $T$. The general EIG expression is

$$\mathrm{EIG}_C(x) = H\bigl[p(y \mid C)\bigr] - \mathbb{E}_{y' \sim p(y \mid C, x)}\bigl[ H\bigl[p(y \mid C, x, y')\bigr] \bigr]$$

where $H[\cdot]$ denotes Shannon entropy. In practice, $H[p(y \mid C)]$ is constant for a fixed context, and the expectation over unknown true labels is intractable for black-box LLMs. Employing black-box access yields an operational “min-entropy” criterion where utility is given by $-H[p_\theta(y \mid x, T)]$, with $p_\theta$ the model’s zero-shot output conditional on $x$ and $T$. Thus, the practical utility of an example $x$ is defined as

$$U(x) = -H\bigl[p_\theta(y \mid x, T)\bigr]$$

The top-$K$ examples minimizing conditional entropy $H[p_\theta(y \mid x, T)]$ are selected for prompt assembly (Liu et al., 2023).
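The min-entropy selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `label_probs` is a hypothetical callable standing in for a black-box LLM that returns the zero-shot label distribution for an input under the template, and the toy probabilities are made up.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_min_entropy(candidates, label_probs, k):
    """Return the k candidates with the lowest zero-shot predictive entropy.

    label_probs is a hypothetical callable: input -> p_theta(y | x, T).
    """
    scored = [(entropy(label_probs(x)), x) for x in candidates]
    scored.sort(key=lambda pair: pair[0])  # lowest entropy = highest utility
    return [x for _, x in scored[:k]]

# Made-up two-label outputs standing in for black-box LLM queries.
toy_outputs = {
    "a": [0.50, 0.50],
    "b": [0.90, 0.10],
    "c": [0.70, 0.30],
    "d": [0.99, 0.01],
}
selected = select_min_entropy(list(toy_outputs), toy_outputs.get, k=2)
print(selected)  # ['d', 'b']
```

Sorting by entropy and taking the first `k` entries is equivalent to taking the `k` highest utilities, since utility is the negated entropy.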

2. Contextual Max-Value EIG Sampling Criterion

The CMV-EIG criterion formalizes informativeness-driven selection via

$$x^{*} = \arg\max_{x \in \mathcal{D}} \bigl(-H\bigl[p_\theta(y \mid x, T)\bigr]\bigr) = \arg\min_{x \in \mathcal{D}} H\bigl[p_\theta(y \mid x, T)\bigr]$$

where $\mathcal{D}$ denotes the candidate pool. The reduction from “expected” to “observed” entropy is justified by invariance of the baseline entropy term and infeasibility of true label marginals when only black-box evaluations are possible. The information-theoretic interpretation aligns with principles of active learning, though the candidate pool and selection dynamics are uniquely adapted to few-shot prompt engineering.
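The reduction can be written out step by step. The following is a sketch consistent with the EIG definition in Section 1, with $\mathcal{D}$ assumed here as the symbol for the candidate pool. The first step is exact, because the baseline term does not depend on $x$; the last step is a black-box surrogate rather than an identity, since it replaces the expected posterior entropy with the observed zero-shot entropy.

```latex
\begin{aligned}
\arg\max_{x \in \mathcal{D}} \mathrm{EIG}_C(x)
  &= \arg\max_{x \in \mathcal{D}} \Bigl( H\bigl[p(y \mid C)\bigr]
     - \mathbb{E}_{y' \sim p(y \mid C, x)}\bigl[ H\bigl[p(y \mid C, x, y')\bigr] \bigr] \Bigr) \\
  &= \arg\min_{x \in \mathcal{D}} \mathbb{E}_{y' \sim p(y \mid C, x)}\bigl[ H\bigl[p(y \mid C, x, y')\bigr] \bigr] \\
  &\approx \arg\min_{x \in \mathcal{D}} H\bigl[p_\theta(y \mid x, T)\bigr]
\end{aligned}
```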

3. Template Bias and Calibration Before Sampling

Raw application of the min-entropy criterion is sensitive to template bias: non-uniform prior distributions $p_\theta(y \mid T)$ arising from the prompt template alone. This bias causes certain candidate examples to “correct” for systemic over-prediction of specific labels, yielding low entropy but trivial informativeness (e.g., contentless samples appearing more informative than they are). Empirically, even empty templates induce pronounced skew.

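To remove this bias, calibration before sampling (CBS) with content-free inputs is employed (Liu et al., 2023). The procedure consists of:

  1. Content-free prompt pool: Assemble a small set $\mathcal{S}_{\mathrm{cf}}$ of content-free input strings.
  2. Template bias vector computation: Average zero-shot outputs on content-free strings,

$$b = \frac{1}{|\mathcal{S}_{\mathrm{cf}}|} \sum_{s \in \mathcal{S}_{\mathrm{cf}}} p_\theta(y \mid s, T)$$

  3. Vector-scale calibration: Given model output $p_\theta(y \mid x, T)$, scale by $\mathrm{diag}(b)^{-1}$ and apply softmax:

$$\hat{p}(y \mid x, T) = \mathrm{softmax}\bigl(\mathrm{diag}(b)^{-1}\, p_\theta(y \mid x, T)\bigr)$$

  4. Calibrated entropy criterion: Compute $H[\hat{p}(y \mid x, T)]$, and select the examples that minimize it.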

This adjustment ensures that selection reflects example informativeness rather than template-induced label preferences.
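The calibration steps above can be illustrated with a small Python sketch. It assumes the scaling is an element-wise division of the output probabilities by the bias vector followed by a softmax, which is one reading of the vector-scale step; the function names and the numbers are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def bias_vector(content_free_outputs):
    """Average the zero-shot label distributions obtained on content-free strings."""
    n = len(content_free_outputs)
    num_labels = len(content_free_outputs[0])
    return [sum(out[j] for out in content_free_outputs) / n for j in range(num_labels)]

def calibrate(probs, bias):
    """Divide each label probability by its bias entry, then apply softmax."""
    return softmax([p / b for p, b in zip(probs, bias)])

# Made-up outputs on three content-free strings: the template favours label 0.
b = bias_vector([[0.8, 0.2], [0.7, 0.3], [0.9, 0.1]])  # roughly [0.8, 0.2]

mirrors_bias = calibrate([0.8, 0.2], b)   # output that only echoes the template prior
against_bias = calibrate([0.2, 0.8], b)   # output that overrides the template prior

print(entropy([0.8, 0.2]), entropy([0.2, 0.8]))      # identical raw entropies
print(entropy(mirrors_bias), entropy(against_bias))  # calibrated entropies differ
```

In this toy case the two raw outputs have the same entropy, so the uncalibrated criterion cannot tell them apart. After calibration, the output that merely echoes the template prior becomes near-uniform (high entropy), while the output that overrides the prior stays confident (low entropy).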

4. Algorithmic Implementation

The full CMV-EIG with calibration procedure is as follows:

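  1. Draw a pool of candidate examples from the training set.
  2. Query the model with each content-free string under the template $T$ and average the outputs to obtain the template bias vector.
  3. For each candidate $x$, obtain the zero-shot output $p_\theta(y \mid x, T)$, calibrate it with the bias vector, and compute the entropy of the calibrated distribution.
  4. Select the $K$ candidates with the lowest calibrated entropy.
  5. Attach gold labels to the selected examples and assemble the $K$-shot prompt.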

Practical recommendations include drawing a randomly subsampled pool of candidate samples, using three content-free calibration strings, defaulting to greedy decoding (temperature $0$), and constructing $K$-shot prompts with true gold labels for downstream evaluation (Liu et al., 2023).
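An end-to-end sketch of the procedure, under stated assumptions, might look as follows. The `label_probs` callable is a hypothetical stand-in for a black-box LLM query, the content-free strings (`"N/A"`, `""`, `"[MASK]"`) are illustrative choices rather than the ones confirmed by the source, the prompt template is invented for the example, and all probabilities are made up.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def cbs_select(candidates, content_free_strings, label_probs, k):
    """Pick the k candidates with the lowest calibrated predictive entropy."""
    # Estimate the template bias from content-free inputs.
    cf_outputs = [label_probs(s) for s in content_free_strings]
    num_labels = len(cf_outputs[0])
    bias = [sum(out[j] for out in cf_outputs) / len(cf_outputs)
            for j in range(num_labels)]
    # Calibrate each candidate's zero-shot output and score it by entropy.
    scored = []
    for x in candidates:
        calibrated = softmax([p / b for p, b in zip(label_probs(x), bias)])
        scored.append((entropy(calibrated), x))
    scored.sort(key=lambda pair: pair[0])
    return [x for _, x in scored[:k]]

def build_prompt(selected, gold_labels, query):
    """Assemble a K-shot prompt from selected inputs and their gold labels."""
    shots = [f"Input: {x}\nLabel: {gold_labels[x]}" for x in selected]
    return "\n\n".join(shots + [f"Input: {query}\nLabel:"])

# Made-up model outputs; index 0 = "positive", index 1 = "negative".
fake_model = {
    "N/A": [0.8, 0.2], "": [0.7, 0.3], "[MASK]": [0.9, 0.1],
    "great film": [0.95, 0.05],
    "dull plot": [0.30, 0.70],
    "it was fine": [0.80, 0.20],
}
gold = {"great film": "positive", "dull plot": "negative", "it was fine": "positive"}

selected = cbs_select(list(gold), ["N/A", "", "[MASK]"], fake_model.get, k=2)
prompt = build_prompt(selected, gold, "a moving story")
print(selected)  # ['dull plot', 'great film']
print(prompt)
```

With these toy numbers, ranking by raw entropy would place "great film" first and "dull plot" last, whereas the calibrated ranking promotes "dull plot" because its prediction runs against the template's preference for label 0.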

5. Experimental Context and Benchmarking

Evaluations were conducted on six classification tasks: SST-2, AGNews, TREC, CB, RTE, and DBPedia. Three LLMs were considered (GPT-2 XL, GPT-J, GPT-3 davinci), each with a randomly subsampled pool of training candidates per task. Each LLM was assessed over five random seeds (two for GPT-3) and evaluated on 300 test samples per task.

Comparative baselines included:

  • Random selection,
  • MaxEntropy (highest raw entropy $H[p_\theta(y \mid x, T)]$),
  • MaxIG (lowest raw entropy $H[p_\theta(y \mid x, T)]$, no calibration),
  • CBS MaxIG (calibrated information gain).

Empirical results demonstrate that CBS MaxIG achieves a 12–19% relative accuracy gain on average over random selection and consistently outperforms both MaxEntropy and uncalibrated MaxIG. This confirms IG as a robust informativeness proxy in ICL and indicates the necessity of pre-sampling calibration to neutralize template bias (Liu et al., 2023).

The CMV-EIG approach generalizes conventional entropy-based active learning techniques for use with LLM in-context learning via black-box access. By centering entropy minimization and incorporating a formal calibration step, CMV-EIG addresses both the variance induced by demonstration selection and systemic pitfalls of template and prompt-based evaluation. The technique is directly extensible to other LLMs and is robust under settings where gold labels are not initially accessible for all candidates—a scenario typical in prompt selection for ICL. A plausible implication is that further improvements in in-context demonstration selection may be achievable by refining calibration strategies or entropy estimation techniques.
