InSQuAD: Exemplar Selection Framework
- InSQuAD is a framework that employs submodular mutual information to ensure the selected exemplars are both relevant and diverse for effective in-context learning.
- The approach uses a combinatorial training paradigm with a likelihood-based loss to optimize the balance between quality and diversity.
- Empirical results across nine benchmarks show significant improvements in classification, multi-choice, and generative QA tasks, while also reducing inference time.
InSQuAD is a framework for exemplar selection in In-Context Learning (ICL) that enforces both quality (relevance) and diversity among in-context examples using Submodular Mutual Information (SMI) functions and a combinatorial training paradigm. Developed to address limitations in traditional retrieval methods—where query relevance is modeled at the expense of diversity—InSQuAD achieves robust ICL by modeling exemplar selection as a targeted submodular maximization problem and by training a dedicated retrieval model via a likelihood-based loss over SMI. The approach is validated empirically across nine benchmark datasets, demonstrating substantial gains over relevance-only baselines and reducing inference time through efficient combinatorial selection and dataset augmentation with paraphrases.
1. Motivation and Problem Formulation
The premise of InSQuAD is that effective ICL requires selecting in-context exemplars that are not merely relevant to the test query, but also collectively diverse and non-redundant. Existing retrieval approaches predominantly optimize for quality—gathering examples nearest to the query in embedding space—yet ignore the combinatorial structure that arises when exemplars overlap semantically or syntactically. InSQuAD targets three properties: quality, diversity, and order.
To formalize, InSQuAD frames the selection as:
- Exemplar Annotation: Constructing a diverse subset from an unlabeled pool to represent the annotation distribution.
- Exemplar Retrieval: Given a query $q$, selecting the top-$k$ in-context examples that maximize both query similarity and mutual non-overlap.
This strategy ensures that the chosen set maximizes information with respect to the query while minimizing redundancy among selected exemplars, which is crucial for prompting LLMs in multi-hop or reasoning-intensive QA.
2. Submodular Mutual Information (SMI) Functions
InSQuAD uses SMI functions to balance relevance and diversity in selection:
- Quality: Quantified by the mutual information between the exemplar set $\mathcal{A}$ and the query $q$, $I_f(\mathcal{A}; q)$.
- Diversity: Enforced via submodular functions, which reward incremental “coverage” and penalize redundancy.
Formally, the retrieval step selects
$$\mathcal{E}(q) = \operatorname*{arg\,max}_{\mathcal{A} \subseteq \mathcal{V},\ |\mathcal{A}| \le k} I_f(\mathcal{A};\, q),$$
where $\mathcal{V}$ is the pool of candidate exemplars and $I_f$ is the SMI function. Submodularity ensures that greedy selection yields near-optimal solutions efficiently, capturing both incremental query relevance and pairwise diversity.
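Greedy maximization of such an objective can be sketched as follows. The graph-cut-style quality-minus-redundancy surrogate below is an assumption for illustration; the paper's actual SMI instantiations may differ:

```python
import numpy as np

def greedy_smi_select(sim_to_query, sim_pairwise, k, lam=1.0, nu=1.0):
    """Greedily pick k exemplars maximizing query relevance minus redundancy.

    sim_to_query: (n,) candidate-to-query similarities (quality term).
    sim_pairwise: (n, n) candidate-to-candidate similarities (diversity term).
    The marginal gain of candidate c is lam * s(c, q) minus nu times the sum
    of its similarities to already-selected exemplars -- a simple submodular
    stand-in for the SMI objective.
    """
    selected = []
    for _ in range(k):
        best_idx, best_gain = None, -np.inf
        for c in range(len(sim_to_query)):
            if c in selected:
                continue
            redundancy = sum(sim_pairwise[c, s] for s in selected)
            gain = lam * sim_to_query[c] - nu * redundancy
            if gain > best_gain:
                best_idx, best_gain = c, gain
        selected.append(best_idx)
    return selected
```

With two near-duplicate, highly relevant candidates, the redundancy term steers the second pick toward a distinct exemplar instead of the duplicate.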
During annotation, a diverse subset is chosen as
$$\mathcal{L} = \operatorname*{arg\,max}_{\mathcal{A} \subseteq \mathcal{U},\ |\mathcal{A}| \le B} f(\mathcal{A}),$$
where $B$ is the annotation budget and the submodular function $f$ is scored against the full unlabeled pool $\mathcal{U}$ (for diversity).
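For the annotation step, a facility-location objective scored against the full pool is one standard submodular choice; the exact function used is an assumption here. A minimal greedy sketch:

```python
import numpy as np

def facility_location_greedy(sim, budget):
    """Greedily pick a budget-limited subset that covers the whole pool.

    sim: (n, n) pairwise similarity matrix over the unlabeled pool.
    Maximizes f(A) = sum_i max_{j in A} sim[i, j], which rewards subsets
    that collectively cover every region of the pool (diversity).
    """
    n = sim.shape[0]
    selected = []
    coverage = np.zeros(n)  # best similarity of each pool item to the subset
    for _ in range(budget):
        best_idx, best_gain = None, -np.inf
        for c in range(n):
            if c in selected:
                continue
            gain = np.maximum(coverage, sim[:, c]).sum() - coverage.sum()
            if gain > best_gain:
                best_idx, best_gain = c, gain
        selected.append(best_idx)
        coverage = np.maximum(coverage, sim[:, best_idx])
    return selected
```

On a pool with two tight clusters, the greedy picks land one exemplar in each cluster rather than two in the same one.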
3. Combinatorial Training and Likelihood-Based Loss
To prevent the retrieval model from overfitting to query similarity alone, InSQuAD introduces a combinatorial training protocol (InSQuAD-LEARN) that adapts SMI parameters through a likelihood-based loss derived from Submodular Point Processes (SPPs).
Given a query $q$, a set of relevant documents $\mathcal{R}$, and distractor documents $\mathcal{D}$, the probability of choosing a set $\mathcal{S}$ under an SPP is proportional to its SMI score:
$$P(\mathcal{S} \mid q) = \frac{I_f(\mathcal{S};\, q)}{\sum_{\mathcal{S}' \subseteq \mathcal{V}} I_f(\mathcal{S}';\, q)}.$$
The ratio for relevant over distractor sets is
$$\frac{P(\mathcal{R} \mid q)}{P(\mathcal{D} \mid q)} = \frac{I_f(\mathcal{R};\, q)}{I_f(\mathcal{D};\, q)},$$
since the normalization constant cancels, yielding the negative log-likelihood
$$\mathcal{L}_Q = -\log \frac{I_f(\mathcal{R};\, q)}{I_f(\mathcal{R};\, q) + I_f(\mathcal{D};\, q)}.$$
The overall joint loss, including diversity enforced by paraphrastic augmentations, is
$$\mathcal{L} = \mathcal{L}_Q + \lambda\, \mathcal{L}_D,$$
where $\mathcal{L}_Q$ (quality loss) and $\mathcal{L}_D$ (diversity loss) compare the information overlap between query, relevant, and paraphrased distractor sets, and $\lambda$ weights their relative importance.
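Because the SPP normalization cancels in the relevant-versus-distractor comparison, the quality loss reduces to a two-way contrastive form over nonnegative SMI scores. A minimal numpy sketch, assuming this contrastive normalization (the paper's exact formulation may differ):

```python
import numpy as np

def spp_nll(smi_relevant, smi_distractor):
    """Contrastive negative log-likelihood over nonnegative SMI scores.

    Normalizing P(S | q) proportional to I_f(S; q) over just the relevant
    set R and the distractor set D gives P(R | q) = I_R / (I_R + I_D);
    minimizing the negative log pushes the retriever to score R above D.
    """
    p_relevant = smi_relevant / (smi_relevant + smi_distractor)
    return -np.log(p_relevant)

def joint_loss(loss_quality, loss_diversity, lam=1.0):
    """Joint objective L = L_Q + lam * L_D from the combinatorial training step."""
    return loss_quality + lam * loss_diversity
```

The loss shrinks as the SMI score of the relevant set grows relative to the distractor set, which is exactly the gradient signal the retrieval model trains on.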
4. Dataset Augmentation via Paraphrases
A unique component is paraphrase augmentation. Multi-hop QA datasets, such as HotpotQA, lack sufficient paraphrastic or distractor variants. InSQuAD addresses this by synthetically generating paraphrases for each supporting document using large models (e.g., GPT-3.5 Turbo). Training instances thus comprise the query $q$, $\mathcal{R}$ (original relevant documents), $\mathcal{D}$ (original distractors), and $\mathcal{P}$ (paraphrased variants). This constrains the model to maximize true quality signals while actively ignoring paraphrase-level similarity that would otherwise compromise diversity.
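The shape of such a training instance can be sketched as a small record; the field names and example texts below are illustrative assumptions, not the paper's schema:

```python
from dataclasses import dataclass

@dataclass
class TrainingInstance:
    """One combinatorial training example (field names are illustrative)."""
    query: str
    relevant: list          # R: gold supporting documents
    distractors: list       # D: original distractor documents
    paraphrases: list       # P: LLM-generated paraphrases of documents in R

inst = TrainingInstance(
    query="Which city hosted the 1936 Summer Olympics?",
    relevant=["The 1936 Summer Olympics were held in Berlin."],
    distractors=["The 1936 Winter Olympics were held in Garmisch-Partenkirchen."],
    paraphrases=["Berlin was the host city of the Summer Games in 1936."],
)
```

During training, the paraphrases act as hard negatives for the diversity loss: they are semantically close to the relevant documents yet must not be rewarded as additional quality signal.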
5. Exemplar Selection and In-Context Generation
At inference, the retrieval model selects $k$ in-context exemplars for a test query $q$ using SMI-based scoring. The key formulas are:
- Generation conditioning:
$$\hat{y} = \mathrm{LLM}_{\theta}\!\left(\mathcal{T}(\mathcal{E},\, q)\right),$$
where $\hat{y}$ is the LLM output, $\mathcal{E}$ are the selected exemplars, $\mathcal{T}$ is a templating function, and $\theta$ are the learned parameters.
- Selection via SMI:
$$\mathcal{E}(q) = \operatorname*{arg\,max}_{\mathcal{A} \subseteq \mathcal{V},\ |\mathcal{A}| \le k} I_f(\mathcal{A};\, q).$$
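The templating step that renders the selected exemplars together with the test query can be sketched as simple demonstration concatenation; the Q:/A: layout here is an assumption, not the paper's actual prompt format:

```python
def build_prompt(exemplars, query):
    """Render selected (question, answer) exemplars followed by the test query.

    Stands in for the templating function that conditions the LLM; the
    Q:/A: format is illustrative only.
    """
    demos = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in exemplars)
    return demos + f"Q: {query}\nA:"
```

The LLM then completes the text after the final "A:", so ordering of exemplars in the prompt is exactly where the framework's order-aware selection matters.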
6. Experimental Validation and Results
On nine benchmarks (classification, multi-choice, and generative QA), InSQuAD-RETRIEVE plus InSQuAD-LEARN achieves:
- Up to 21.6% improvement on classification tasks
- 16.4% gains on multi-choice tasks
- Up to 7% improvement on generative ICL
Ablation studies demonstrate reduced inference time compared to iterative or confidence-based selection strategies. The approach produces superior retrieval sets with respect to the joint quality-diversity objective, demonstrating practical efficacy for academic and commercial LLM deployment.
7. Implications and Significance
By enforcing both quality and diversity in in-context example selection through submodular mutual information, InSQuAD improves generalization, robustness, and efficiency in ICL workflows. Its likelihood-based combinatorial training ensures that retrieval models move beyond nearest-neighbor heuristics, capturing complex relationships needed for compositional multi-task reasoning in modern QA systems. Synthetic paraphrase augmentation makes the approach viable even in data-sparse regimes by preventing spurious overlap. The framework is modular, permitting extension to other domains with pool-based selection and paraphrase augmentation.
A plausible implication is that future benchmarks (such as those targeting procedural guidance or multi-document conversational QA (Wu et al., 1 Oct 2024, Wu et al., 2023)) may adopt analogous SMI-based strategies to enforce comprehensive coverage and diversity in prompt construction and evaluation protocols.