AdaptMI+: Adaptive Math In-Context Learning

Updated 19 September 2025
  • AdaptMI+ is an adaptive in-context learning method that selects math examples based on diagnosed skill gaps to optimize small language model performance.
  • It employs a two-stage strategy by assessing question difficulty with a process reward model and applying targeted skill remediation only when needed.
  • Empirical evaluations show up to a 6% accuracy improvement on math benchmarks, underscoring its effectiveness in reducing cognitive overload.

AdaptMI+ encompasses adaptive methodologies for enhancing small LLM (SLM) performance in math in-context learning (ICL) via dynamic, skill-based example selection. This technique builds upon preceding skill-aware ICL research by explicitly addressing cognitive overload in SLMs and refining the instructional context to target only those mathematical skills empirically shown to be absent in the model's initial response. AdaptMI+ leverages both a process-level reward model for difficulty classification and a precision targeting mechanism for remedial skill instruction, resulting in substantial accuracy improvements on established math benchmarks.

1. Foundational Concepts and Motivation

AdaptMI+ is rooted in recent advancements for math ICL, particularly the recognition that SLMs (e.g., 1–7B parameter models such as Qwen or Llama) do not benefit uniformly from skill-based prompting strategies. Unlike larger LLMs, which can synthesize and exploit redundant skill cues, smaller models are prone to cognitive overload—a pedagogically recognized phenomenon wherein excessive or irrelevant information hinders task performance. AdaptMI+ is designed to mitigate this issue by adaptively selecting instructional examples based on the SLM’s observed needs.

Key elements (a minimal data-structure sketch follows this list):

  • Skill Bank: Predefined catalogue of mathematical skills (e.g., equation solving, modular arithmetic, exponentiation), each associated with annotated in-context examples.
  • Skill-Map: Mapping from each math question to the skills required to solve it, generated via LLM-powered skill detection.
  • Process Reward Model: Quantifies stepwise correctness in SLM outputs, enabling question difficulty classification.
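
To make the roles of these components concrete, here is a minimal Python sketch of how they might be represented. The class names, fields, and the retrieve helper are illustrative assumptions, not the authors' released implementation.

```python
from dataclasses import dataclass, field

@dataclass
class SkillExample:
    """One annotated in-context example tied to a single skill."""
    skill: str        # e.g. "modular_arithmetic"
    question: str     # worked example question
    solution: str     # step-by-step solution text

@dataclass
class SkillBank:
    """Predefined catalogue of skills, each with a pool of annotated examples."""
    examples: dict[str, list[SkillExample]] = field(default_factory=dict)

    def retrieve(self, skill: str, k: int = 1) -> list[SkillExample]:
        # Return up to k annotated examples for the requested skill.
        return self.examples.get(skill, [])[:k]

# Skill-Map: question id -> skills required to solve it (LLM-generated in the paper).
SkillMap = dict[str, set[str]]

# Process reward model output for one response: per-step scores r_{q,1}, ..., r_{q,t}.
StepScores = list[float]
```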

2. Two-Stage Adaptive Example Selection Strategy

AdaptMI+ employs an explicit two-stage procedure:

Stage 1: Difficulty Assessment

  • Each input question $q$ is processed by a pre-trained process reward model, which outputs per-step scores $\{r_{q,1}, \dots, r_{q,t}\}$.
  • Using thresholds $\tau_1$ and $\tau_2$, AdaptMI+ computes:

$$R(q) = \begin{cases} 0 & \text{if } r_{q,t} \leq \tau_1 \ \text{ or } \ \frac{1}{t}\sum_{i=1}^{t} r_{q,i} \leq \tau_1 \ \text{ or } \ \exists\, i < t:\ r_{q,i} \leq \tau_2 \\ 1 & \text{otherwise} \end{cases}$$

  • $R(q)=0$ classifies the question as "difficult" and triggers adaptive example selection; $R(q)=1$ keeps the static exemplars (a code sketch of this rule follows below).
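
A minimal sketch of this classification rule, assuming the per-step reward scores are available as a plain Python list; the threshold values are placeholders, not the paper's tuned settings.

```python
def classify_difficulty(step_scores: list[float],
                        tau1: float = 0.85,
                        tau2: float = 0.7) -> int:
    """Return R(q): 0 = difficult (adaptive selection), 1 = easy (static exemplars)."""
    t = len(step_scores)
    final_step_low = step_scores[-1] <= tau1                    # r_{q,t} <= tau1
    mean_low = sum(step_scores) / t <= tau1                     # (1/t) * sum_i r_{q,i} <= tau1
    early_step_low = any(r <= tau2 for r in step_scores[:-1])   # exists i < t with r_{q,i} <= tau2
    return 0 if (final_step_low or mean_low or early_step_low) else 1

# Example: a low intermediate step score flags the question as difficult.
classify_difficulty([0.98, 0.6, 0.98])   # -> 0
```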

Stage 2: Targeted Skill Remediation

  • For questions identified as difficult, adaptively select examples only for skills missing from the SLM’s initial output (determined by comparing the response against the “Skill-Map” via a strong LLM, e.g., GPT-4o-mini).
  • For each missing skill $s$ in the diagnosed set, one example $e$ annotated with skill $s$ is retrieved from the pool (see the sketch after this list).
  • For easy questions, avoid additional skill-based prompts to prevent unnecessary overload.
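
The selection step might look like the following sketch, reusing the hypothetical SkillBank from Section 1. Whether the remedial examples replace or supplement the fixed exemplars is an implementation choice assumed here, not something this summary specifies.

```python
def select_examples(is_difficult: bool,
                    missing_skills: set[str],
                    skill_bank: SkillBank,
                    fixed_examples: list[SkillExample]) -> list[SkillExample]:
    """Stage 2: targeted examples only for diagnosed skill gaps."""
    if not is_difficult or not missing_skills:
        # Easy questions (or no detected gaps): keep the static exemplars only.
        return fixed_examples
    # Difficult questions: one annotated example per missing skill.
    remedial: list[SkillExample] = []
    for skill in sorted(missing_skills):
        remedial.extend(skill_bank.retrieve(skill, k=1))
    # Assumption: remedial examples stand in for the fixed set; supplementing
    # the fixed set instead is an equally plausible reading.
    return remedial
```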

3. Integration of Cognitive Load Theory

Drawing on cognitive load theory, AdaptMI+ operationalizes instructional adaptation by sparing the SLM from superfluous in-context content:

  • For easy questions, only essential fixed examples are provided.
  • For difficult questions, intervention is strictly limited to the skills the model has demonstrably failed to apply.
  • This mimics adaptive teaching practice in human pedagogy, where targeted feedback steers learning and minimizes distraction (a small prompt-assembly sketch follows this list).
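
Putting the two stages together, the adaptive policy can be read as a small prompt-assembly routine. This is an illustrative sketch built on the hypothetical helpers above; diagnose_missing_skills stands in for the LLM-based gap diagnosis described in Section 4.

```python
from typing import Callable

def build_prompt(question: str,
                 step_scores: list[float],
                 fixed_examples: list[SkillExample],
                 skill_bank: SkillBank,
                 diagnose_missing_skills: Callable[[str], set[str]]) -> str:
    """Assemble the in-context prompt under the adaptive easy/difficult policy."""
    if classify_difficulty(step_scores) == 1:
        examples = fixed_examples                     # easy: essential fixed exemplars only
    else:
        missing = diagnose_missing_skills(question)   # difficult: diagnose skill gaps first
        examples = select_examples(True, missing, skill_bank, fixed_examples)
    shots = "\n\n".join(f"Q: {e.question}\nA: {e.solution}" for e in examples)
    return f"{shots}\n\nQ: {question}\nA:"
```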

4. Diagnostic Skill Targeting and Remediation Process

AdaptMI+ introduces a diagnostic phase for error correction:

  • Each example and question is tagged with required skills using a manually constructed Skill Bank and an LLM-generated Skill-Map.
  • After the initial SLM response, skill gaps are diagnosed by prompting a metacognitive LLM to compare the solution output against the required skills and enumerate the missing ones (a prompt-construction sketch follows this list).
  • This feedback loop allows for dynamic, iterative refinement, in which further rounds of targeted example insertion can be conducted.
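
One way to realize the diagnostic step is to prompt the stronger LLM with the question, its required skills, and the SLM's attempt, then parse the skills it flags as missing. The prompt wording below is an illustrative assumption, not the paper's template.

```python
def build_diagnosis_prompt(question: str,
                           required_skills: set[str],
                           attempt: str) -> str:
    """Ask the diagnostic LLM which required skills the attempt omits or misapplies."""
    return (
        "You are given a math question, the skills required to solve it, and a "
        "model's attempted solution.\n"
        f"Question: {question}\n"
        f"Required skills: {', '.join(sorted(required_skills))}\n"
        f"Attempted solution: {attempt}\n"
        "List the required skills that the attempt omits or applies incorrectly, "
        "as a comma-separated list, or answer 'none'."
    )

def parse_missing_skills(llm_reply: str) -> set[str]:
    """Turn the diagnostic LLM's reply into a set of missing skill names."""
    reply = llm_reply.strip().lower()
    if reply == "none":
        return set()
    return {s.strip() for s in reply.split(",") if s.strip()}
```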

5. Empirical Evaluation on Math Benchmarks

Experimental results establish AdaptMI+ as an effective intervention for SLM math reasoning:

  • Accuracy Gains: On 5-shot ICL over math benchmarks (MATH, GSM8K), AdaptMI+ achieves up to 6% absolute improvement in comparison to naive (blanket) skill-based prompting methods.
  • Small Model Focus: Gains are especially pronounced in smaller models (1B–7B), which are more sensitive to overload and benefit more from precise adaptive example selection.
  • Robustness: AdaptMI+ consistently outperforms random and fixed selection strategies and surpasses alternative consistency-based approaches (e.g., Consistency@5) for difficult question subsets.

6. Significance and Implications for SLM Instructional Design

The central implication of AdaptMI+ is that in-context prompt optimization for SLMs should be both difficulty-aware and skill-targeted. By leveraging large LLMs for skill annotation and targeted remediation, the method counterbalances SLM metacognitive limitations without flooding the input context. This:

  • Helps bridge the ICL performance gap between SLMs and advanced LLMs,
  • Provides an instructional template for continual improvement via adaptive feedback,
  • Suggests that even with limited context window and model capacity, adaptive instructional selection enables significant accuracy improvements in domain-specific reasoning tasks.

A plausible extension is the iterative application of the AdaptMI+ mechanism for multi-stage remediation, where skill gaps detected in subsequent outputs trigger further targeted example selection.
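
A hedged sketch of such an iterative variant, using the hypothetical helpers above: generate stands in for the SLM call, diagnose for the LLM-based gap detection, and the round budget is an arbitrary choice.

```python
from typing import Callable

def iterative_remediation(question: str,
                          required_skills: set[str],
                          skill_bank: SkillBank,
                          fixed_examples: list[SkillExample],
                          generate: Callable[[str], str],
                          diagnose: Callable[[str, set[str], str], set[str]],
                          max_rounds: int = 2) -> str:
    """Repeat gap diagnosis and targeted example insertion for a few rounds."""
    def to_prompt(examples: list[SkillExample]) -> str:
        shots = "\n\n".join(f"Q: {e.question}\nA: {e.solution}" for e in examples)
        return f"{shots}\n\nQ: {question}\nA:"

    response = generate(to_prompt(fixed_examples))        # initial attempt
    for _ in range(max_rounds):
        missing = diagnose(question, required_skills, response)
        if not missing:
            break                                         # no remaining skill gaps
        remedial = [ex for s in sorted(missing) for ex in skill_bank.retrieve(s, k=1)]
        response = generate(to_prompt(remedial))          # re-attempt with targeted examples
    return response
```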

7. Limitations and Prospective Directions

Limitations include:

  • Dependence on the fidelity of both Skill-Bank annotation and LLM skill diagnostic capabilities,
  • Empirical gains contingent on the accuracy of question difficulty classification via the reward model,
  • The approach, while adaptive, still requires nontrivial infrastructure for skill mapping and annotation.

Future directions proposed include:

  • Further automation in skill annotation and gap identification,
  • Extension to broader mathematics and STEM domains,
  • Investigation into dynamic context window allocation for SLMs under adaptive instructional stress.

In summary, AdaptMI+ advances adaptive in-context math prompting for SLMs by selectively and diagnostically providing targeted skill-based examples only when necessary, reducing cognitive overload and demonstrably improving mathematical problem-solving accuracy in resource-constrained scenarios.
