In-Context Learning Tasks Overview

Updated 29 December 2025
  • In-context learning tasks are defined by a prompt with multiple demonstration pairs and a query input, enabling adaptive performance without parameter updates.
  • They leverage mechanisms like task recognition, label regulation, and transfer learning to address diverse applications such as classification, generation, and regression.
  • Performance critically depends on demonstration quality, prompt design, and scaling laws, guiding future research into more robust and interpretable ICL methods.

In-context learning (ICL) is the phenomenon by which large pretrained transformer models, particularly LLMs, adapt to new tasks at inference time by conditioning on a context of task demonstrations—input–output pairs, chains-of-thought, or other structured examples—without updating any model parameters. ICL has enabled a suite of advances in NLP, vision-language, decision making, and multimodal processing by allowing models to "learn" from a small number of examples presented in the prompt alone. The study of ICL tasks—how they are formulated, what governs their efficacy, and where they break down—has led to fundamental insights into both the mechanism and limits of prompt-based adaptation.

1. Formalization of In-Context Learning Tasks

Formally, an in-context learning task is specified by:

  • A prompt consisting of $k$ demonstration pairs $\mathcal{P} = \{(x_i, y_i)\}_{i=1}^k$
  • A query input $x_{k+1}$
  • The model $f_\theta$ (parameters frozen) produces a prediction $y_{k+1}$ conditioned on the concatenation of $\mathcal{P}$ and $x_{k+1}$

The ICL objective is thus

$$y_{k+1} = \arg\min_y \; \text{loss}\bigl(f_\theta(\mathcal{P}, x_{k+1}), y\bigr)$$

with all adaptation occurring via the prompt. No gradient updates or weight modifications are performed at inference time (Zhao et al., 27 May 2025).
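
As a minimal sketch of this formalization, the prediction is obtained purely by conditioning a frozen causal language model on the concatenated demonstrations and query. The snippet below assumes a Hugging Face causal LM; the checkpoint name, prompt template, and label verbalizers are illustrative assumptions rather than the setup of any cited paper.

```python
# Minimal few-shot ICL sketch: all adaptation happens in the prompt; the model
# weights are never updated. Checkpoint, template, and labels are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder checkpoint; any causal LM can be substituted
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()  # frozen

demonstrations = [  # P = {(x_i, y_i)}, i = 1..k
    ("The movie was a delight.", "positive"),
    ("Utterly boring and far too long.", "negative"),
]
query = "A charming, well-acted film."  # x_{k+1}

# Concatenate the demonstrations and the query into a single prompt.
prompt = "".join(f"Review: {x}\nSentiment: {y}\n\n" for x, y in demonstrations)
prompt += f"Review: {query}\nSentiment:"

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():  # no gradient updates at inference time
    out = model.generate(**inputs, max_new_tokens=2, do_sample=False)
prediction = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:],
                              skip_special_tokens=True).strip()
print(prediction)  # y_{k+1}, produced only by conditioning on the prompt
```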

ICL tasks span few-shot classification (sentiment, NLI, paraphrase), sequence labeling, text and image generation, function regression, algorithmic tasks, and reinforcement learning trajectories (Long et al., 11 Apr 2024, Raparthy et al., 2023, Zhao et al., 27 May 2025, Brumley et al., 11 Nov 2024, Chen et al., 14 Oct 2025, Wang et al., 27 May 2024).

2. Mechanistic Foundations and Types of ICL Tasks

ICL is frequently decomposed into distinct cognitive/mechanistic roles:

  • Task Recognition (TR): The model recognizes which distribution/task is described by the prompt, leveraging pre-trained priors. TR alone enables nontrivial performance even when label-demonstration pairings are scrambled (Pan et al., 2023, Wies et al., 2023); a scrambled-label probe is sketched after this list.
  • Task Learning (TL): The model infers a new input–output mapping—the essence of adaptive learning within the prompt—when the mapping has not been seen in pre-training. TL emerges reliably only at large scale and with sufficient in-context shots (Pan et al., 2023).
  • Label Space and Format Regulation: Demonstrations constrain output space and surface-form, accounting for most performance improvements in typical few-shot ICL scenarios (Long et al., 11 Apr 2024).
  • Discriminative Adaptation: True improvement in "reasoning" or "classification ability" beyond label/format regulation is marginal in most practical ICL applications, unless context examples are semantically retrieved and label-diverse (Long et al., 11 Apr 2024).
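
The TR/TL distinction can be probed by scrambling demonstration labels and checking how much accuracy survives. The sketch below assumes a Hugging Face causal LM; the checkpoint, prompt template, and toy data are illustrative assumptions.

```python
# Scrambled-label probe: if accuracy stays high when demonstration labels are
# randomized, the gain comes from task recognition and label/format regulation
# rather than task learning. Checkpoint, template, and data are illustrative.
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def predict(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=2, do_sample=False)
    return tokenizer.decode(out[0, inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True).strip()

def build_prompt(demos, query):
    body = "".join(f"Input: {x}\nLabel: {y}\n\n" for x, y in demos)
    return body + f"Input: {query}\nLabel:"

rng = random.Random(0)
labels = ["positive", "negative"]
demos = [("Great soundtrack.", "positive"), ("The plot made no sense.", "negative")]
eval_set = [("I loved every minute.", "positive"), ("A total waste of time.", "negative")]
scrambled = [(x, rng.choice(labels)) for x, _ in demos]  # break the input-output mapping

for name, d in [("gold labels", demos), ("scrambled labels", scrambled)]:
    correct = sum(predict(build_prompt(d, x)).startswith(y) for x, y in eval_set)
    print(f"{name}: {correct}/{len(eval_set)} correct")
```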

A broad taxonomy of ICL task types is summarized below:

| Task Family | Representative Examples | ICL Mechanism |
|---|---|---|
| Classification | Sentiment, NLI, multi-class, hate speech | TR, Label Regulation |
| Sequence Labeling | NER, ABSA, event/relation extraction | TR + TL (specification-dependent) |
| Generation | Summarization, story continuation | Style priming |
| Algorithmic | Polynomial regression, exponential/modular functions | TL, composition |
| World Modeling | Maze navigation, sequential RL | TL, memory |
| Vision | Linear/image regression, classification | TL, compositionality |

(Long et al., 11 Apr 2024, Raparthy et al., 2023, Chen et al., 14 Oct 2025, Lee et al., 16 Jun 2025, Wang et al., 27 May 2024, Zhao et al., 27 May 2025)

3. Successes, Limitations, and Scaling Laws

ICL is highly effective in domains where model pre-training covers mixtures of latent tasks and where the ICL prompt is sufficient to "identify" the underlying task component (Wies et al., 2023). Empirical and theoretical analysis demonstrates:

  • Accurate few-shot generalization for simple mapping or classification tasks (≤ 5–10 classes) with as few as 5 demonstrations (Long et al., 11 Apr 2024).
  • Strong dependence on model and dataset scale for true TL: only large models (≥10B parameters) reliably acquire new label mappings unseen in pre-training (Pan et al., 2023).
  • Scaling the number of demonstration examples (context-scaling) improves performance within a fixed task, while scaling the diversity of pre-training tasks (task-scaling) improves generalization across tasks. Transformers exhibit both forms, whereas MLPs typically exhibit only task-scaling (Abedsoltan et al., 16 Oct 2024); the synthetic setup behind this distinction is sketched after this list.
  • Performance saturates once context length or pre-training task variety reaches its limit; further improvements require either greater model capacity or broader, more structured prompts (Abedsoltan et al., 16 Oct 2024, Wang et al., 27 May 2024).
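
A common way to study the two scaling regimes is the synthetic linear-regression ICL setup sketched below, where one axis varies the number of in-context examples and the other the number of distinct pretraining tasks; the dimensions, Gaussian sampling, and noiseless targets are illustrative assumptions rather than the cited papers' exact protocol.

```python
# Synthetic data generator for studying context-scaling (longer prompts, fixed
# task pool) versus task-scaling (more distinct pretraining tasks, fixed prompt
# length). Dimensions and sampling choices are illustrative assumptions.
import numpy as np

def sample_icl_batch(num_tasks: int, context_len: int, dim: int = 8, seed: int = 0):
    """Return in-context (x, y) pairs plus a held-out query per latent task."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(num_tasks, dim))                   # one weight vector per task
    x = rng.normal(size=(num_tasks, context_len + 1, dim))  # context inputs + query input
    y = np.einsum("tld,td->tl", x, w)                       # noiseless targets y = <w, x>
    return x[:, :-1], y[:, :-1], x[:, -1], y[:, -1]

# Context-scaling: fix the task pool, grow the number of in-context examples.
for k in (2, 8, 32):
    ctx_x, ctx_y, qry_x, qry_y = sample_icl_batch(num_tasks=64, context_len=k)
    print("context length", k, "-> context shape", ctx_x.shape)

# Task-scaling: fix the prompt length, grow the number of distinct pretraining tasks.
for t in (16, 256, 4096):
    ctx_x, ctx_y, qry_x, qry_y = sample_icl_batch(num_tasks=t, context_len=8)
    print("num tasks", t, "-> context shape", ctx_x.shape)
```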

Failure cases are well documented:

  • Specification-heavy tasks: Event extraction, schema-based IE, multi-step reasoning—ICL fails without exhaustive prompt schemas, due to inability to fully encode specification complexity, schema misalignment, and insufficient long-context capabilities (Peng et al., 2023).
  • Compositional generalization: Unless the context arranges modular subtask demonstrations before composite demonstrations, transformers do not learn to chain intermediate computations (Lee et al., 16 Jun 2025).
  • Input/Output Range: Transformers cannot generalize ICL predictions outside the domain support seen at pre-training, due to intrinsic architectural clamping induced by softmax attention (Naim et al., 5 Feb 2025).
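
The range limitation follows from the structure of softmax attention: the attention weights are non-negative and sum to one, so each attention output is a convex combination of the in-context value vectors and stays within their componentwise range. A minimal NumPy illustration (shapes and data are arbitrary):

```python
# Softmax attention outputs are convex combinations of the value vectors, so
# every output coordinate is bounded by the min/max of the corresponding value
# coordinates seen in context, giving one intuition for the lack of extrapolation.
import numpy as np

rng = np.random.default_rng(0)
q = rng.normal(size=(1, 4))   # a single query
K = rng.normal(size=(6, 4))   # six context keys
V = rng.normal(size=(6, 3))   # six context values

scores = q @ K.T / np.sqrt(K.shape[1])
weights = np.exp(scores - scores.max())  # numerically stable softmax
weights /= weights.sum()
out = weights @ V                         # attention output, shape (1, 3)

assert np.all(out >= V.min(axis=0) - 1e-9)
assert np.all(out <= V.max(axis=0) + 1e-9)
print("output lies inside the componentwise range of the context values:", out)
```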

4. Representation of Task Information: Task Vectors, Schemas, and Beyond

Internal representation of tasks in ICL has been probed via:

  • Task Vectors: Summary activations at intermediate layers (especially at layers 10–20 in Llama-3-8B) encode most of the "task" in simple ICL scenarios, but complex/compositional tasks require multiple subtask-specific vectors, challenging the one-vector hypothesis (Tikhonov et al., 29 May 2025); an extraction-and-patching sketch follows this list.
  • Learnable Task Vectors (LTV): Weighted sums over attention heads, trained causally, robustly encode task representations across modalities and sequence lengths (Saglam et al., 8 Feb 2025).
  • Schema-based Activation (SA-ICL): Inspired by schema theory, explicit schema templates (structured JSON scaffolding of inferred reasoning steps) significantly boost performance and interpretability over standard one-shot or chain-of-thought prompts; gains of up to 35 percentage points, along with higher interpretability, were observed in scientific multi-step QA (Chen et al., 14 Oct 2025).
  • Function vs. In-Context Vectors: Bottom-up (function vectors, targeting attention heads) and top-down (in-context vectors, global residual shifts) steering excel in precise and behavioral ICL tasks, respectively, but neither fully subsumes the other (Brumley et al., 11 Nov 2024).
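
A simplified sketch of task-vector extraction and patching, in the spirit of the bottom-up approaches above, is given below. It assumes a Llama-style Hugging Face checkpoint whose decoder blocks are exposed as model.model.layers; the layer index, prompts, and replace-the-last-position patching rule are illustrative assumptions, not the cited papers' exact procedure.

```python
# Sketch: read a candidate "task vector" from the residual stream at the final
# token of a few-shot prompt, then patch it into a zero-shot forward pass.
# Assumes a Llama-style Hugging Face model; checkpoint, layer, and prompts are
# illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Meta-Llama-3-8B"  # placeholder; any Llama-style checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()
LAYER = 15  # intermediate layer, in the 10-20 band discussed above

few_shot = "France -> Paris\nJapan -> Tokyo\nItaly ->"
zero_shot = "Germany ->"

# 1) Extract: hidden state of the final prompt token at the chosen layer
#    (hidden_states[0] is the embedding output, so index LAYER is that layer's output).
with torch.no_grad():
    hs = model(**tok(few_shot, return_tensors="pt"), output_hidden_states=True).hidden_states
task_vector = hs[LAYER][0, -1].clone()

# 2) Patch: overwrite the residual stream at the last prompt position during a zero-shot run.
def patch_hook(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    if hidden.shape[1] > 1:             # patch only the full prompt pass, not cached decode steps
        hidden[:, -1, :] = task_vector  # in-place edit of the residual stream

handle = model.model.layers[LAYER - 1].register_forward_hook(patch_hook)
try:
    with torch.no_grad():
        out = model.generate(**tok(zero_shot, return_tensors="pt"),
                             max_new_tokens=3, do_sample=False)
finally:
    handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```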

5. Task Construction, Prompt Design, and Demonstration Selection

The success of ICL is highly sensitive to task and prompt construction:

  • Prompt Format: Explicitly stating label sets and desired output formats recovers most benefits of demonstrations for ICL in classification tasks (Long et al., 11 Apr 2024).
  • Schema and demonstration organization: Blocked curricula that present subtasks before composite tasks induce symbolic composition circuits; vanilla randomized context fails to induce such computations (Lee et al., 16 Jun 2025).
  • Demonstration Source: Quality and diversity of demonstrations are critical—retrieving semantically similar ICL examples boosts discriminative ability but can undermine label diversity; transfer of demonstrations from similar tasks (In-Context Transfer Learning) outperforms naive synthesis (Wang et al., 2 Oct 2024); a retrieval sketch with a label-diversity constraint follows this list.
  • Intrinsic Task Mining: Pre-training on curated sets of naturally occurring “intrinsic tasks” extracted from plain text paragraphs (PICL) yields ICL gains exceeding those of much larger vanilla models (Gu et al., 2023).
  • Sequential and interactive tasks: Lifelong, multi-step settings require long interaction horizons, persistent state, and fine-grained memory, with benchmarks designed to maximize context diversity and minimize task overlap (Wang et al., 27 May 2024).
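
A sketch of retrieval-based demonstration selection with a simple label-diversity constraint is shown below; it assumes the sentence-transformers package, and the encoder name, candidate pool, and round-robin balancing rule are illustrative assumptions.

```python
# Sketch: rank candidate demonstrations by semantic similarity to the query,
# then round-robin over labels so the prompt stays label-diverse (pure nearest-
# neighbour retrieval can collapse onto a single label). Encoder, pool, and
# balancing rule are illustrative assumptions.
from collections import defaultdict
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder

pool = [  # candidate demonstrations (x, y)
    ("The acting was superb.", "positive"),
    ("I walked out halfway through.", "negative"),
    ("A gorgeous, moving score.", "positive"),
    ("The pacing dragged badly.", "negative"),
]
query = "Beautiful cinematography and a heartfelt story."

sims = util.cos_sim(encoder.encode(query, convert_to_tensor=True),
                    encoder.encode([x for x, _ in pool], convert_to_tensor=True))[0]
ranked = sorted(range(len(pool)), key=lambda i: float(sims[i]), reverse=True)

by_label = defaultdict(list)          # similarity-ordered candidates per label
for i in ranked:
    by_label[pool[i][1]].append(pool[i])

k, demos = 2, []
while len(demos) < k and any(by_label.values()):
    for label in list(by_label):      # take the best remaining example of each label in turn
        if by_label[label] and len(demos) < k:
            demos.append(by_label[label].pop(0))

prompt = "".join(f"Review: {x}\nSentiment: {y}\n\n" for x, y in demos)
prompt += f"Review: {query}\nSentiment:"
print(prompt)
```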

6. Theoretical Foundations and Emergent Properties

  • PAC Learnability: ICL can be rigorously framed via PAC guarantees: given sufficient mixture separation in pre-training (a large KL divergence between task components) and a prompt length proportional to $1/\epsilon$, the in-context learner approaches the Bayes-optimal predictor for downstream tasks, up to finite-sample complexity (Wies et al., 2023).
  • Compositional and curriculum effects: ICL emerges most robustly when pre-training, curriculum, and prompts provide sufficient cues to both recognize (TR) and adapt to (TL) tasks; over-structured prompts or positional constraints can block ICL generalization entirely (Wibisono et al., 31 May 2024).
  • Architectural constraints: Transformers with standard softmax attention exhibit clamped output ranges, so extrapolation fails for ICL tasks outside the pre-training domain (Naim et al., 5 Feb 2025).
  • Regularization and phase transitions: For low-rank regression, ICL generalization error exhibits a sharp phase transition governed by task diversity and problem rank, with finite-task variance inducing effective regularization (Takanami et al., 6 Oct 2025).

7. Future Directions, Benchmarks, and Practical Guidelines

Research is advancing ICL task design through:

  • Development of general-purpose ICL benchmarks with high task diversity, long adaptation horizons, and compositional/interleaved task structure to probe scalable, robust, and interpretable in-context learning (Wang et al., 27 May 2024).
  • Exploration of parameter-efficient, pre-training, and alignment strategies to close the gap for specification-heavy tasks (Peng et al., 2023, Gu et al., 2023).
  • Extension of ICL methods to vision, multimodal, world-model, and sequential decision making contexts (Zhao et al., 27 May 2025, Raparthy et al., 2023).
  • Combining schema activation, compositional curricula, and structurally-aware prompt synthesis to promote human-like reasoning, generalization, and interpretability (Chen et al., 14 Oct 2025).
  • Empirical and theoretical analyses guiding optimal demonstration selection, prompt construction, and curriculum strategies for robust ICL in increasingly complex and safety-critical domains.

In summary, the study of in-context learning tasks has matured into a deep, mathematically grounded subfield with implications for learning theory, representation, model design, and application-specific adaptation, with open challenges remaining in long-horizon, compositional, and specification-rich regimes (Wies et al., 2023, Abedsoltan et al., 16 Oct 2024, Wibisono et al., 31 May 2024, Lee et al., 16 Jun 2025, Chen et al., 14 Oct 2025).
