In-Context Learning Tasks Overview

Updated 29 December 2025
  • In-context learning tasks are defined by a prompt with multiple demonstration pairs and a query input, enabling adaptive performance without parameter updates.
  • They leverage mechanisms like task recognition, label regulation, and transfer learning to address diverse applications such as classification, generation, and regression.
  • Performance critically depends on demonstration quality, prompt design, and scaling laws, guiding future research into more robust and interpretable ICL methods.

In-context learning (ICL) is the phenomenon by which large pretrained transformer models, particularly LLMs, adapt to new tasks at inference time by conditioning on a context of task demonstrations—input–output pairs, chains-of-thought, or other structured examples—without updating any model parameters. ICL has enabled a suite of advances in NLP, vision-language, decision making, and multimodal processing by allowing models to "learn" from a small number of examples presented in the prompt alone. The study of ICL tasks—how they are formulated, what governs their efficacy, and where they break down—has led to fundamental insights into both the mechanism and limits of prompt-based adaptation.

1. Formalization of In-Context Learning Tasks

Formally, an in-context learning task is specified by:

  • A prompt consisting of $k$ demonstration pairs $\mathcal{P} = \{(x_i, y_i)\}_{i=1}^k$
  • A query input $x_{k+1}$
  • The model $f_\theta$ (parameters frozen) produces a prediction $y_{k+1}$ conditioned on the concatenation of $\mathcal{P}$ and $x_{k+1}$

The ICL objective is thus

$$y_{k+1} = \arg\min_y \; \text{loss}\bigl(f_\theta(\mathcal{P}, x_{k+1}), y\bigr)$$

with all adaptation occurring via the prompt. No gradient updates or weight modifications are performed at inference time (Zhao et al., 27 May 2025).
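
As a minimal sketch of this formalization, the prediction is obtained purely by conditioning a frozen causal language model on the concatenated demonstrations and query. The snippet below assumes a Hugging Face causal LM; the checkpoint name, prompt template, and label verbalizers are illustrative assumptions rather than the setup of any cited paper.

```python
# Minimal few-shot ICL sketch: all adaptation happens in the prompt; the model
# weights are never updated. Checkpoint, template, and labels are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder checkpoint; any causal LM can be substituted
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()  # frozen

demonstrations = [  # P = {(x_i, y_i)}, i = 1..k
    ("The movie was a delight.", "positive"),
    ("Utterly boring and far too long.", "negative"),
]
query = "A charming, well-acted film."  # x_{k+1}

# Concatenate the demonstrations and the query into a single prompt.
prompt = "".join(f"Review: {x}\nSentiment: {y}\n\n" for x, y in demonstrations)
prompt += f"Review: {query}\nSentiment:"

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():  # no gradient updates at inference time
    out = model.generate(**inputs, max_new_tokens=2, do_sample=False)
prediction = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:],
                              skip_special_tokens=True).strip()
print(prediction)  # y_{k+1}, produced only by conditioning on the prompt
```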

ICL tasks span few-shot classification (sentiment, NLI, paraphrase), sequence labeling, text and image generation, function regression, algorithmic tasks, and reinforcement learning trajectories (Long et al., 11 Apr 2024, Raparthy et al., 2023, Zhao et al., 27 May 2025, Brumley et al., 11 Nov 2024, Chen et al., 14 Oct 2025, Wang et al., 27 May 2024).

2. Mechanistic Foundations and Types of ICL Tasks

ICL is frequently decomposed into distinct cognitive/mechanistic roles:

  • Task Recognition (TR): The model recognizes which distribution/task is described by the prompt, leveraging pre-trained priors. TR alone enables nontrivial performance even when label-demonstration pairings are scrambled (Pan et al., 2023, Wies et al., 2023); a scrambled-label probe is sketched after this list.
  • Task Learning (TL): The model infers a new input–output mapping—the essence of adaptive learning within the prompt—when the mapping has not been seen in pre-training. TL emerges reliably only at large scale and with sufficient in-context shots (Pan et al., 2023).
  • Label Space and Format Regulation: Demonstrations constrain output space and surface-form, accounting for most performance improvements in typical few-shot ICL scenarios (Long et al., 11 Apr 2024).
  • Discriminative Adaptation: True improvement in "reasoning" or "classification ability" beyond label/format regulation is marginal in most practical ICL applications, unless context examples are semantically retrieved and label-diverse (Long et al., 11 Apr 2024).
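
The TR/TL distinction can be probed by scrambling demonstration labels and checking how much accuracy survives. The sketch below assumes a Hugging Face causal LM; the checkpoint, prompt template, and toy data are illustrative assumptions.

```python
# Scrambled-label probe: if accuracy stays high when demonstration labels are
# randomized, the gain comes from task recognition and label/format regulation
# rather than task learning. Checkpoint, template, and data are illustrative.
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def predict(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=2, do_sample=False)
    return tokenizer.decode(out[0, inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True).strip()

def build_prompt(demos, query):
    body = "".join(f"Input: {x}\nLabel: {y}\n\n" for x, y in demos)
    return body + f"Input: {query}\nLabel:"

rng = random.Random(0)
labels = ["positive", "negative"]
demos = [("Great soundtrack.", "positive"), ("The plot made no sense.", "negative")]
eval_set = [("I loved every minute.", "positive"), ("A total waste of time.", "negative")]
scrambled = [(x, rng.choice(labels)) for x, _ in demos]  # break the input-output mapping

for name, d in [("gold labels", demos), ("scrambled labels", scrambled)]:
    correct = sum(predict(build_prompt(d, x)).startswith(y) for x, y in eval_set)
    print(f"{name}: {correct}/{len(eval_set)} correct")
```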

A broad taxonomy of ICL task types is summarized below:

| Task Family | Representative Examples | ICL Mechanism |
|---|---|---|
| Classification | Sentiment, NLI, multi-class, hate speech | TR, Label Regulation |
| Sequence Labeling | NER, ABSA, event/relation extraction | TR + TL (specification-dependent) |
| Generation | Summarization, story continuation | Style priming |
| Algorithmic | Polynomial regression, exponential/modular functions | TL, composition |
| World Modeling | Maze navigation, sequential RL | TL, memory |
| Vision | Linear/image regression, classification | TL, compositionality |

(Long et al., 11 Apr 2024, Raparthy et al., 2023, Chen et al., 14 Oct 2025, Lee et al., 16 Jun 2025, Wang et al., 27 May 2024, Zhao et al., 27 May 2025)

3. Successes, Limitations, and Scaling Laws

ICL is highly effective in domains where model pre-training covers mixtures of latent tasks and where the ICL prompt is sufficient to "identify" the underlying task component (Wies et al., 2023). Empirical and theoretical analysis demonstrates:

  • Accurate few-shot generalization for simple mapping or classification tasks (≤ 5–10 classes) with as few as 5 demonstrations (Long et al., 11 Apr 2024).
  • Strong dependence on model and dataset scale for true TL: only large models (≥10B parameters) reliably acquire new label mappings unseen in pre-training (Pan et al., 2023).
  • Scaling the number of demonstration examples (context-scaling) improves performance within a fixed task, while scaling the diversity of pre-training tasks (task-scaling) improves generalization across tasks. Transformers exhibit both forms, whereas MLPs typically exhibit only task-scaling (Abedsoltan et al., 16 Oct 2024); the synthetic setup behind this distinction is sketched after this list.
  • Performance saturates once context length or pre-training task variety reaches its limit; further improvements require either greater model capacity or broader, more structured prompts (Abedsoltan et al., 16 Oct 2024, Wang et al., 27 May 2024).
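
A common way to study the two scaling regimes is the synthetic linear-regression ICL setup sketched below, where one axis varies the number of in-context examples and the other the number of distinct pretraining tasks; the dimensions, Gaussian sampling, and noiseless targets are illustrative assumptions rather than the cited papers' exact protocol.

```python
# Synthetic data generator for studying context-scaling (longer prompts, fixed
# task pool) versus task-scaling (more distinct pretraining tasks, fixed prompt
# length). Dimensions and sampling choices are illustrative assumptions.
import numpy as np

def sample_icl_batch(num_tasks: int, context_len: int, dim: int = 8, seed: int = 0):
    """Return in-context (x, y) pairs plus a held-out query per latent task."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(num_tasks, dim))                   # one weight vector per task
    x = rng.normal(size=(num_tasks, context_len + 1, dim))  # context inputs + query input
    y = np.einsum("tld,td->tl", x, w)                       # noiseless targets y = <w, x>
    return x[:, :-1], y[:, :-1], x[:, -1], y[:, -1]

# Context-scaling: fix the task pool, grow the number of in-context examples.
for k in (2, 8, 32):
    ctx_x, ctx_y, qry_x, qry_y = sample_icl_batch(num_tasks=64, context_len=k)
    print("context length", k, "-> context shape", ctx_x.shape)

# Task-scaling: fix the prompt length, grow the number of distinct pretraining tasks.
for t in (16, 256, 4096):
    ctx_x, ctx_y, qry_x, qry_y = sample_icl_batch(num_tasks=t, context_len=8)
    print("num tasks", t, "-> context shape", ctx_x.shape)
```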

Failure cases are well documented:

  • Specification-heavy tasks: Event extraction, schema-based IE, multi-step reasoning—ICL fails without exhaustive prompt schemas, due to inability to fully encode specification complexity, schema misalignment, and insufficient long-context capabilities (Peng et al., 2023).
  • Compositional generalization: Unless the context arranges modular subtask demonstrations before composite demonstrations, transformers do not learn to chain intermediate computations (Lee et al., 16 Jun 2025).
  • Input/Output Range: Transformers cannot generalize ICL predictions outside the domain support seen at pre-training, due to intrinsic architectural clamping induced by softmax attention (Naim et al., 5 Feb 2025).
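
The range limitation follows from the structure of softmax attention: the attention weights are non-negative and sum to one, so each attention output is a convex combination of the in-context value vectors and stays within their componentwise range. A minimal NumPy illustration (shapes and data are arbitrary):

```python
# Softmax attention outputs are convex combinations of the value vectors, so
# every output coordinate is bounded by the min/max of the corresponding value
# coordinates seen in context, giving one intuition for the lack of extrapolation.
import numpy as np

rng = np.random.default_rng(0)
q = rng.normal(size=(1, 4))   # a single query
K = rng.normal(size=(6, 4))   # six context keys
V = rng.normal(size=(6, 3))   # six context values

scores = q @ K.T / np.sqrt(K.shape[1])
weights = np.exp(scores - scores.max())  # numerically stable softmax
weights /= weights.sum()
out = weights @ V                         # attention output, shape (1, 3)

assert np.all(out >= V.min(axis=0) - 1e-9)
assert np.all(out <= V.max(axis=0) + 1e-9)
print("output lies inside the componentwise range of the context values:", out)
```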

4. Representation of Task Information: Task Vectors, Schemas, and Beyond

Internal representation of tasks in ICL has been probed via:

  • Task Vectors: Summary activations at intermediate layers (especially at layers 10–20 in Llama-3-8B) encode most of the "task" in simple ICL scenarios, but complex/compositional tasks require multiple subtask-specific vectors, challenging the one-vector hypothesis (Tikhonov et al., 29 May 2025); an extraction-and-patching sketch follows this list.
  • Learnable Task Vectors (LTV): Weighted sums over attention heads, trained causally, robustly encode task representations across modalities and sequence lengths (Saglam et al., 8 Feb 2025).
  • Schema-based Activation (SA-ICL): Inspired by schema theory, explicit schema templates (structured JSON scaffolding of inferred reasoning steps) significantly boost performance and interpretability over standard one-shot or chain-of-thought prompts; gains of up to 35 percentage points, along with higher interpretability, were observed in scientific multi-step QA (Chen et al., 14 Oct 2025).
  • Function vs. In-Context Vectors: Bottom-up (function vectors, targeting attention heads) and top-down (in-context vectors, global residual shifts) steering excel in precise and behavioral ICL tasks, respectively, but neither fully subsumes the other (Brumley et al., 11 Nov 2024).
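
A simplified sketch of task-vector extraction and patching, in the spirit of the bottom-up approaches above, is given below. It assumes a Llama-style Hugging Face checkpoint whose decoder blocks are exposed as model.model.layers; the layer index, prompts, and replace-the-last-position patching rule are illustrative assumptions, not the cited papers' exact procedure.

```python
# Sketch: read a candidate "task vector" from the residual stream at the final
# token of a few-shot prompt, then patch it into a zero-shot forward pass.
# Assumes a Llama-style Hugging Face model; checkpoint, layer, and prompts are
# illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Meta-Llama-3-8B"  # placeholder; any Llama-style checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()
LAYER = 15  # intermediate layer, in the 10-20 band discussed above

few_shot = "France -> Paris\nJapan -> Tokyo\nItaly ->"
zero_shot = "Germany ->"

# 1) Extract: hidden state of the final prompt token at the chosen layer
#    (hidden_states[0] is the embedding output, so index LAYER is that layer's output).
with torch.no_grad():
    hs = model(**tok(few_shot, return_tensors="pt"), output_hidden_states=True).hidden_states
task_vector = hs[LAYER][0, -1].clone()

# 2) Patch: overwrite the residual stream at the last prompt position during a zero-shot run.
def patch_hook(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    if hidden.shape[1] > 1:             # patch only the full prompt pass, not cached decode steps
        hidden[:, -1, :] = task_vector  # in-place edit of the residual stream

handle = model.model.layers[LAYER - 1].register_forward_hook(patch_hook)
try:
    with torch.no_grad():
        out = model.generate(**tok(zero_shot, return_tensors="pt"),
                             max_new_tokens=3, do_sample=False)
finally:
    handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```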

5. Task Construction, Prompt Design, and Demonstration Selection

The success of ICL is highly sensitive to task and prompt construction:

  • Prompt Format: Explicitly stating label sets and desired output formats recovers most benefits of demonstrations for ICL in classification tasks (Long et al., 11 Apr 2024).
  • Schema and demonstration organization: Blocked curricula that present subtasks before composite tasks induce symbolic composition circuits; vanilla randomized context fails to induce such computations (Lee et al., 16 Jun 2025).
  • Demonstration Source: Quality and diversity of demonstrations are critical—retrieving semantically similar ICL examples boosts discriminative ability but can undermine label diversity; transfer of demonstrations from similar tasks (In-Context Transfer Learning) outperforms naive synthesis (Wang et al., 2 Oct 2024); a retrieval sketch with a label-diversity constraint follows this list.
  • Intrinsic Task Mining: Pre-training on curated sets of naturally occurring “intrinsic tasks” extracted from plain text paragraphs (PICL) yields ICL gains exceeding those of much larger vanilla models (Gu et al., 2023).
  • Sequential and interactive tasks: Lifelong, multi-step settings require long interaction horizons, persistent state, and fine-grained memory, with benchmarks designed to maximize context diversity and minimize task overlap (Wang et al., 27 May 2024).
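
A sketch of retrieval-based demonstration selection with a simple label-diversity constraint is shown below; it assumes the sentence-transformers package, and the encoder name, candidate pool, and round-robin balancing rule are illustrative assumptions.

```python
# Sketch: rank candidate demonstrations by semantic similarity to the query,
# then round-robin over labels so the prompt stays label-diverse (pure nearest-
# neighbour retrieval can collapse onto a single label). Encoder, pool, and
# balancing rule are illustrative assumptions.
from collections import defaultdict
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder

pool = [  # candidate demonstrations (x, y)
    ("The acting was superb.", "positive"),
    ("I walked out halfway through.", "negative"),
    ("A gorgeous, moving score.", "positive"),
    ("The pacing dragged badly.", "negative"),
]
query = "Beautiful cinematography and a heartfelt story."

sims = util.cos_sim(encoder.encode(query, convert_to_tensor=True),
                    encoder.encode([x for x, _ in pool], convert_to_tensor=True))[0]
ranked = sorted(range(len(pool)), key=lambda i: float(sims[i]), reverse=True)

by_label = defaultdict(list)          # similarity-ordered candidates per label
for i in ranked:
    by_label[pool[i][1]].append(pool[i])

k, demos = 2, []
while len(demos) < k and any(by_label.values()):
    for label in list(by_label):      # take the best remaining example of each label in turn
        if by_label[label] and len(demos) < k:
            demos.append(by_label[label].pop(0))

prompt = "".join(f"Review: {x}\nSentiment: {y}\n\n" for x, y in demos)
prompt += f"Review: {query}\nSentiment:"
print(prompt)
```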

6. Theoretical Foundations and Emergent Properties

  • PAC Learnability: ICL can be rigorously framed via PAC guarantees: given sufficient mixture separation in pre-training (a large KL divergence between task components) and a prompt length proportional to $1/\epsilon$, the in-context learner approaches the Bayes-optimal predictor for downstream tasks, up to finite-sample complexity (Wies et al., 2023).
  • Compositional and curriculum effects: ICL emerges most robustly when pre-training, curriculum, and prompts provide sufficient cues to both recognize (TR) and adapt to (TL) tasks; over-structured prompts or positional constraints can block ICL generalization entirely (Wibisono et al., 31 May 2024).
  • Architectural constraints: Transformers with standard softmax attention exhibit clamped output ranges, so extrapolation fails for ICL tasks outside the pre-training domain (Naim et al., 5 Feb 2025).
  • Regularization and phase transitions: For low-rank regression, ICL generalization error exhibits a sharp phase transition governed by task diversity and problem rank, with finite-task variance inducing effective regularization (Takanami et al., 6 Oct 2025).

7. Future Directions, Benchmarks, and Practical Guidelines

Research is advancing ICL task design through:

  • Development of general-purpose ICL benchmarks with high task diversity, long adaptation horizons, and compositional/interleaved task structure to probe scalable, robust, and interpretable in-context learning (Wang et al., 27 May 2024).
  • Exploration of parameter-efficient, pre-training, and alignment strategies to close the gap for specification-heavy tasks (Peng et al., 2023, Gu et al., 2023).
  • Extension of ICL methods to vision, multimodal, world-model, and sequential decision making contexts (Zhao et al., 27 May 2025, Raparthy et al., 2023).
  • Combining schema activation, compositional curricula, and structurally-aware prompt synthesis to promote human-like reasoning, generalization, and interpretability (Chen et al., 14 Oct 2025).
  • Empirical and theoretical analyses guiding optimal demonstration selection, prompt construction, and curriculum strategies for robust ICL in increasingly complex and safety-critical domains.

In summary, the study of in-context learning tasks has matured into a deep, mathematically grounded subfield with implications for learning theory, representation, model design, and application-specific adaptation, with open challenges remaining in long-horizon, compositional, and specification-rich regimes (Wies et al., 2023, Abedsoltan et al., 16 Oct 2024, Wibisono et al., 31 May 2024, Lee et al., 16 Jun 2025, Chen et al., 14 Oct 2025).
