Vector-ICL: Vector-Based In-Context Learning
- Vector-ICL is a paradigm that injects task-relevant continuous vectors into language models, enabling direct in-context learning without explicit demonstration texts.
- The method relies on optimal layer extraction and vector aggregation to encode task information, yielding comparable or superior performance to standard few-shot approaches.
- Empirical studies demonstrate that injecting vectors at intermediate layers and using multi-vector schemes can overcome rank limitations and boost robustness for complex tasks.
Vector-ICL refers to vector-based in-context learning, a paradigm in which an LLM's hidden state is directly manipulated by injecting task-relevant continuous vectors rather than using conventional textual demonstration prompts. These vectors, typically extracted from specific hidden-state activations, encode information about the desired task, enabling rapid adaptation while bypassing the computational and contextual overhead associated with repeated few-shot prompting. Recent research demonstrates that such approaches can match or surpass standard few-shot in-context learning (ICL) in efficiency and, in particular circumstances, accuracy or robustness. However, the design, extraction, interpretability, expressivity, and limitations of these task vectors remain active areas of investigation.
1. Formalism and Core Mechanism
The central object in Vector-ICL is the task vector (or, by slight generalization, the in-context vector), denoted $\theta_\ell$. Given a $k$-shot in-context prompt comprising demonstration input–output pairs concatenated as a sequence, a transformer computes hidden-state activations $h_\ell(t)$ at each layer $\ell$. The task vector is defined as the hidden state at the position of a special separator token (e.g., “→” or an output delimiter) immediately following the last demonstration:

$$\theta_\ell = h_\ell(t_{\mathrm{sep}}),$$

where $t_{\mathrm{sep}}$ is the token index of the separator.
Intervention at inference:
For a new zero-shot query $x$, instead of processing a full $k$-shot prompt, the model processes only $x$ and, at layer $\ell$, its hidden state at the separator is forcibly replaced: $h_\ell(t_{\mathrm{sep}}) \leftarrow \theta_\ell$. All other hidden states remain unchanged, and autoregressive generation proceeds from layer $\ell$ onward. This allows the LLM to “internalize” the task from a single vector, as opposed to the explicit sequence of demonstrations (Tikhonov et al., 29 May 2025).
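The extract-then-inject mechanism above can be sketched on a toy layer stack. This is a schematic stand-in for a real LLM, not the paper's implementation: the dimensions, the `forward` helper, and the chosen layer are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_LAYERS = 8, 4
# Toy "transformer": each layer applies a fixed per-position linear map + tanh.
WEIGHTS = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_LAYERS)]

def forward(h0, patch=None):
    """Run the layer stack. `patch` = (layer, pos, vector) optionally
    overwrites one hidden state just before that layer runs."""
    h = h0
    states = [h]
    for l, W in enumerate(WEIGHTS):
        if patch is not None and patch[0] == l:
            h = h.copy()
            h[patch[1]] = patch[2]      # inject the task vector
        h = np.tanh(h @ W)
        states.append(h)
    return h, states

# 1) Extract: run the k-shot demo prompt, read the separator-position state.
demo_prompt = rng.standard_normal((10, D))   # 10 tokens, separator at index -1
_, demo_states = forward(demo_prompt)
L_STAR = 2                                   # chosen intermediate layer
theta = demo_states[L_STAR][-1]              # task vector theta_ell

# 2) Inject: run the bare query, replacing its separator state with theta.
query = rng.standard_normal((3, D))          # query tokens, separator at index -1
out_patched, _ = forward(query, patch=(L_STAR, -1, theta))
out_plain, _ = forward(query)
```

Because the toy layers act per position, only the separator position's output changes; in a real transformer the injected vector also propagates through attention to later tokens.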
Formally, this mechanism generalizes to cases where subtask- or token-specific vectors are injected at multiple positions within the output, particularly for structured or compositional generation tasks.
2. Extraction, Layer Selection, and Vector Aggregation
Layerwise behavior: Empirically, for Llama-3-8B, task-vector effectiveness is sharply layer-dependent, with a bell-shaped performance curve across depth. Maximum average performance occurs at an intermediate layer (layer 15 out of 32). Extracting from early or late layers leads to significant degradation: early layers capture mostly local information, while later layers are dominated by output surface form and exact predictions (Tikhonov et al., 29 May 2025).
Systematic extraction protocol:
- For each layer $\ell$, run the full $k$-shot prompt and extract $\theta_\ell$.
- For each $\ell$, intervene on held-out queries at that layer and evaluate output quality.
- Fix the best-performing layer $\ell^*$ as the optimal layer for downstream usage.
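The sweep above can be written as a short selection loop. The callables `extract_vector`, `run_patched`, and `score` are placeholders for model-specific routines, so this is a protocol sketch rather than a concrete implementation.

```python
import numpy as np

def select_layer(model_layers, demos, heldout_queries,
                 extract_vector, run_patched, score):
    """Return the layer whose extracted task vector transfers best.

    extract_vector(demos, layer) -> task vector for that layer
    run_patched(query, theta, layer) -> model output with theta injected
    score(output, query) -> scalar quality in [0, 1]
    """
    best_layer, best_acc = None, -1.0
    for layer in model_layers:
        theta = extract_vector(demos, layer)            # theta_ell
        accs = [score(run_patched(q, theta, layer), q)
                for q in heldout_queries]
        acc = float(np.mean(accs))
        if acc > best_acc:
            best_layer, best_acc = layer, acc
    return best_layer, best_acc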
Vector aggregation strategies: For long demonstration sets that exceed the context window or for high-shot ICL, aggregation schemes such as divide-and-conquer are necessary. Here, demonstrations are divided into groups, group-level state vectors are extracted, and aggregated (using meta-prompts with injected intermediate vectors) to recursively construct a final compressed state vector representative of all input demonstrations (Li et al., 2024). This methodology enables scalable vector-based ICL beyond conventional context lengths.
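A minimal sketch of the divide-and-conquer recursion, with `extract` standing in for running the model on a demonstration group and `merge` for the meta-prompt step that fuses two intermediate vectors (here simplified to a mean; the actual fusion in Li et al., 2024 is model-mediated):

```python
import numpy as np

def aggregate(demo_groups, extract, merge):
    """Compress many demonstration groups into one state vector
    by pairwise recursive merging."""
    vectors = [extract(g) for g in demo_groups]
    while len(vectors) > 1:
        paired = zip(vectors[0::2], vectors[1::2])
        merged = [merge(a, b) for a, b in paired]
        if len(vectors) % 2:            # odd one out carries over
            merged.append(vectors[-1])
        vectors = merged
    return vectors[0]
```

The recursion depth is logarithmic in the number of groups, which is what lets the scheme scale past the context window.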
3. Expressivity, Linear Combination Conjecture, and Theoretical Analysis
A salient property of task vectors is their function as compressed representations—often acting as linear combinations of individual demonstration embeddings. The Linear Combination Conjecture asserts that the task vector can be viewed as a meta-demonstration,

$$\theta = \sum_{i} \lambda_i\, h_i,$$

where the $h_i$ are individual arrow-token states and the $\lambda_i$ are scalar weights (Dong et al., 10 Jun 2025).
Expressivity limitations: Injecting a single task vector constrains the model to rank-one function approximation, analogous to 1-shot ICL. Tasks requiring higher-rank mappings, such as general bijections, cannot be solved by a single vector; this is confirmed both theoretically and empirically (Dong et al., 10 Jun 2025). To overcome this, multi-vector injection schemes replace multiple tokens with distinct task vectors, restoring higher-rank capacity and improving accuracy on complex tasks.
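The rank argument can be illustrated with a toy linear-attention analogy (an assumption for exposition, not the paper's formal construction): if all task information is routed through one injected vector, the induced input–output map has the form $f(x) = (x \cdot \theta)\,u$ and is rank-one, while two injected vectors restore rank two.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 6
# One injected vector -> rank-one map f(x) = (x . theta) * u.
theta, u = rng.standard_normal(D), rng.standard_normal(D)
# A second vector adds an independent rank-one term.
theta2, u2 = rng.standard_normal(D), rng.standard_normal(D)

X = rng.standard_normal((50, D))                 # batch of queries
Y_single = np.outer(X @ theta, u)                # single task vector
Y_multi = Y_single + np.outer(X @ theta2, u2)    # two task vectors

rank1 = np.linalg.matrix_rank(Y_single)
rank2 = np.linalg.matrix_rank(Y_multi)
```

Any target function needing rank above one (e.g., a general bijection over several items) is therefore out of reach for the single-vector map.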
Theoretical guarantees: A residual-stream transformer, under gradient descent on cross-entropy, provably converges—using task-vector arithmetic—for factual recall tasks, so long as hierarchical concept cues are cleanly separated and overparameterization is sufficient (Bu et al., 13 Aug 2025). This formalizes the analogy between Word2Vec-style vector arithmetic and in-context adaptation, revealing that transformers dynamically extract latent task vectors to implement compositional reasoning and concept recombination.
4. Practical Protocols: Dataset Evaluation and Application Domains
Benchmarking on large-scale datasets: The QuiteAFew dataset, comprising 3,096 tasks from diverse categories (e.g., “Classify,” “Generate,” “Rewrite,” “Edit,” “Describe”), is used to systematically evaluate vector-ICL on Llama-3-8B (Tikhonov et al., 29 May 2025). Protocols involve extracting task vectors from 7 demonstrations and injecting them during zero-shot inference for the 8th example.
Performance insights:
- Format scores (output type/format) are more robustly transferred than correctness scores (semantic content), suggesting task vectors encode surface regularities but only partially capture deep semantic mappings.
- Simple tasks (e.g., translation, unary classification, single-step rewriting) approach few-shot performance using single-vector ICL.
- For complex, compositional tasks (e.g., structured JSON generation), injection of multiple subtask vectors at specific token positions yields substantial gains, clearly outperforming single-vector approaches.
Cross-modal and continuous data: Vector-ICL can operate over arbitrary continuous input domains by projecting embeddings (from images, molecules, fMRI, graphs, etc.) into the LLM embedding space as “box tokens,” which are then treated as atomic tokens by the LLM. Carefully pretrained projectors allow LLMs to process and reason over such representations, unlocking in-context generalization capabilities far beyond discrete language (Zhuang et al., 2024).
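A minimal sketch of the "box token" idea: a learned projector maps a continuous-domain embedding into the LLM's token-embedding space, where it is spliced into the prompt as a single atomic token. The dimensions, the random projector, and the splice position are illustrative assumptions.

```python
import numpy as np

EMB_IN, EMB_LLM = 512, 4096          # source-domain dim, LLM embedding dim
rng = np.random.default_rng(2)
# Stand-in for a pretrained projector (in practice, learned weights).
W_proj = rng.standard_normal((EMB_IN, EMB_LLM)) / np.sqrt(EMB_IN)

def to_box_token(x_embedding):
    """Project a continuous-domain embedding to one LLM-space token."""
    return x_embedding @ W_proj

image_emb = rng.standard_normal(EMB_IN)   # e.g., an image/molecule/fMRI embedding
box = to_box_token(image_emb)

# Splice the box token into a sequence of ordinary text-token embeddings.
prompt_embs = rng.standard_normal((5, EMB_LLM))
seq = np.vstack([prompt_embs[:3], box[None, :], prompt_embs[3:]])
```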
5. Optimization, Robustness, and Advanced Methods
Vector optimization: Beyond extraction, various vector refinement strategies, such as inner-loop gradient descent and momentum optimization, further enhance state vector quality (Li et al., 2024). These strategies exploit the analogy between test-time vector adaptation and gradient-based meta-learning. Empirical results show that momentum-based updates yield the best accuracy gains and lowest variance.
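The momentum refinement can be sketched as standard momentum-SGD on the injected vector, with `loss_grad` as a placeholder for the gradient of the downstream task loss with respect to the vector (the hyperparameters here are illustrative, not the paper's):

```python
import numpy as np

def refine_vector(theta, loss_grad, steps=20, lr=0.1, beta=0.9):
    """Momentum-SGD refinement of a state/task vector.

    loss_grad(theta) -> gradient of the task loss w.r.t. theta
    """
    v = np.zeros_like(theta)
    for _ in range(steps):
        g = loss_grad(theta)
        v = beta * v + g            # accumulate momentum
        theta = theta - lr * v
    return theta
```

On a simple quadratic loss this moves the vector steadily toward the optimum, mirroring the PCA observation that each update pushes the state vector along a consistent direction.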
Robust compressed representation: State vectors, when aggregated using divide-and-conquer protocols, display strong improvement in long-context or high-shot scenarios, outperforming naive averaging schemes. PCA analyses reveal that each added demonstration “pushes” the state vector along learned directions, justifying the effectiveness of momentum (Li et al., 2024).
Dynamic vector segmentation and injection: The DyVec family partitions layerwise latent representations into dynamic subspaces and learns optimal injection locations via REINFORCE policy gradients, adapting vector granularity and placement to task complexity. This approach outperforms traditional ICL, LoRA fine-tuning, and earlier vector-ICL methods in both efficiency and accuracy (Cai et al., 23 May 2025).
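A hedged sketch of the REINFORCE ingredient: a categorical policy over candidate injection slots is nudged toward slots that earn higher task reward. `reward_of` is a stand-in for evaluating an injection choice (here slot 2 is secretly best); DyVec's actual state, reward, and segmentation scheme are richer than this toy.

```python
import numpy as np

rng = np.random.default_rng(3)
N_SLOTS = 4                      # candidate (layer, position) injection slots
logits = np.zeros(N_SLOTS)       # policy parameters

def reward_of(slot):
    """Placeholder task reward for injecting at a given slot."""
    return 1.0 if slot == 2 else 0.0

for _ in range(500):
    p = np.exp(logits - logits.max()); p /= p.sum()   # softmax policy
    slot = rng.choice(N_SLOTS, p=p)                   # sample an action
    r = reward_of(slot)
    grad = -p; grad[slot] += 1.0     # d log p(slot) / d logits
    logits += 0.5 * r * grad         # REINFORCE update

best = int(np.argmax(logits))
```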
6. Implications, Limitations, and Research Directions
Efficiency and deployment: Single-vector ICL allows orders-of-magnitude speedups at inference, matching zero-shot prompt length while approximating few-shot performance. LIVE demonstrates this in large multimodal models for VQA: learnable multi-layer shift vectors are trained to mimic few-shot ICL, surpassing both standard ICL and non-learnable vector baselines in accuracy at a fraction of the computational cost (Peng et al., 2024).
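The LIVE idea can be caricatured in a two-layer toy model: learn per-layer shift vectors so that a zero-shot forward pass reproduces a few-shot "teacher" output. The linear model and the closed-form fit below are illustrative assumptions; the actual method trains the shifts by gradient descent on real model outputs.

```python
import numpy as np

rng = np.random.default_rng(4)
D = 5
W1, W2 = rng.standard_normal((D, D)), rng.standard_normal((D, D))

def forward(x, shifts):
    """Toy two-layer model with an additive shift vector after each layer."""
    h = x @ W1 + shifts[0]
    h = h @ W2 + shifts[1]
    return h

x = rng.standard_normal(D)
# "Teacher": the output the model would produce with few-shot context,
# emulated here by running with some nonzero shifts.
teacher = forward(x, [rng.standard_normal(D), rng.standard_normal(D)])

# Fit the second-layer shift in closed form (first shift held at zero)
# so the zero-shot "student" pass matches the teacher exactly.
shift2 = teacher - (x @ W1) @ W2
student = forward(x, [np.zeros(D), shift2])
```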
Limitations: The primary constraints of classic vector-ICL are:
- Rank limitation: A single injected vector cannot represent high-rank functions or support highly compositional or multi-step reasoning. Such tasks require multi-vector or token-by-token interventions (Tikhonov et al., 29 May 2025, Dong et al., 10 Jun 2025).
- Task specificity: Vectors are highly dependent on the task and, for some tasks, on prompt order or tokenization. Robust extraction and data aggregation techniques are necessary for generalizability.
- Expressivity ceiling: Projectors limit the richness of cross-modal analogical reasoning in LLMs unless architectures support variable-length or structured embeddings (Zhuang et al., 2024).
Research frontiers:
- Automated subtask-vector selection: Clustering trajectories of hidden states to reveal compositional task boundaries.
- Dynamic layer/token selection: Learning which spatial and depth coordinates optimally capture subtask semantics (Cai et al., 23 May 2025).
- Cross-model transfer: Assessing the portability of optimal layer selections and vector roles across architectures and scales (Tikhonov et al., 29 May 2025).
- Theoretical extension: Generalizing beyond linearized transformer models to account for full softmax, multi-head structures, and deep nonlinear interactions (Bu et al., 13 Aug 2025, Dong et al., 10 Jun 2025).
- Interpretability and attribution: Saliency and parameter visualization reveal how task vector formation and injection manipulate the attention and value distributions (Dong et al., 10 Jun 2025).
7. Summary Table: Key Findings in Vector-ICL Research
| Paper | Main Contribution | Limitation/Insight |
|---|---|---|
| (Tikhonov et al., 29 May 2025) | Layer 15 task vectors maximize ICL transfer; multi-vector is crucial for compositional tasks | Single-vector insufficient for complex outputs |
| (Li et al., 2024) | Inner/momentum optimization, divide-and-conquer aggregation for scalable vector ICL | Degradation when extracting from late layers |
| (Cai et al., 23 May 2025) | DyVec: dynamic segmentation, REINFORCE injection selection, robust performance | REINFORCE optimization needed for segment selection |
| (Zhuang et al., 2024) | Cross-modal vector-ICL via learnable projectors | Fixed-length embeddings only |
| (Bu et al., 13 Aug 2025) | Theoretical proof of task-vector arithmetic and compositionality in transformers | Requires QA-style formatting for clean compositionality |
| (Dong et al., 10 Jun 2025) | Linear Combination Conjecture, multi-vector injection for high-rank tasks | Rank-one constraint of single vector |
| (Peng et al., 2024) | LIVE: learnable layer-wise vector shifts for VQA; >24× speedup vs. ICL, higher accuracy | Requires supervised training; only tested in VQA |
The consensus across these works is that Vector-ICL enables efficient, modular, and partially interpretable adaptation of LLM behavior by manipulating internal states—subject to fundamental limitations imposed by task complexity and underpinned by rich connections to both empirical and theoretical perspectives.