Task Vector Bases in Multi-Task Models

Updated 26 June 2026

Task vector bases are collections of basis atoms from model parameter, activation, or output spaces that capture the direction of task-specific effects via structured linear combinations.
They enable efficient model merging, task arithmetic, and in-context learning by compressing redundant information and revealing latent task groupings across multi-modal architectures.
Construction algorithms like greedy selection, SVD, and autoencoding offer strong empirical and theoretical guarantees, ensuring accurate recovery and interpretability of task transformations.

A task vector basis is a collection of vectors, typically derived from the parameter, activation, or output spaces of models, such that any task-specific direction or transformation of interest can be represented as a structured (often sparse or low-rank) linear combination of these basis atoms. Task vector bases serve to compress, interpret, and efficiently manipulate the “atoms” of adaptation in modern multi-task, multi-modal, and in-context model architectures, from deep neural networks to large language models and vision-language models (VLMs). The construction, properties, and use of such bases have become central to recent work on model merging, scalable in-context learning, task arithmetic, and interpretability, with theoretical and empirical support across modalities.

1. Mathematical Formulation and Expressivity of Task Vector Bases

Formally, a task vector is a direction in a vector space that encodes a task's “effect”: in parameter space, $\tau_t = \theta_{\rm ft}^t - \theta_{\rm pre}$; in hidden space, a difference or mean of activations; in output (decoding) space, a logit difference (Zeng et al., 3 Feb 2025, Kumar et al., 2012, Li et al., 13 Apr 2026). A task vector basis is a set $\{b_j\}_{j=1}^M$, with $M < T$ for $T$ tasks, such that each task vector $\tau_i$ is well approximated by a linear (often sparse or low-rank) combination, $\tau_i \approx \sum_{j=1}^M c_{i,j} b_j$, or in matrix notation $T \approx B C^\top$ with $B\in\mathbb{R}^{d\times M}$ and $C\in\mathbb{R}^{T\times M}$ (Zeng et al., 3 Feb 2025). Here the matrix on the left stacks the task vectors as columns, so the symbol $T$ is overloaded: it denotes both this $d\times T$ task matrix and the number of tasks.
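As a minimal sketch of the parameter-space case, the following NumPy example builds synthetic task vectors and checks the factorization $T \approx B C^\top$. The sizes, the random "pretrained" and "fine-tuned" weights, and the choice of an orthonormal basis via QR are all assumptions made for illustration; they are not taken from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_tasks, M = 50, 6, 3   # hypothetical sizes: parameter dim, tasks T, basis size M

theta_pre = rng.normal(size=d)                   # stand-in for pretrained weights
# Synthetic fine-tuned weights whose offsets lie in an M-dimensional subspace.
latent_atoms = rng.normal(size=(d, M))
mixing = rng.normal(size=(num_tasks, M))
theta_ft = theta_pre + mixing @ latent_atoms.T   # shape (num_tasks, d)

# Task vectors tau_t = theta_ft^t - theta_pre, stacked as columns of a d x T matrix.
task_matrix = (theta_ft - theta_pre).T           # shape (d, num_tasks)

# Any basis of the right span works; here an orthonormal one from a QR factorization.
B, _ = np.linalg.qr(latent_atoms)                # shape (d, M)

# Least-squares coefficients C (T x M) such that task_matrix ≈ B @ C.T.
C = np.linalg.lstsq(B, task_matrix, rcond=None)[0].T

rel_err = np.linalg.norm(task_matrix - B @ C.T) / np.linalg.norm(task_matrix)
print(f"relative reconstruction error: {rel_err:.2e}")
```

Because the synthetic task vectors lie exactly in an $M$-dimensional subspace, the reconstruction error is at floating-point level; with real fine-tuned checkpoints the residual would generally be nonzero.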

Task vector bases generalize the notion of "atomic" functions in multitask learning (Kumar et al., 2012), where each task parameter vector is decomposed as $w_t = B s_t$ with $B$ the basis matrix and $s_t$ a sparse coefficient vector. Task vector bases unify heterogeneous forms (parameter, hidden, or logit space), enabling their use in compressed arithmetic, efficient model merging, and steerable representations (Li et al., 13 Apr 2026, Luo et al., 2024).

The dimension $M$ of the basis captures the intrinsic rank of the family of task transformations: $M$ equals the number of linearly independent directions required to reconstruct all task effects up to an acceptable error. In compression settings, $M \ll T$ attests to substantial redundancy or shared structure among tasks (Zeng et al., 3 Feb 2025). In vision-language models and in-context learning, the span of task vectors is low-dimensional relative to the vast ambient space (Luo et al., 2024, Dong et al., 10 Jun 2025).
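One way to make "up to an acceptable error" concrete is to read the basis size off the singular values of the task matrix. The sketch below is an illustration under assumptions: the helper name `effective_rank`, the energy threshold `tol`, and the synthetic data are hypothetical and not part of the cited methods.

```python
import numpy as np

def effective_rank(task_matrix, tol=1e-2):
    """Smallest M whose rank-M truncation leaves a relative squared
    Frobenius error of at most `tol` (a hypothetical threshold)."""
    s = np.linalg.svd(task_matrix, compute_uv=False)
    energy = np.cumsum(s**2) / np.sum(s**2)   # energy kept by the top-k directions
    M = int(np.searchsorted(energy, 1.0 - tol)) + 1
    return min(M, len(s))

# Synthetic task matrix with three dominant directions plus small noise.
rng = np.random.default_rng(0)
d, num_tasks = 100, 8
U, _ = np.linalg.qr(rng.normal(size=(d, 3)))           # orthonormal columns
V, _ = np.linalg.qr(rng.normal(size=(num_tasks, 3)))   # orthonormal columns
task_matrix = U @ np.diag([3.0, 2.0, 1.0]) @ V.T
task_matrix += 1e-6 * rng.normal(size=(d, num_tasks))

M_est = effective_rank(task_matrix)
print(f"estimated basis size: {M_est} for {num_tasks} tasks")
```

Here eight task vectors are explained by three directions, which is the kind of redundancy that makes a compressed basis worthwhile.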

2. Basis Construction Algorithms and Structural Constraints

Multiple algorithms and principled criteria have emerged for constructing task vector bases:

Greedy selection: Iteratively select task vectors maximizing explained variance, updating the basis and weights (Zeng et al., 3 Feb 2025).
SVD/PCA: Compute a low-rank approximation by projecting onto the top singular vectors (principal components) of the task matrix. This yields the optimal mean-squared reconstruction error but may violate desiderata such as nonnegativity (Zeng et al., 3 Feb 2025).
Autoencoding (structural bases): Use a softmax-activated encoder and linear decoder trained to minimize reconstruction error on the Gram matrix of the task vectors, $T^\top T$, imposing constraints such as nonnegativity or block-sparsity for interpretability and efficiency (Zeng et al., 3 Feb 2025).
Block-sparsity and low-rank per layer: In neural networks, especially for parameter-space task vectors, imposing sparsity within layers or factorizing updates per layer further reduces memory and computational footprint (Zeng et al., 3 Feb 2025).
REINFORCE-based selection in activation space: In vision models, mean activations per task position are pruned via REINFORCE to yield a sparse, task-specific sub-basis guiding inference (Hojel et al., 2024).
Linear regression for distributional alignment: In in-context learning, optimal task vector offsets in hidden or logit space are found by ridge regression minimizing the distributional discrepancy to full ICL (Kwon et al., 20 May 2026).
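The first two constructions in the list can be sketched in a few lines of NumPy. This is an illustrative sketch, not the procedure of Zeng et al.: the function names `svd_basis` and `greedy_basis` are hypothetical, and the greedy criterion used here (pick the task vector whose residual direction explains the most remaining variance, then deflate) is one plausible reading of "maximizing explained variance".

```python
import numpy as np

def svd_basis(task_matrix, M):
    """Rank-M basis from the top left singular vectors (optimal in Frobenius norm)."""
    U, s, Vt = np.linalg.svd(task_matrix, full_matrices=False)
    B = U[:, :M]                          # (d, M) orthonormal atoms
    C = (s[:M, None] * Vt[:M]).T          # (T, M) coefficients
    return B, C

def greedy_basis(task_matrix, M):
    """Pick M actual task vectors, each maximizing the variance it explains
    in the residual left by the previous picks (assumed criterion)."""
    num_tasks = task_matrix.shape[1]
    residual = task_matrix.copy()
    selected = []
    for _ in range(M):
        norms = np.linalg.norm(residual, axis=0)
        gains = np.full(num_tasks, -np.inf)
        for j in range(num_tasks):
            if j in selected or norms[j] < 1e-12:
                continue
            direction = residual[:, j] / norms[j]
            gains[j] = np.sum((direction @ residual) ** 2)
        best = int(np.argmax(gains))
        if not np.isfinite(gains[best]):
            break                          # nothing left to explain
        selected.append(best)
        direction = residual[:, best] / norms[best]
        residual = residual - np.outer(direction, direction @ residual)
    B = task_matrix[:, selected]           # atoms are original task vectors
    C = np.linalg.lstsq(B, task_matrix, rcond=None)[0].T
    return B, C, selected

# Synthetic rank-3 task matrix with 6 tasks (illustrative data only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3)) @ rng.normal(size=(3, 6))

B_svd, C_svd = svd_basis(X, 2)
B_gr, C_gr, picked = greedy_basis(X, 2)
err_svd = np.linalg.norm(X - B_svd @ C_svd.T)
err_greedy = np.linalg.norm(X - B_gr @ C_gr.T)
print(f"rank-2 error: SVD {err_svd:.3f}, greedy {err_greedy:.3f}, picked {picked}")
```

The SVD atoms are abstract directions, while the greedy atoms are actual task vectors, which is one reason greedy selection can be easier to interpret even though its reconstruction error is never lower than that of the SVD at the same $M$.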

The construction algorithm directly determines recovery accuracy, basis interpretability, and the ability to support advanced operations such as unlearning (negation), OOD generalization, or extrapolation in compressed multitask settings (Zeng et al., 3 Feb 2025, Yan et al., 5 May 2026). Theoretical analyses show that error bounds for downstream risk or unlearning depend on the $(M+1)$-th eigenvalue of the task Gramian, so the guarantees tighten as $M$ increases (Zeng et al., 3 Feb 2025).
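The role of the $(M+1)$-th eigenvalue can be illustrated for the reconstruction step alone. By the Eckart–Young theorem, the best rank-$M$ approximation of the task matrix has squared spectral-norm error equal to the $(M+1)$-th eigenvalue of the Gram matrix, and squared Frobenius error equal to the sum of the remaining eigenvalues. The sketch below checks this numerically on random data; it does not reproduce the downstream-risk or unlearning bounds of the cited analysis.

```python
import numpy as np

rng = np.random.default_rng(1)
task_matrix = rng.normal(size=(40, 8))   # illustrative d x T task matrix
M = 3

# Eigenvalues of the task Gram matrix, sorted in decreasing order.
gram = task_matrix.T @ task_matrix
eigvals = np.sort(np.linalg.eigvalsh(gram))[::-1]

# Best rank-M approximation via truncated SVD.
U, s, Vt = np.linalg.svd(task_matrix, full_matrices=False)
approx = U[:, :M] @ np.diag(s[:M]) @ Vt[:M]

spectral_err_sq = np.linalg.norm(task_matrix - approx, ord=2) ** 2
frob_err_sq = np.linalg.norm(task_matrix - approx, ord="fro") ** 2

print(f"(M+1)-th eigenvalue: {eigvals[M]:.4f}, squared spectral error: {spectral_err_sq:.4f}")
print(f"tail eigenvalue sum: {eigvals[M:].sum():.4f}, squared Frobenius error: {frob_err_sq:.4f}")
```

Since the eigenvalues are sorted in decreasing order, both error terms shrink monotonically as $M$ grows.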

3. Functional Roles and Operations Enabled by Task Vector Bases

Task vector bases are central to several functional paradigms across learning and inference:

Scalable model merging: Summing or weighting basis atoms yields new, merged models supporting multiple tasks with drastically reduced storage and computational cost (Zeng et al., 3 Feb 2025, Kim et al., 10 Mar 2025).
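Task arithmetic: Vector arithmetic (addition, negation, or extrapolation) on task representations is preserved in the basis. With reconstructed task vectors $\hat{\tau}_i = \sum_{j=1}^M c_{i,j} b_j$, addition over a task subset $S$ takes the form

$\theta_{\rm add} = \theta_{\rm pre} + \lambda \sum_{i \in S} \hat{\tau}_i,$

where $\lambda$ is a scaling coefficient; subtraction (negation/unlearning) is analogous, with the sign of the task term reversed (Zeng et al., 3 Feb 2025).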

Automatic grouping and overlap detection: Structures in the coefficient matrix (sparsity, shared supports) reveal latent task clusters and overlaps, formalizing group structure in multi-task learning (Kumar et al., 2012).
Efficient in-context learning: In transformer ICL, task vectors distilled from demonstrations can be stored and rapidly injected, saving context length and supporting cross-modal or cross-scale transfer (Luo et al., 2024, Dong et al., 10 Jun 2025, Kwon et al., 20 May 2026).
Steering and control in decoding space: Task vectors in logit (decoding) space provide a universal, non-invasive steering mechanism for LLMs, yielding accuracy gains with no training or parameter updates (Li et al., 13 Apr 2026).
Principled unlearning: For bases constructed via autoencoding or greedy selection, "forgetting" a task corresponds to subtracting its reconstructed vector, with precise error guarantees (Zeng et al., 3 Feb 2025).
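Merging, addition, and unlearning in a compressed basis reduce to arithmetic on coefficient rows. The following sketch assumes a basis matrix `B` and coefficient matrix `C` are already available; the helper names `merge_tasks` and `forget_task` and the scaling coefficient `lam` are hypothetical choices for illustration, not an API from the cited works.

```python
import numpy as np

def merge_tasks(theta_pre, B, C, task_ids, lam=1.0):
    """Add the reconstructed task vectors of `task_ids` to the pretrained weights.
    Summation happens in coefficient space, so only M numbers per task are needed."""
    coeffs = C[list(task_ids)].sum(axis=0)   # shape (M,)
    return theta_pre + lam * (B @ coeffs)

def forget_task(theta, B, C, task_id, lam=1.0):
    """Negation: subtract the reconstructed task vector of one task."""
    return theta - lam * (B @ C[task_id])

# Synthetic setup in which every task vector is exactly representable.
rng = np.random.default_rng(0)
d, num_tasks, M = 20, 5, 2
B = rng.normal(size=(d, M))
C = rng.normal(size=(num_tasks, M))
theta_pre = rng.normal(size=d)
task_vectors = B @ C.T                       # (d, T), columns are tau_i

merged = merge_tasks(theta_pre, B, C, [0, 2])
after_forgetting = forget_task(merged, B, C, 2)

print(np.allclose(merged, theta_pre + task_vectors[:, 0] + task_vectors[:, 2]))
print(np.allclose(after_forgetting, theta_pre + task_vectors[:, 0]))
```

Only `B` (shared across tasks) and one row of `C` per task have to be stored, which is where the storage savings of a compressed basis come from.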

A summary table contrasts leading task vector basis methodologies:

Method	Basis Construction	Span Type	Key Operations Supported
SVD/PCA	Top singular vectors	Orthogonal/Low-rank	Addition; nonnegativity not guaranteed
Greedy selection	Maximize coverage	Sparse/dense	Addition, negation, interpretability
Autoencoding	Softmax-activated AE	Block-sparse	Addition, negation, compression
REINFORCE selection	Stochastic subsetting	Highly sparse	Zero-shot guidance for visual tasks
Ridge regression (LTV)	Closed-form linear	Hidden/logit space	Distribution-aligned inference

4. Underlying Geometry and Rank Constraints

The geometry of task vector bases tightly links to the expressivity limits of the underlying model class and sampling procedure:

Linear independence and rank limitations: In linear models or in-context learning, demonstration activations span a space of dimension at most the number of demonstrations. Injecting a single vector yields a rank-one predictor; multiple vectors are needed for higher-rank mappings (Dong et al., 10 Jun 2025).
Convex polytopes and subspace partitioning: In synthetic mixtures, task vectors for $K$ tasks span a subspace $\mathcal{S}$ whose dimension is set by $K$, with in-distribution inference operating as convex combinations in $\mathcal{S}$ and OOD generalization occurring in the near-orthogonal complement $\mathcal{S}^\perp$ (Yan et al., 5 May 2026).
Cross-modal alignment: In VLMs, task vectors derived from image and text inputs for the same task are nearly collinear, with high cosine similarity, showing that common functional axes are present across modalities (Luo et al., 2024).
Pseudo-closure and algebraic structure: For square matrices, special bases such as the Weyl–Heisenberg/Pauli and Hadamard bases are characterized by pseudo-closure under multiplication (e.g., n-pseudo-closure for the Fourier basis, 2-pseudo-closure for the Hadamard basis), supporting group-theoretic and transform applications (Gnang, 2012).
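The rank constraints in the first item above can be checked numerically in a simplified linear setting. The sketch below is plain linear algebra rather than the transformer construction analyzed by Dong et al.; the sizes and the modeling of a single injected vector as an outer-product map are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 32, 5   # hypothetical hidden width and number of demonstrations

# n demonstration activations in R^d span at most n dimensions.
demos = rng.normal(size=(n, d))
span_dim = np.linalg.matrix_rank(demos)

# Assumed simplification: one injected vector acts as an outer product u v^T.
u, v = rng.normal(size=d), rng.normal(size=d)
rank_single = np.linalg.matrix_rank(np.outer(u, v))

# Summing k such terms gives a map of rank at most k.
k = 4
multi = sum(np.outer(rng.normal(size=d), rng.normal(size=d)) for _ in range(k))
rank_multi = np.linalg.matrix_rank(multi)

# A bijective linear map on R^d (here the identity) has rank d, so the best
# rank-one approximation leaves a Frobenius error of sqrt(d - 1).
s = np.linalg.svd(np.eye(d), compute_uv=False)
best_rank1_error = np.sqrt(np.sum(s[1:] ** 2))

print(span_dim, rank_single, rank_multi, round(float(best_rank1_error), 3))
```

The last quantity shows why a single vector cannot represent a bijective map: almost all of the target's energy lies outside any one rank-one component.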

5. Empirical Evidence and Benchmark Performance

Empirical studies support the utility of task vector bases:

Model merging and multitask addition: In vision (ViT) and language (RoBERTa, Llama) settings, a basis with fewer atoms than tasks ($M < T$), built via autoencoding or greedy selection, recovers a large fraction of full task arithmetic performance, and a still smaller basis retains much of it (Zeng et al., 3 Feb 2025).
Memory and latency: Task vector quantization and residual decomposition keep accuracy close to FP32 merging at a low effective bit-width per task, reducing storage (Kim et al., 10 Mar 2025). Latency for LTV matches zero-shot inference (Kwon et al., 20 May 2026).
Out-of-distribution generalization: Task vector prompting loss and basis geometry interventions yield marked improvements in OOD/robustness (Yang et al., 16 Jan 2025, Yan et al., 5 May 2026).
Cross-modal and cross-model transfer: In VLMs, patching in a task vector from one modality or even a different base model yields equivalent or superior performance; in LLMs, extracting LTV from a larger model and transferring it to a smaller model provides a mean accuracy boost (Luo et al., 2024, Kwon et al., 20 May 2026).
Logit-space steering: Decoding-space task vectors (DeCoVec) outperform few-shot ICL on TruthfulQA, Math-500, and AQUA-RAT across open-source LLMs (Li et al., 13 Apr 2026).
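To give a sense of how low-bit storage of task vectors works, the following sketch applies generic uniform (min-max) quantization to a synthetic task vector. It is a textbook scheme shown under assumptions, not the specific quantization or residual decomposition of Kim et al.; the function names and the 4-bit setting are hypothetical.

```python
import numpy as np

def quantize(tau, bits=4):
    """Uniform min-max quantization of a task vector to `bits` bits (bits <= 8)."""
    lo, hi = float(tau.min()), float(tau.max())
    levels = 2**bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((tau - lo) / scale).astype(np.uint8)
    return codes, scale, lo

def dequantize(codes, scale, lo):
    """Map integer codes back to approximate float values."""
    return codes.astype(np.float64) * scale + lo

# Task vectors tend to have a much narrower range than full weights,
# which is modeled here by a small standard deviation (an assumption).
rng = np.random.default_rng(0)
tau = rng.normal(scale=0.01, size=1000)

codes, scale, lo = quantize(tau, bits=4)
tau_hat = dequantize(codes, scale, lo)
max_err = np.max(np.abs(tau - tau_hat))

print(f"step size {scale:.2e}, max abs error {max_err:.2e}")
```

The rounding error per entry is bounded by half the step size, so a narrow value range translates directly into a small error at a fixed bit-width.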

6. Theoretical Guarantees and Limitations

Task vector bases are governed by clear theoretical constraints and tradeoffs:

Generalization bounds: As long as basis atoms approximate the full set of task vectors to within a small residual (controlled by the $(M+1)$-th singular value), addition and negation error bounds match those derived for the full (uncompressed) arithmetic (Zeng et al., 3 Feb 2025).
Identifiability: Under incoherence and sparsity assumptions, sparse-basis MTL (GO-MTL) recovers both support patterns and subspaces representing task groups, generalizing trace-norm and disjoint-group MTL (Kumar et al., 2012).
Rank-one bottleneck: In transformer ICL, a single task vector can only encode rank-one maps; multi-vector injection is required for complex (bijective) tasks (Dong et al., 10 Jun 2025).
Distributional alignment: Minimizing the KL-divergence or hidden-state MSE between task-vector and ICL distributions (as in LTV) offers a principled extraction criterion, explaining accuracy gains and providing actionable alignment metrics (Kwon et al., 20 May 2026).
Limitations: Full orthonormal basis construction, basis arithmetic (addition/multiplication), and span analyses for large task families remain ongoing research areas, especially in the context of activation/logit subspaces and multi-modal integration (Luo et al., 2024, Li et al., 13 Apr 2026).
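The hidden-state MSE criterion in the distributional-alignment item can be illustrated with a generic closed-form ridge regression. The sketch below is not the LTV procedure of Kwon et al.: the synthetic "zero-shot" and "ICL" hidden states, the affine relation between them, and the helper `ridge_fit` are assumptions chosen only to show the closed-form solve $W = (X^\top X + \lambda I)^{-1} X^\top Y$.

```python
import numpy as np

def ridge_fit(X, Y, lam=1e-2):
    """Closed-form ridge solution W = (X^T X + lam I)^{-1} X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

# Synthetic stand-ins: zero-shot hidden states and the states full ICL would produce.
rng = np.random.default_rng(0)
n, d = 200, 8
H_zero = rng.normal(size=(n, d))
W_true = np.eye(d) + 0.1 * rng.normal(size=(d, d))
offset_true = rng.normal(size=d)               # plays the role of a task-vector offset
H_icl = H_zero @ W_true + offset_true

# Append a constant column so the last row of W is a learned offset.
X = np.hstack([H_zero, np.ones((n, 1))])
W_small = ridge_fit(X, H_icl, lam=1e-8)        # nearly unregularized
W_large = ridge_fit(X, H_icl, lam=100.0)       # heavily regularized

mse_small = np.mean((X @ W_small - H_icl) ** 2)
mse_large = np.mean((X @ W_large - H_icl) ** 2)
print(f"hidden-state MSE: lam=1e-8 -> {mse_small:.2e}, lam=100 -> {mse_large:.2e}")
```

The regularization strength trades alignment error against the norm of the learned map; in this noise-free toy setting the nearly unregularized fit recovers the offset almost exactly.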

7. Applications Across Modalities and Model Architectures

Task vector bases are a unifying abstraction deployed in varied contexts:

Multitask and continual learning: GO-MTL and task arithmetic with compressed bases enable scalable learning and unlearning over hundreds of tasks, with controlled memory and interference (Zeng et al., 3 Feb 2025, Kumar et al., 2012).
Vision-language and multimodal models: Shared subspaces of task vectors facilitate robust cross-modal transfer, patching, and cross-architecture adaptation without additional fine-tuning (Luo et al., 2024).
In-context learning acceleration: Task vectors distilled from context demonstrations support amortized, minimal-latency inference, robust to modal shifts and model size changes (Yang et al., 16 Jan 2025, Kwon et al., 20 May 2026).
Visual prompting and image manipulation: Activation-space bases, pruned for sparsity and high “taskness,” allow efficient task steering in MAE–VQGAN and related architectures (Hojel et al., 2024).
Quantum computation and algebraic signal processing: Orthogonal bases with pseudo-closure generalize Pauli and Hadamard matrices, supporting matrix Fourier transforms, group theory, and stabilizer code construction (Gnang, 2012).

In sum, task vector bases constitute the mathematical and algorithmic infrastructure for scalable, interpretable, and efficient manipulation of task-induced transformations across modern machine learning systems. Their principled construction, theoretical guarantees, and demonstrated empirical impact make them foundational in multitask model design, compression, and analysis (Zeng et al., 3 Feb 2025, Kumar et al., 2012, Kim et al., 10 Mar 2025, Kwon et al., 20 May 2026, Luo et al., 2024, Dong et al., 10 Jun 2025, Li et al., 13 Apr 2026, Yan et al., 5 May 2026, Hojel et al., 2024, Gnang, 2012).