
Unified Task Vector in Multi-Task Learning

Updated 7 December 2025
  • Unified Task Vector is a mathematical construct representing the difference between pre-trained and task-adapted model parameters to unify task-specific adaptations.
  • It employs quantization, dynamic fusion, and disentanglement techniques to aggregate diverse task vectors, reducing memory and communication demands.
  • Applications include multi-task image classification, 3D perception, and federated learning where efficient model merging improves performance while lowering resource usage.

A unified task vector is a mathematical and algorithmic construct designed to encapsulate the principal adaptations or representations associated with distinct tasks within a single, shared parameter space. This concept underlies a diverse array of advances in multi-task learning, model merging, efficient communication, and federated optimization, where tasks may differ in data modality, label space, or client deployment. Unified task vector methodologies generalize the "task arithmetic" paradigm—operating on the difference between pre-trained and task-adapted model parameters or prompt vectors—while introducing mechanisms for fusion, quantization, and feature disentanglement that facilitate scalable, memory- and communication-efficient collaboration among tasks.

1. Definitions and Foundational Formulation

Classically, for a model with pre-trained weights $\theta_\mathrm{pre}$, each task $t$ with fine-tuned weights $\theta^*_t$ gives rise to a task vector:

$$\delta_t = \theta^*_t - \theta_\mathrm{pre}$$

Within the context of model merging and continued optimization, these vectors serve as atomic directions encoding task-specific adaptations. A unified task vector aggregates multiple such $\delta_t$ across tasks, producing a representation $u$ (notation varies by framework) intended to support model consolidation, parameter efficiency, or multi-task inference.
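A minimal NumPy sketch of this definition and of basic task arithmetic; all arrays here are toy stand-ins for flattened model parameters, and the scaling coefficient is illustrative:

```python
import numpy as np

# Toy stand-ins for flattened model parameters (real task vectors span
# millions of weights; the arithmetic is identical).
rng = np.random.default_rng(0)
theta_pre = rng.normal(size=8)                          # pre-trained weights
theta_t1 = theta_pre + rng.normal(scale=0.1, size=8)    # fine-tuned on task 1
theta_t2 = theta_pre + rng.normal(scale=0.1, size=8)    # fine-tuned on task 2

# Task vectors: delta_t = theta_t* - theta_pre
delta_1 = theta_t1 - theta_pre
delta_2 = theta_t2 - theta_pre

# Basic task arithmetic: merge by adding scaled task vectors to the base.
alpha = 0.5
theta_merged = theta_pre + alpha * (delta_1 + delta_2)
```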

In federated and many-task regimes, client $n$ handling tasks $T_n = \{t_1, \dots, t_{k_n}\}$ constructs a unified task vector $u_n$ by selecting dominant sign patterns and maximum magnitudes:

$$\sigma_n = \mathrm{sgn}\Big(\sum_{t\in T_n} \tau^t_n\Big), \qquad \mu_n(j) = \max\big\{\, |\tau^t_n(j)| : \mathrm{sgn}(\tau^t_n(j)) = \sigma_n(j) \,\big\}, \qquad u_n = \sigma_n \odot \mu_n$$

where $\tau^t_n$ is the task vector for task $t$ at client $n$ and $\odot$ denotes element-wise multiplication (Tsouvalas et al., 10 Feb 2025).
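This sign-and-magnitude aggregation can be sketched directly in NumPy; the function name is illustrative, not taken from the authors' code:

```python
import numpy as np

def unified_task_vector(task_vectors):
    """Aggregate a client's per-task vectors into one unified vector by
    dominant sign and maximum sign-consistent magnitude (a sketch of the
    formulation above, not the authors' implementation)."""
    T = np.stack(task_vectors)              # shape: (num_tasks, dim)
    sigma = np.sign(T.sum(axis=0))          # dominant sign per coordinate
    agree = np.sign(T) == sigma             # entries matching that sign
    mu = np.where(agree, np.abs(T), 0.0).max(axis=0)
    return sigma * mu

u = unified_task_vector([np.array([1.0, -2.0, 0.5]),
                         np.array([3.0, 1.0, -0.5])])
```

In the example, the first coordinate keeps the larger sign-consistent magnitude (3.0), the second keeps 2.0 with the dominant negative sign, and the third cancels to zero because the per-task signs conflict.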

Unified task vectors have also been formulated as additive merges in prompt-tuning frameworks and vector-field constructs in multi-task perception for vision and robotics.

2. Representation Construction and Aggregation Techniques

Quantization and Memory Efficiency

Task vector quantization (TVQ) leverages the empirically narrow dynamic range of $\delta_t$ compared to the full $\theta^*_t$, allowing aggressive quantization (3–4 bits) with negligible loss. The memory requirement for storing $T$ tasks drops dramatically, e.g., with ViT-L/14 checkpoints, from $1.14\,\mathrm{GB} \times T$ (FP32) to $0.7\,\mathrm{GB}$ for $T=8$ tasks using Residual Task Vector Quantization (RTVQ) (Kim et al., 10 Mar 2025). RTVQ further decomposes $\delta_t$:

$$\delta_t = \delta_\mathrm{base} + \delta_t^\mathrm{off}$$

Here, $\delta_\mathrm{base}$ (the mean direction) is stored in higher precision and the per-task offsets $\delta_t^\mathrm{off}$ in lower precision, achieving effective per-task bit-widths of $\approx 2.4$.
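A simplified sketch of the residual split plus low-bit uniform quantization; the quantizer and the toy data are illustrative, and RTVQ's actual codec is more elaborate:

```python
import numpy as np

def uniform_quantize(x, bits):
    """Symmetric uniform quantizer over the tensor's dynamic range."""
    max_abs = np.abs(x).max()
    if max_abs == 0:
        return x.copy()
    scale = max_abs / (2 ** (bits - 1) - 0.5)   # map range onto 2^bits levels
    return np.round(x / scale) * scale

# Toy task vectors sharing a common base direction plus small offsets
# (illustrative data; real deltas come from fine-tuned checkpoints).
rng = np.random.default_rng(1)
base_true = rng.normal(scale=0.05, size=1000)
deltas = [base_true + rng.normal(scale=0.005, size=1000) for _ in range(8)]

# RTVQ-style split: shared base kept at higher precision (here: exact),
# per-task offsets quantized aggressively at 3 bits.
delta_base = np.mean(deltas, axis=0)
recon = [delta_base + uniform_quantize(d - delta_base, bits=3) for d in deltas]
```

Because the offsets have a much narrower dynamic range than the full task vectors, quantizing them at 3 bits incurs far less error than quantizing each $\delta_t$ directly at the same bit-width.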

Multi-Task and Federated Aggregation

In federated learning, unified task vectors allow communication-efficient model fusion. Clients compute unu_n as above, and the server performs cross-client and cross-task aggregation:

  • Cosine and sign-alignment metrics facilitate similarity-aware fusion.
  • Global task vectors are constructed via weighted aggregation and sign-based regularization to encourage disentanglement.
  • Clients recover per-task personalization through lightweight mask and scaling modulators:

$$m_n^t = \big(\tau_n^t \odot u_n > 0\big), \qquad \lambda_n^t = \frac{\sum_j |\tau_n^t(j)|}{\sum_j \big|[m_n^t \odot u_n]_j\big|}, \qquad \dot\tau_n^t = \lambda_n^t \,\big(m_n^t \odot u_n\big)$$

(Tsouvalas et al., 10 Feb 2025).
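The mask-and-scaling recovery above can be sketched as follows; the function name is illustrative:

```python
import numpy as np

def personalize(tau_t, u):
    """Recover a per-task vector from the unified one via a binary mask
    over sign-agreeing coordinates plus a single scaling factor
    (a sketch of the modulator equations above)."""
    m = (tau_t * u > 0).astype(float)       # mask: coordinates with matching sign
    masked = m * u
    denom = np.abs(masked).sum()
    lam = np.abs(tau_t).sum() / denom if denom > 0 else 0.0
    return lam * masked

tau_dot = personalize(np.array([1.0, -2.0, 0.5]),
                      np.array([3.0, -2.0, -0.5]))
```

Only the mask and the scalar need to be kept per task, which is what makes the modulators lightweight relative to storing a full task vector.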

In prompt tuning for NLP, dynamic task vector grouping (DTVG) defines task vectors as $T_t = P_t^* - P_{\mathrm{init}}$ (with $P_t^*$ the learned prompt for task $t$), and merges them additively with dynamic, similarity-based weighting and knowledge-consistency constraints, regularly re-optimizing the group for the target task (Zhang et al., 23 Mar 2025).
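A simplified sketch of similarity-weighted additive merging; this shows only the weighting step, whereas DTVG additionally re-optimizes the group under knowledge-consistency constraints, and the function names are illustrative:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def merge_prompt_task_vectors(target_tv, source_tvs):
    """Additively merge source prompt task vectors into the target's,
    weighting each source by its cosine similarity to the target and
    dropping negatively aligned sources."""
    merged = target_tv.astype(float).copy()
    for tv in source_tvs:
        w = cosine(target_tv, tv)
        if w > 0:                           # skip dissimilar, potentially harmful tasks
            merged += w * tv
    return merged

merged = merge_prompt_task_vectors(np.array([1.0, 0.0]),
                                   [np.array([1.0, 0.0]),
                                    np.array([-1.0, 0.0])])
```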

Unified Vector-Field Representations

In 3D perception, RepVF models all tasks (e.g., object and lane detection) as subfields of a joint vector field defined on partitioned 3D domains. Each vector field is $R_i = (S_i, \mathcal{F}_i(S_i))$, with $S_i$ a set of sampled 3D points and $\mathcal{F}_i$ encoding semantic or geometric attributes. A single network head predicts all $R_i$ simultaneously, allowing parameter and compute sharing while maintaining task separation via differentiable conversion to task-specific labels (Li et al., 15 Jul 2024).
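A minimal data-structure sketch of the idea, with illustrative names rather than RepVF's actual code: one shared head produces points and attributes for every subfield at once, and the query set is partitioned per task:

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_head(queries):
    """Stand-in for the single network head: maps all task queries to
    per-point 3D locations S_i and attribute vectors F_i(S_i) in one pass."""
    n = queries.shape[0]
    points = rng.normal(size=(n, 3))        # sampled 3D locations
    attrs = rng.normal(size=(n, 8))         # semantic/geometric attributes
    return points, attrs

# One query set, partitioned across tasks; the head runs once for all tasks.
queries = rng.normal(size=(12, 16))
task_slices = {"object_detection": slice(0, 8), "lane_detection": slice(8, 12)}
points, attrs = shared_head(queries)
fields = {task: (points[s], attrs[s]) for task, s in task_slices.items()}
```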

3. Fusion, Disentanglement, and Knowledge Integration

Unified task vectors generally serve as the foundation for fusing task-specific expertise, supporting both positive knowledge transfer and mitigation of negative transfer via explicit disentanglement.

  • Task Similarity and Selection: Cosine similarity and sign overlap between task vectors provide quantitative bases for selecting related tasks in aggregation and avoiding destructive interference (Tsouvalas et al., 10 Feb 2025, Zhang et al., 23 Mar 2025).
  • Layer-Wise Fusion and Feature Drift Minimization: Layer-wise Optimal Task Vector Merging formulates the fusion problem for each layer as a convex quadratic program minimizing squared feature drift across tasks, with closed-form solutions for linear and normalization layers:

$$(T^l)^\star = \Big(\sum_k (X_k^l)^\top X_k^l\Big)^\dagger \Big(\sum_k (X_k^l)^\top X_k^l\, T_k^l\Big)$$

where $X_k^l$ denotes the input activations for task $k$ at layer $l$ (Sun et al., 29 May 2025).

  • Disentanglement Regularization: Federated approaches employ regularization on sign-conflicts between task directions to promote orthogonality and keep task vectors distinct (Tsouvalas et al., 10 Feb 2025).
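The layer-wise closed-form merge described above can be sketched for a linear layer with NumPy; this is a sketch of the stated least-squares solution, not the authors' implementation:

```python
import numpy as np

def lot_merge_linear(X_list, T_list):
    """Closed-form layer-wise merge: the minimizer of
    sum_k ||X_k T - X_k T_k||_F^2 over T.
    X_k: (n_k x d) input activations for task k; T_k: (d x m) task weights."""
    A = sum(X.T @ X for X in X_list)
    B = sum(X.T @ X @ T for X, T in zip(X_list, T_list))
    return np.linalg.pinv(A) @ B            # (sum X^T X)^+ (sum X^T X T_k)

# Sanity check: if every task uses the same weights, merging recovers them.
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(10, 4)), rng.normal(size=(10, 4))
T = rng.normal(size=(4, 3))
T_star = lot_merge_linear([X1, X2], [T, T])
```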

4. Applications in Multi-Task, Federated, and Multimodal Learning

Model Merging and Multi-Task Inference

  • TVQ/RTVQ achieves effective model merging for vision and dense prediction, supporting, e.g., ViT-B/32 multi-task image classification at 2.4 bits per task with accuracy equal to or better than FP32 task arithmetic (Kim et al., 10 Mar 2025).
  • LOT Merging yields accuracy improvements of roughly 4.4 points over state-of-the-art parameter-level merging by directly minimizing feature drift (Sun et al., 29 May 2025).
  • DTVG surpasses static prompt merging, consistently outperforming prior prompt-tuning frameworks across 26 NLP datasets, with the best average accuracy on both GLUE and SuperGLUE (Zhang et al., 23 Mar 2025).

Federated and Many-Task Learning

  • MaTU's unified task vector mechanism supports many-task federated learning (MaT-FL) scenarios without per-client model tracking, scaling up to 30 tasks and achieving performance near per-task fine-tuning at a fraction of the communication cost (Tsouvalas et al., 10 Feb 2025).

Multimodal and Communication-Efficient Systems

  • U-DeepSC exploits per-task embedding vectors, a vector-wise dynamic scheme for feature selection, and a unified codebook for quantized representation, enabling task- and SNR-adaptive operation for image, text, and speech data. Model size is reduced to approximately 28.5% of that of six separate single-task models (Zhang et al., 2022).
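A generic sketch of the codebook-quantization step as nearest-codeword lookup; U-DeepSC learns the codebook jointly with the network, which is not shown here, and the function name is illustrative:

```python
import numpy as np

def codebook_quantize(features, codebook):
    """Map each feature vector to its nearest codeword (generic vector
    quantization). features: (n, d); codebook: (K, d).
    Returns codeword indices (what gets transmitted) and the codewords."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)                 # index of nearest codeword
    return idx, codebook[idx]

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
features = np.array([[0.1, 0.1], [0.9, 1.0]])
idx, quantized = codebook_quantize(features, codebook)
```

Transmitting the integer indices instead of the continuous features is what yields the communication savings, at the cost of quantization error bounded by the codebook's resolution.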

Unified Vector-Field Perception

  • RepVF and the associated RFTR model provide a basis for concurrent 3D detection and lane perception with a shared head, eliminating redundant parameters and improving computational efficiency. In multi-task settings, this approach increases F-score and mean average precision by large margins over multi-head baselines, despite using fewer parameters (Li et al., 15 Jul 2024).

5. Limitations and Open Challenges

Several open questions and limitations have been documented:

  • Ultra-low bit quantization (<2 bits/task) without accuracy degradation has not been achieved; methods combining quantization-aware training or non-uniform quantization may be required (Kim et al., 10 Mar 2025).
  • Generalization of unified task vector approaches to LLMs, highly non-linear or domain-specific tasks, or tasks with divergent structure remains an area of active inquiry (Kim et al., 10 Mar 2025, Sun et al., 29 May 2025).
  • Fully data-free model merging with feature drift minimization is not yet realized; current techniques require small exemplar sets per task (Sun et al., 29 May 2025).
  • For unified vector field perception, richer geometric attributes and learned sampling set initialization may be necessary to handle small object detection and broad task diversity (Li et al., 15 Jul 2024).
  • In multi-modal communication, optimal balancing of transmission overhead versus semantic performance, especially under variable channel conditions, remains complex (Zhang et al., 2022).

This table summarizes memory and communication advantages in selected frameworks:

Method | Memory/Comm. Efficiency | Accuracy (Representative Setting)
TVQ/RTVQ (Kim et al., 10 Mar 2025) | 7–8% of FP32 with RTVQ (20 tasks) | 70.2% merged ViT-B/32; mIoU +4.5 on NYUv2
MaTU Unified TV (Tsouvalas et al., 10 Feb 2025) | %%%%34tt35%%%% floats (15 tasks) | 79.47% (multi-task), 84.32% (single-task, 8 tasks)
U-DeepSC (Zhang et al., 2022) | 28.5% of separate models | Within 1–2% of task-specific models, reduced overhead
RepVF (Li et al., 15 Jul 2024) | 20% fewer params vs. baseline | F-score 66.5%, mAP 25.3 (multi-task 3D perception)

6. Theoretical and Practical Impact

Unified task vectors provide an abstraction supporting scalable, memory- and computation-efficient multi-task systems across modalities and learning architectures. They achieve this by:

  • Enabling parameter- and communication-efficient management of heterogeneous, high-cardinality task sets.
  • Facilitating dynamic and similarity-aware knowledge transfer through explicit task vector arithmetic and grouping.
  • Allowing single-model deployments in settings—federated, multi-modal, and multi-objective—where naive approaches demand infeasible memory or computational resources.

As research continues, the unified task vector formalism is being extended to broader domains, deeper integration strategies, and more dynamic groupings. The field now encompasses vector representations for neural prompt adaptation, federated task arbitration, quantized model merging, vector-field-based perception, and unified communication protocols, each grounded in concise, mathematically formalized operations that enable robust and interpretable multi-task behavior.
