
Residual Task Vector Quantization (RTVQ)

Updated 15 January 2026
  • RTVQ is a memory-efficient multi-task model-merging method that decomposes task vectors into a shared base and per-task residuals before quantization.
  • It employs asymmetric affine quantization to compress the narrow-range task-vector differences, significantly reducing storage while controlling error.
  • Empirical results show that RTVQ achieves up to 92% storage reduction with negligible loss, or even gains, on ViT and ResNet models.

Residual Task Vector Quantization (RTVQ) is a method designed for highly memory-efficient model merging in multi-task learning frameworks. It addresses the scalability limits imposed by storing multiple fine-tuned checkpoints by decomposing task vectors and quantizing the resulting differences at low bitwidths with high per-bit precision. The technique leverages the statistically narrow range of task vectors to sustain or improve downstream performance while offering substantial reductions in storage requirements (Kim et al., 10 Mar 2025).

1. Formal Definition of Task Vectors and Narrow Range

Given a pre-trained model with parameters $\theta_{\mathrm{pre}} \in \mathbb{R}^n$ and a collection of fine-tuned checkpoints $\{\theta^{t}_{\mathrm{ft}}\}_{t=1}^T$ for tasks $t = 1, \ldots, T$, the task vector for task $t$ is defined as

$$\tau_t = \theta^{t}_{\mathrm{ft}} - \theta_{\mathrm{pre}}.$$

Empirical analysis demonstrates that the dynamic range of $\tau_t$ is approximately an order of magnitude narrower than that of $\theta^{t}_{\mathrm{ft}}$ itself (Figure 1, (Kim et al., 10 Mar 2025)). In the standard asymmetric quantization protocol, this property bounds the per-element rounding error as

$$|\epsilon| \leq \Delta/2, \quad \Delta = \frac{\theta_{\max} - \theta_{\min}}{2^b - 1},$$

where the smaller range of $\tau_t$ yields a proportionally smaller step $\Delta$, and hence significantly reduced quantization noise at a given bitwidth $b$. This holds for all model types explored, including vision transformers (ViT-B/32, ViT-L/14) and convolutional networks (ResNet-50).
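The narrow-range argument can be illustrated with a minimal NumPy sketch (not the paper's code; the weight statistics below are synthetic stand-ins chosen so that fine-tuning shifts parameters by roughly a tenth of their scale):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights: fine-tuning moves parameters only slightly,
# so the task vector spans a much narrower range than the weights.
theta_pre = rng.normal(0.0, 0.1, size=10_000)
theta_ft = theta_pre + rng.normal(0.0, 0.01, size=10_000)  # small task-specific shift
tau = theta_ft - theta_pre                                  # task vector

def step_size(x, b):
    """Asymmetric affine step size Delta = (max - min) / (2^b - 1)."""
    return (x.max() - x.min()) / (2**b - 1)

b = 4
print(step_size(theta_ft, b))  # larger step -> larger worst-case rounding error
print(step_size(tau, b))       # roughly an order of magnitude smaller step
```

Since the worst-case rounding error is $\Delta/2$, quantizing $\tau_t$ instead of $\theta^t_{\mathrm{ft}}$ directly shrinks the error bound by the same factor as the range.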

2. RTVQ Algorithmic Workflow

RTVQ quantizes each task vector in two parts: (1) a shared base vector and (2) a per-task residual ("offset") vector. The process is:

  1. Compute the task-average weight:

$$\theta_{\mathrm{avg}} = \frac{1}{T} \sum_{t=1}^T \theta^{t}_{\mathrm{ft}}$$

  2. Form the base vector:

$$\tau_{\mathrm{base}} = \theta_{\mathrm{avg}} - \theta_{\mathrm{pre}}$$

  3. Quantize the base vector (to $b_b$ bits):

$$\tau_{\mathrm{base}}^q = Q(\tau_{\mathrm{base}};\, b_b)$$

  4. Apply error correction:

$$\theta_{\mathrm{avg}}^{\mathrm{ec}} = \theta_{\mathrm{pre}} + \tau_{\mathrm{base}}^q$$

  5. Compute per-task offset vectors:

$$\tau_{\mathrm{off}}^t = \theta^{t}_{\mathrm{ft}} - \theta_{\mathrm{avg}}^{\mathrm{ec}}$$

  6. Quantize the offsets (to $b_o$ bits):

$$\tau_{\mathrm{off}}^{q,t} = Q(\tau_{\mathrm{off}}^t;\, b_o)$$

  7. Storage and reconstruction: store only $(\tau_{\mathrm{base}}^q, \{\tau_{\mathrm{off}}^{q,t}\})$ and reconstruct $\hat{\tau}_t = \tau_{\mathrm{base}}^q + \tau_{\mathrm{off}}^{q,t}$ at merge time.

Here, $Q(\cdot\,;\,b)$ denotes $b$-bit asymmetric affine quantization.
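The seven steps above can be sketched end to end in NumPy (an illustrative reimplementation, not the authors' code; function names, the synthetic weights, and $T = 4$ are our choices):

```python
import numpy as np

def quantize(x, b):
    """b-bit asymmetric affine quantization; returns codes plus (scale, zero-point)."""
    delta = (x.max() - x.min()) / (2**b - 1)
    z = -np.round(x.min() / delta)
    q = np.clip(np.round(x / delta) + z, 0, 2**b - 1)
    return q.astype(np.int32), delta, z

def dequantize(q, delta, z):
    return delta * (q - z)

rng = np.random.default_rng(0)
theta_pre = rng.normal(0.0, 0.1, size=5_000)
thetas_ft = [theta_pre + rng.normal(0.0, 0.01, size=5_000) for _ in range(4)]

b_b, b_o = 3, 2                                  # base / offset bitwidths
theta_avg = np.mean(thetas_ft, axis=0)           # 1. task-average weight
tau_base = theta_avg - theta_pre                 # 2. base vector
base_q = quantize(tau_base, b_b)                 # 3. quantize base
theta_avg_ec = theta_pre + dequantize(*base_q)   # 4. error-corrected average
offsets_q = [quantize(t - theta_avg_ec, b_o)     # 5.-6. per-task offsets, quantized
             for t in thetas_ft]

# 7. Reconstruction at merge time: base plus per-task offset.
tau_hat = [dequantize(*base_q) + dequantize(*oq) for oq in offsets_q]
```

Note that because the offsets are computed against the error-corrected average (step 4), the base's own quantization error is folded into the offsets and partially re-absorbed by their quantizer.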

3. Quantization Principles and Mathematical Formulation

The quantization function is given as follows for any tensor $x$:

$$\begin{align*}
\Delta_x &= \frac{x_{\max} - x_{\min}}{2^b - 1}, \\
z_x &= -\mathrm{round}(x_{\min}/\Delta_x), \\
x^q &= \mathrm{round}(x/\Delta_x) + z_x, \\
\hat{x} &= \Delta_x \cdot (x^q - z_x).
\end{align*}$$

Applied to RTVQ, the base $\tau_{\mathrm{base}}$ is quantized to $b_b$ bits and each $\tau_{\mathrm{off}}^t$ to $b_o$ bits. The reconstructed quantized task vector is

$$\hat{\tau}_t = \hat{\tau}_{\mathrm{base}} + \hat{\tau}_{\mathrm{off}}^t.$$

The per-tensor scale and zero-point require negligible additional storage. The algorithm accounts for the higher quantization sensitivity of the base by allocating it more bits $b_b$, while the offsets, whose range is far narrower, can be quantized with $b_o$ as low as 2 bits with minimal impact.
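A direct transcription of these four equations (a minimal sketch; the function name is ours, and no clipping of codes is shown):

```python
import numpy as np

def affine_quantize(x, b):
    """Per-tensor b-bit asymmetric affine quantize/dequantize round trip."""
    delta = (x.max() - x.min()) / (2**b - 1)  # step size Delta_x
    z = -np.round(x.min() / delta)            # zero-point z_x
    q = np.round(x / delta) + z               # integer codes x^q
    x_hat = delta * (q - z)                   # reconstruction x-hat
    return x_hat, delta

rng = np.random.default_rng(1)
x = rng.uniform(-0.05, 0.05, size=1_000)
x_hat, delta = affine_quantize(x, b=2)
max_err = np.max(np.abs(x - x_hat))  # bounded by delta / 2 (up to float rounding)
```

The zero-point cancels in the round trip ($\hat{x} = \Delta_x\,\mathrm{round}(x/\Delta_x)$), which is why the per-element error never exceeds half a step.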

4. Bitwidth Allocation and Memory Budget Adaptation

The total bit requirement (per parameter) for $T$ tasks is

$$\mathrm{Total\ bits} = b_b + T \cdot b_o.$$

Given a per-task memory budget $B$, the bit allocation satisfies $b_o + b_b/T \approx B$. Practical selection involves sweeping over $b_b \in \{3, 4, 8\}$ and $b_o \in \{2, 3, 4\}$ to balance quantization error against resource constraints, as measured by downstream performance. The design principle is that the higher-variance base encodes global features influencing all tasks, while the low-variance per-task offsets can be compressed more aggressively.
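The per-task accounting is a one-liner; the sketch below sweeps the bitwidth grid mentioned above (the $T = 20$ value matches the 20-task setting discussed later):

```python
def bits_per_task(b_b, b_o, T):
    """Per-parameter bit cost per task: the base's b_b bits amortize over T tasks."""
    return b_o + b_b / T

# Sweep the grid from the text for a 20-task budget.
for b_b in (3, 4, 8):
    for b_o in (2, 3, 4):
        print(f"b_b={b_b}, b_o={b_o}: {bits_per_task(b_b, b_o, T=20):.2f} bits/task")
```

For the default $b_b = 3$, $b_o = 2$ at $T = 20$, this gives $2 + 3/20 = 2.15$ bits per task, consistent with the $\approx 2.2$ bits/task figure cited in the results below.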

5. Quantization Error Analysis

Under affine quantization, the maximum componentwise ($\ell_\infty$) error for any vector $x$ is

$$|\epsilon_i| \leq \frac{x_{\max} - x_{\min}}{2(2^b - 1)},$$

and the $\ell_2$ norm of the error scales as $\sqrt{n}\,\Delta_x/2$ in the worst case. For RTVQ,

$$Q_{\mathrm{err}}(\tau_t, b_t) \lesssim Q_{\mathrm{err}}(\tau_{\mathrm{base}}, b_b) + Q_{\mathrm{err}}(\tau_{\mathrm{off}}^t, b_o)$$

by the triangle inequality. Empirical results (Fig. 3, (Kim et al., 10 Mar 2025)) show that RTVQ reduces the overall $\ell_2$ quantization error per bit compared to direct single-stage Task Vector Quantization (TVQ) at ultra-low bitwidths (e.g., 2 bits). RTVQ's error reduction becomes more pronounced as memory constraints tighten.
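The direct-vs-residual comparison can be reproduced qualitatively on synthetic task vectors (a sketch under assumed statistics: a shared component common to all tasks plus small per-task noise; note the RTVQ side spends slightly more total bits, $3 + 8 \cdot 2$ vs. $8 \cdot 2$):

```python
import numpy as np

def quant_dequant(x, b):
    """b-bit affine round trip; the zero-point cancels in the reconstruction."""
    delta = (x.max() - x.min()) / (2**b - 1)
    return delta * np.round(x / delta)

rng = np.random.default_rng(0)
shared = rng.normal(0.0, 0.02, size=20_000)   # component common to all task vectors
taus = [shared + rng.normal(0.0, 0.002, size=20_000) for _ in range(8)]

# Direct 2-bit TVQ on each task vector.
tvq_err = np.mean([np.linalg.norm(t - quant_dequant(t, 2)) for t in taus])

# RTVQ: 3-bit base plus 2-bit offsets, with error correction against the
# already-quantized base so its error is absorbed by the offset quantizer.
base_hat = quant_dequant(np.mean(taus, axis=0), 3)
rtvq_err = np.mean([np.linalg.norm(t - (base_hat + quant_dequant(t - base_hat, 2)))
                    for t in taus])
```

With these statistics the offsets span a far narrower range than the full task vectors, so `rtvq_err` comes out well below `tvq_err`, mirroring the trend the paper reports at ultra-low bitwidths.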

6. Empirical Performance and Storage Reduction

RTVQ demonstrates performance that matches or surpasses full-precision (FP32) and standard TVQ baselines while drastically reducing memory. Key results:

  • ViT-B/32, 8 tasks (classification)
    • FP32: 9.1 GB (69.2% accuracy)
    • TVQ (4 bits): 1.1 GB (69.1%)
    • TVQ (2 bits): 62% accuracy
    • RTVQ ($b_b = 3$, $b_o = 2$): 0.7 GB (70.2% accuracy, +1% relative to FP32)
  • Scaling to 14/20 tasks (ViT-B/32, ViT-L/14)
    • TVQ degradation shrinks with $T$; RTVQ stays within 1% of FP32 accuracy at $\approx 2.2$ bits/task.
  • ResNet-50 on NYUv2 (dense prediction)
    • 4-bit TVQ: Segmentation (mIoU), Depth (RelErr), and Normal (AngErr) within 0.1–0.5% of FP32.
    • 2-bit TVQ: significant drop (normal error rises from 30.6° to 36°).
    • RTVQ (2+2 bits): within 2° of FP32.
  • Storage scaling (ViT-L/14, 20 tasks)
    • FP32: 22.8 GB
    • 4-bit TVQ: 2.9 GB
    • 2-bit TVQ: 1.4 GB
    • RTVQ (3+2 bits): 1.7 GB ($\approx 7.5\%$ of FP32)

These results indicate that storage can be compressed to less than 8% of the original footprint with negligible loss of, or even improvement in, merging performance (Kim et al., 10 Mar 2025).
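A back-of-the-envelope check of the storage arithmetic (a sketch: the parameter count is an approximate ViT-L/14-scale figure we assume, and per-tensor scales/zero-points are ignored, which is why the ratio slightly undershoots the reported $\approx 7.5\%$):

```python
def storage_gb(n_params, bits, copies):
    """Storage in GB for `copies` tensors of n_params values at the given bitwidth."""
    return n_params * bits * copies / 8 / 1e9

n = 304e6  # assumed approximate ViT-L/14 parameter count
T = 20
fp32 = storage_gb(n, 32, T)                       # T full-precision task vectors
rtvq = storage_gb(n, 3, 1) + storage_gb(n, 2, T)  # one 3-bit base + T 2-bit offsets
print(f"FP32: {fp32:.1f} GB, RTVQ: {rtvq:.1f} GB ({100 * rtvq / fp32:.1f}%)")
```

The bit-count ratio is exactly $(3 + 20 \cdot 2)/(20 \cdot 32) = 43/640 \approx 6.7\%$; quantization metadata and storage overhead plausibly account for the gap to the reported figure.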

7. Practical Implementation Considerations and Hyperparameters

RTVQ requires strict alignment of the pre-trained backbone $\theta_{\mathrm{pre}}$ at both training and merge time; only task vectors are quantized. The quantization protocol employs per-tensor asymmetric scales and zero-points. The error-correction step (adding $\theta_{\mathrm{pre}}$ to the quantized base before computing offsets) is crucial when $b_b$ is low, as it prevents the base's quantization error from accumulating in the reconstructions.

Recommended defaults are $b_b = 3$ and $b_o = 2$, with sweeps over $b_b$ and $b_o$ for specific accuracy/memory trade-offs. Merging frameworks (Task Arithmetic, Ties, EMR, AdaMerging) operate unchanged, substituting quantized task vectors for their full-precision counterparts. For sensitivity tuning, the $\ell_2$ norm of the quantization error, averaged across layers or tasks, is the metric of choice.
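As a sketch of how reconstructed task vectors drop into an existing framework, here is a Task Arithmetic-style merge (the scaling coefficient `lam` and the random stand-ins for dequantized task vectors are illustrative assumptions, not values from the paper):

```python
import numpy as np

def task_arithmetic_merge(theta_pre, tau_hats, lam=0.3):
    """Task Arithmetic merge: theta_pre + lam * sum of (reconstructed) task vectors."""
    return theta_pre + lam * np.sum(tau_hats, axis=0)

rng = np.random.default_rng(0)
theta_pre = rng.normal(size=1_000)
# Stand-ins for dequantized task vectors tau_hat_t = tau_base^q + tau_off^{q,t}.
tau_hats = [rng.normal(0.0, 0.01, size=1_000) for _ in range(3)]
theta_merged = task_arithmetic_merge(theta_pre, tau_hats)
```

Because the framework only consumes task vectors, swapping $\tau_t$ for $\hat{\tau}_t$ requires no other changes to the merge procedure.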

RTVQ capitalizes on the inherent statistical structure of task-vector spaces, delivering storage reductions of up to $\approx 92\%$ without accuracy loss on both classification and dense-prediction benchmarks, and sets a new standard for scalable, memory-efficient model merging (Kim et al., 10 Mar 2025).
