TSV-Compress: Efficient Model Compression

Updated 28 April 2026
  • TSV-Compress is a model compression technique that leverages truncated SVD on layer-task matrices to reduce storage by up to 90% while preserving at least 99% of original task accuracy.
  • The method approximates the weight difference between fine-tuned and pretrained models with a rank-truncated SVD, keeping the fewest singular values needed to retain 99% of the energy.
  • Empirical results on ViT-B-32 demonstrate that TSV-C achieves near-original accuracy across multiple tasks with significant reductions in per-task storage needs, facilitating efficient model merging.

TSV-Compress (TSV-C) is a model compression technique designed to reduce the storage and computational requirements of per-task fine-tuned neural network weights, while preserving accuracy. TSV-C leverages the observed low-rank structure of "layer-task matrices," compressing them to approximately 10% of their original size with minimal accuracy degradation, typically retaining at least 99% of original task performance. TSV-C is integral to pipelines such as model merging, where per-task parameter changes need to be efficiently represented and combined (Gargiulo et al., 2024).

1. Definition and Mathematical Formulation

For a pretrained model backbone and a given downstream task $\tau$, let $\theta_{pre}$ be the pretrained weights and $\theta_{ft}(\tau)$ the fine-tuned weights. At a specific layer $t$ (where the weights are naturally a matrix, such as those in fully-connected or convolutional layers), the layer-task matrix is defined as

$$M_\tau^t \equiv \Delta_\tau^t = \theta_{ft}(\tau)^t - \theta_{pre}^t,$$

where $M_\tau^t \in \mathbb{R}^{m \times n}$. For layers not natively represented as matrices, TSV-C defaults to ordinary Task Arithmetic without compression (Gargiulo et al., 2024).
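As a minimal sketch of this definition, the snippet below computes per-layer deltas from two checkpoints and separates the matrix-shaped ones. The function name `layer_task_matrices`, the dict-of-arrays checkpoint format, and the layer names are hypothetical, chosen for illustration. Only 2-D arrays are treated as matrices here; convolution kernels would need a reshape first, which this sketch does not attempt.

```python
import numpy as np

def layer_task_matrices(theta_pre, theta_ft):
    """Split per-layer deltas into matrix-shaped layers and all other layers.

    theta_pre, theta_ft: dicts mapping a layer name to a NumPy array
    (a hypothetical stand-in for a framework checkpoint).
    """
    matrices, others = {}, {}
    for name, w_pre in theta_pre.items():
        delta = theta_ft[name] - w_pre   # M_tau^t = theta_ft(tau)^t - theta_pre^t
        if delta.ndim == 2:
            matrices[name] = delta       # eligible for SVD compression
        else:
            others[name] = delta         # left uncompressed (Task Arithmetic fallback)
    return matrices, others

# Toy checkpoint: one weight matrix and one bias vector.
rng = np.random.default_rng(0)
theta_pre = {"fc.weight": rng.normal(size=(6, 4)), "fc.bias": rng.normal(size=6)}
theta_ft = {k: v + 0.01 * rng.normal(size=v.shape) for k, v in theta_pre.items()}

matrices, others = layer_task_matrices(theta_pre, theta_ft)
print(sorted(matrices), sorted(others))
```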

2. Algorithmic Procedure and SVD-Based Compression

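The core of TSV-Compress is the use of truncated singular value decomposition (SVD) to approximate each $M_\tau^t$:

$$M_\tau^t = U_\tau^t \Sigma_\tau^t (V_\tau^t)^T,$$

with $U_\tau^t \in \mathbb{R}^{m \times k}$, $\Sigma_\tau^t \in \mathbb{R}^{k \times k}$, $V_\tau^t \in \mathbb{R}^{n \times k}$, and $k = \min(m, n)$. To produce a compressed approximation, TSV-C forms a rank-$r$ truncated version:

$$\hat{M}_\tau^t = \hat{U}_\tau^t \hat{\Sigma}_\tau^t (\hat{V}_\tau^t)^T,$$

where $\hat{U}_\tau^t \in \mathbb{R}^{m \times r}$, $\hat{\Sigma}_\tau^t \in \mathbb{R}^{r \times r}$, and $\hat{V}_\tau^t \in \mathbb{R}^{n \times r}$ keep the $r$ leading singular components, and $r$ is the minimal value such that

$$\frac{\sum_{i=1}^{r} \sigma_i^2}{\sum_{i=1}^{k} \sigma_i^2} \ge 0.99,$$

with $\sigma_1 \ge \dots \ge \sigma_k$ the diagonal entries of $\Sigma_\tau^t$, ensuring that at least 99% of the squared Frobenius norm (energy) is retained. Simultaneously, TSV-C enforces $r \le 0.1\,k$, so at most 10% of singular components are preserved, which targets a storage footprint of roughly 10% of the original per-layer parameters (Gargiulo et al., 2024).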
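The rank rule can be illustrated with a few lines of plain Python. This is a sketch under stated assumptions, not the authors' code: the name `select_rank`, the use of squared singular values as "energy", rounding the 10% cap down, and keeping at least one component are all choices made here for illustration. When the energy target and the cap disagree, this sketch lets the cap win.

```python
import math

def select_rank(singular_values, energy=0.99, max_fraction=0.10):
    """Smallest rank whose leading squared singular values reach `energy`
    of the total, clipped to at most `max_fraction` of the components.

    `singular_values` must be sorted in descending order.
    """
    k = len(singular_values)
    total = sum(s * s for s in singular_values)
    cap = max(1, math.floor(max_fraction * k))  # assumption: floor, at least 1
    running = 0.0
    for r, s in enumerate(singular_values, start=1):
        running += s * s
        if running >= energy * total or r == cap:
            return r
    return cap  # only reached for an empty spectrum

# Sharply decaying spectrum: the first component already carries over 99%.
decaying = [10.0, 1.0] + [0.01] * 18
# Flat spectrum: the energy target is out of reach, so the cap decides.
flat = [1.0] * 50

print(select_rank(decaying), select_rank(flat))
```

A flat spectrum is the unfavorable case: the cap is hit long before 99% of the energy is retained, so the accuracy figures reported for TSV-C rely on layer-task matrices being close to low rank in practice.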

3. Implementation Details

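The TSV-Compress procedure follows from the definitions in Sections 1 and 2. For each task $\tau$ and each matrix-shaped layer $t$:

  • Compute the layer-task matrix $M_\tau^t = \theta_{ft}(\tau)^t - \theta_{pre}^t$.
  • Compute the SVD of $M_\tau^t$.
  • Select the retained rank as the smallest value that keeps 99% of the energy, subject to the 10% cap on singular components.
  • Store only the truncated singular factors in place of the full matrix.

Layers that are not matrix-shaped keep their uncompressed delta.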
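The procedure can be sketched end to end with NumPy. The helper names `truncate` and `compress_task`, the dict-of-arrays checkpoint format, and the rounding of the rank cap (floor, at least one component) are assumptions made for this illustration rather than details taken from Gargiulo et al. (2024).

```python
import numpy as np

def truncate(delta, energy=0.99, max_fraction=0.10):
    """Truncated SVD factors of a 2-D delta under the energy rule and rank cap."""
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    cum = np.cumsum(s ** 2)
    # First index whose cumulative energy reaches the target, as a 1-based rank.
    r_energy = int(np.searchsorted(cum, energy * cum[-1])) + 1
    cap = max(1, int(np.floor(max_fraction * s.size)))  # assumption: floor, at least 1
    r = min(r_energy, cap)
    return U[:, :r], s[:r], Vt[:r, :]

def compress_task(theta_pre, theta_ft):
    """Compress every matrix-shaped layer delta; keep other deltas dense."""
    compressed = {}
    for name, w_pre in theta_pre.items():
        delta = theta_ft[name] - w_pre
        compressed[name] = truncate(delta) if delta.ndim == 2 else delta
    return compressed

# Toy checkpoint: a 64x48 layer whose delta has exact rank 2, plus a bias.
rng = np.random.default_rng(0)
Q1 = np.linalg.qr(rng.normal(size=(64, 2)))[0]
Q2 = np.linalg.qr(rng.normal(size=(48, 2)))[0]
true_delta = (Q1 * np.array([10.0, 5.0])) @ Q2.T
theta_pre = {"fc.weight": rng.normal(size=(64, 48)), "fc.bias": np.zeros(64)}
theta_ft = {"fc.weight": theta_pre["fc.weight"] + true_delta,
            "fc.bias": np.full(64, 0.1)}

compressed = compress_task(theta_pre, theta_ft)
U_r, s_r, Vt_r = compressed["fc.weight"]
print(U_r.shape, s_r.shape, Vt_r.shape)
```

Here the delta is built to have rank 2, so two components capture essentially all of the energy and the cap of four components is not binding.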

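At inference, the compressed $\hat{M}_\tau^t$ is reconstructed for each layer from its stored rank-$r$ factors as

$$\hat{M}_\tau^t = \hat{U}_\tau^t \hat{\Sigma}_\tau^t (\hat{V}_\tau^t)^T,$$

where $\hat{U}_\tau^t$, $\hat{\Sigma}_\tau^t$, and $\hat{V}_\tau^t$ hold the $r$ leading singular vectors and values, and the modified layer weights are

$$\hat{\theta}^t = \theta_{pre}^t + \alpha \hat{M}_\tau^t,$$

with the scaling coefficient $\alpha$ left at its default value (Gargiulo et al., 2024). Setting $\alpha = 1$ yields an approximation of the fine-tuned weights $\theta_{ft}(\tau)^t$.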
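A minimal NumPy sketch of this reconstruction step follows. The function name `apply_compressed_delta` and the default `alpha=1.0` are assumptions made for illustration; the singular values are stored as a vector rather than a diagonal matrix.

```python
import numpy as np

def apply_compressed_delta(w_pre, U_r, s_r, Vt_r, alpha=1.0):
    """Rebuild the low-rank delta and add it to the pretrained layer weights."""
    delta_hat = (U_r * s_r) @ Vt_r   # same as U_r @ diag(s_r) @ Vt_r
    return w_pre + alpha * delta_hat

# Rank-1 example on a 3x2 layer.
w_pre = np.zeros((3, 2))
U_r = np.array([[1.0], [0.0], [0.0]])
s_r = np.array([2.0])
Vt_r = np.array([[0.0, 1.0]])

w_task = apply_compressed_delta(w_pre, U_r, s_r, Vt_r, alpha=0.5)
print(w_task)
```

Scaling the columns of `U_r` by `s_r` avoids materializing the diagonal matrix, which matters little at this size but keeps the reconstruction to a single matrix product.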

Storage per layer is reduced from $mn$ parameters to $r(m + n + 1)$ for retained rank $r$ (two factor matrices of sizes $m \times r$ and $n \times r$, plus $r$ singular values). Given the cap on $r$, the storage cost is at most a fraction $r(m + n + 1)/(mn)$ of the original for each eligible layer.
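The storage arithmetic is easy to check numerically. The helpers below are hypothetical names introduced for this sketch, and the 3072 x 768 shape is only an illustrative layer size. Note that a cap on the number of singular components and a cap on stored parameters are different constraints, so the sketch computes both the ratio for a given rank and the largest rank that fits a given parameter budget.

```python
def compressed_params(m, n, r):
    """Parameters stored for rank-r factors: U (m*r), V (n*r), r singular values."""
    return r * (m + n + 1)

def storage_ratio(m, n, r):
    """Compressed size as a fraction of the dense m*n matrix."""
    return compressed_params(m, n, r) / (m * n)

def max_rank_for_budget(m, n, budget=0.10):
    """Largest rank whose factors fit within `budget` of the dense size."""
    return int(budget * m * n // (m + n + 1))

m, n = 3072, 768          # illustrative shape, k = min(m, n) = 768
r_cap = 76                # floor(0.1 * 768), i.e. 10% of the components
print(compressed_params(m, n, r_cap), round(storage_ratio(m, n, r_cap), 4))
print(max_rank_for_budget(m, n, 0.10))
```

For wide or tall matrices the ratio at a given rank is lower than for square ones, because $m + n$ is small relative to $mn / \min(m, n)$.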

4. Empirical Compression Performance

Empirical results on the ViT-B-32 architecture are summarized as follows:

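| Method           | 8 tasks       | 14 tasks     | 20 tasks     |
|------------------|---------------|--------------|--------------|
| Finetuned (100%) | 92.83 (100)   | 90.88 (100)  | 91.37 (100)  |
| TALL-Mask + TIES | 93.13 (100.4) | 90.92 (100)  | 91.11 (99.7) |
| TSV-C (Ours)     | 92.62 (99.7)  | 90.29 (99.3) | 90.64 (99.1) |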

Values are absolute accuracy (%), with normalized accuracy (percentage of the fine-tuned models' accuracy) in parentheses. For all scenarios, TSV-C uses approximately 10% of per-task parameter storage, retaining at least 99% of original accuracy (Gargiulo et al., 2024).

5. Computational Complexity and Practical Considerations

  • SVD Complexity: The standard per-layer SVD of an $m \times n$ matrix has complexity $O(mn \min(m, n))$, but can be accelerated to roughly $O(mnr)$ with randomized SVD methods, which are suitable due to the small retained rank $r$.
  • Storage: For a weight matrix of size $m \times n$, storing decomposed forms at rank $r$ requires

$$r(m + n + 1)$$

parameters per layer, maximizing at the largest rank allowed by the 10% cap on singular components.

  • Implementation Tips:
    • Batch SVD computations across layers or tasks using GPU libraries, such as torch.linalg.svd.
    • Employ randomized or truncated SVD to manage compute cost for large matrices.
    • Store singular factors contiguously as three tensors per layer for efficient reconstruction.
    • Preallocate and reuse workspace buffers to minimize GPU memory fragmentation.
    • For multi-task scenarios with a shared backbone, cache the SVD of the pretrained weights to reduce computation (Gargiulo et al., 2024).
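The randomized SVD mentioned in the tips above can be sketched with NumPy as a basic range finder with power iterations. This is a generic textbook-style scheme, not a routine from the TSV-C authors; the function name and the `oversample`, `n_iter`, and `seed` parameters are assumptions chosen for illustration.

```python
import numpy as np

def randomized_svd(M, rank, oversample=8, n_iter=2, seed=0):
    """Approximate the leading `rank` singular triplets of M."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    ell = min(m, n, rank + oversample)           # sketch width
    # Orthonormal basis for an approximate range of M.
    Q = np.linalg.qr(M @ rng.normal(size=(n, ell)))[0]
    for _ in range(n_iter):                      # power iterations sharpen the basis
        Q = np.linalg.qr(M.T @ Q)[0]
        Q = np.linalg.qr(M @ Q)[0]
    B = Q.T @ M                                  # small (ell x n) projected matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :rank], s[:rank], Vt[:rank, :]

# Exact rank-3 test matrix: the approximation should match the full SVD.
rng = np.random.default_rng(1)
M = rng.normal(size=(80, 3)) @ rng.normal(size=(3, 60))
U_r, s_r, Vt_r = randomized_svd(M, rank=3)
s_full = np.linalg.svd(M, compute_uv=False)[:3]
print(np.allclose(s_r, s_full), np.allclose((U_r * s_r) @ Vt_r, M))
```

The expensive products involve only `ell` columns, which is where the saving over a full decomposition comes from when the retained rank is small.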

6. Integration and Use in Model Merging

TSV-Compress is designed for seamless integration into model merging pipelines, where low-rank, compressed task-specific deltas can be reconstructed on demand and combined with the pretrained backbone. This is particularly valuable in settings where many tasks share a backbone, as only the compressed singular factors (Task Singular Vectors) for each delta need to be stored and transmitted. The existence of low-rank layer-task matrices also supports improved model merging methods, such as TSV-Merge, which leverage singular vector interactions to reduce task interference (Gargiulo et al., 2024).
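As a rough illustration of reconstructing compressed deltas on demand and combining them with the backbone, the sketch below sums the rebuilt deltas of several tasks for one layer. This is plain Task Arithmetic summation over compressed factors, not TSV-Merge; the name `merge_layer` and the `alpha` default are hypothetical.

```python
import numpy as np

def merge_layer(w_pre, compressed_tasks, alpha=1.0):
    """Add the reconstructed low-rank delta of every task to one backbone layer.

    compressed_tasks: iterable of (U_r, s_r, Vt_r) triples, one per task.
    """
    merged = w_pre.astype(float)                 # astype returns a copy
    for U_r, s_r, Vt_r in compressed_tasks:
        merged += alpha * ((U_r * s_r) @ Vt_r)   # rebuild one delta at a time
    return merged

# Two rank-1 task deltas on a 2x2 layer.
w_pre = np.zeros((2, 2))
task_a = (np.array([[1.0], [0.0]]), np.array([1.0]), np.array([[1.0, 0.0]]))
task_b = (np.array([[0.0], [1.0]]), np.array([2.0]), np.array([[0.0, 1.0]]))

merged = merge_layer(w_pre, [task_a, task_b])
print(merged)
```

Only one dense delta exists in memory at a time, so the peak footprint stays close to a single layer regardless of the number of tasks.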

7. Comparative Context and Applications

TSV-Compress offers a compact alternative to existing compression and masking schemes such as TALL-Mask + TIES, maintaining competitive accuracy (within about one point of TALL-Mask + TIES in the reported ViT-B-32 results) with explicit targets for energy retention and storage footprint. Its applicability is limited to layers naturally represented by matrices (e.g., fully-connected, convolutional structures), with an ordinary Task Arithmetic fallback otherwise. The reported results support its use for scalable multi-task adaptation, federated settings, and efficient model deployment (Gargiulo et al., 2024).
