TSV-Compress: Efficient Model Compression
- TSV-Compress (TSV-C) is a model compression technique that applies truncated SVD to layer-task matrices, reducing per-task storage by up to 90% while preserving at least 99% of the original task accuracy.
- The method approximates the weight difference between fine-tuned and pretrained models with a rank-truncated SVD, keeping the smallest number of singular values that retains 99% of the energy.
- Empirical results on ViT-B-32 show that TSV-C achieves near-original accuracy across multiple tasks at a fraction of the per-task storage, which facilitates efficient model merging.
TSV-Compress (TSV-C) is a model compression technique designed to reduce the storage and computational requirements of per-task fine-tuned neural network weights, while preserving accuracy. TSV-C leverages the observed low-rank structure of "layer-task matrices," compressing them to approximately 10% of their original size with minimal accuracy degradation, typically retaining at least 99% of original task performance. TSV-C is integral to pipelines such as model merging, where per-task parameter changes need to be efficiently represented and combined (Gargiulo et al., 2024).
1. Definition and Mathematical Formulation
For a pretrained model backbone and a given downstream task $i$, let $\theta_0$ be the pretrained weights and $\theta_i$ the fine-tuned weights. At a specific layer $l$ (where the weights are naturally a matrix, such as those in fully-connected or convolutional layers), the layer-task matrix is defined as
$$\Delta_i^{(l)} = \theta_i^{(l)} - \theta_0^{(l)},$$
where $\Delta_i^{(l)} \in \mathbb{R}^{m \times n}$. For layers not natively represented as matrices, TSV-C defaults to ordinary Task Arithmetic without compression (Gargiulo et al., 2024).
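As a minimal sketch of this definition, the snippet below builds layer-task matrices from two weight dictionaries. The function name `layer_task_matrices`, the dictionary layout, and the rule "only 2-D arrays count as matrices" are illustrative assumptions, not part of the published method; convolutional kernels, for example, would need to be reshaped to 2-D before they qualify under this rule.

```python
import numpy as np

def layer_task_matrices(pretrained, finetuned):
    """Split per-layer weight differences into matrix and non-matrix groups.

    Both arguments are hypothetical {layer_name: np.ndarray} dictionaries.
    """
    matrices, fallback = {}, {}
    for name, w0 in pretrained.items():
        delta = finetuned[name] - w0      # fine-tuned minus pretrained weights
        if delta.ndim == 2:               # assumption: only 2-D weights are treated as matrices
            matrices[name] = delta
        else:                             # e.g. biases: left uncompressed (Task Arithmetic fallback)
            fallback[name] = delta
    return matrices, fallback

rng = np.random.default_rng(0)
pretrained = {"fc.weight": rng.normal(size=(8, 4)), "fc.bias": rng.normal(size=8)}
finetuned = {name: w + 0.01 * rng.normal(size=w.shape) for name, w in pretrained.items()}

matrices, fallback = layer_task_matrices(pretrained, finetuned)
print(sorted(matrices), sorted(fallback))
```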
2. Algorithmic Procedure and SVD-Based Compression
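The core of TSV-Compress is the use of truncated singular value decomposition (SVD) to approximate each layer-task matrix $\Delta_i^{(l)} \in \mathbb{R}^{m \times n}$:
$$\Delta_i^{(l)} = U \Sigma V^\top,$$
with $U \in \mathbb{R}^{m \times r}$, $V \in \mathbb{R}^{n \times r}$, $\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r)$, $\sigma_1 \ge \dots \ge \sigma_r \ge 0$, and $r = \min(m, n)$. To produce a compressed approximation, TSV-C forms a rank-$k$ truncated version:
$$\hat{\Delta}_i^{(l)} = U_k \Sigma_k V_k^\top,$$
where $U_k$, $\Sigma_k$, and $V_k$ keep only the first $k$ singular components, and $k$ is the minimal value such that
$$\frac{\sum_{j=1}^{k} \sigma_j^2}{\sum_{j=1}^{r} \sigma_j^2} \ge 0.99,$$
ensuring that at least 99% of the squared Frobenius norm (energy) is retained. Simultaneously, TSV-C enforces $k \le 0.1\,r$, so at most 10% of singular components are preserved, which bounds the stored factors at $0.1\,r\,(m + n + 1)$ parameters per layer (Gargiulo et al., 2024).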
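The rank selection rule can be sketched as follows. The helper name `choose_rank` and its parameters are hypothetical. The text does not say what happens when the 99% energy target needs more components than the 10% cap allows, so this sketch assumes the cap wins and clips the rank.

```python
import numpy as np

def choose_rank(singular_values, energy=0.99, max_fraction=0.10):
    """Smallest rank retaining `energy` of the squared spectrum, clipped to a cap.

    Assumption: if the energy target and the cap conflict, the cap takes priority.
    """
    s = np.asarray(singular_values, dtype=float)
    total = np.sum(s ** 2)
    if total == 0.0:
        return 0                                          # all-zero delta: nothing to store
    cumulative = np.cumsum(s ** 2) / total                # retained energy for ranks 1..r
    k_energy = int(np.searchsorted(cumulative, energy)) + 1
    k_cap = max(1, int(max_fraction * s.size))            # at most 10% of the components
    return min(k_energy, k_cap, s.size)

# Fast-decaying spectrum: one component already carries over 99% of the energy.
k_fast = choose_rank([10.0, 1.0] + [0.0] * 18)
# Flat spectrum: the energy target would need all 20 components, so the cap applies.
k_flat = choose_rank(np.ones(20))
print(k_fast, k_flat)
```

The flat-spectrum case shows that the two criteria are not always compatible: the energy guarantee only holds for layers whose spectrum decays fast enough to fit under the cap.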
3. Implementation Details
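The TSV-Compress procedure, applied per task and per layer, is as follows:
- Compute the layer-task matrix as the difference between the fine-tuned and pretrained weights of the layer.
- If the layer is not naturally a matrix, skip compression and fall back to ordinary Task Arithmetic.
- Otherwise, compute the SVD of the layer-task matrix.
- Select the smallest rank whose leading singular values retain at least 99% of the energy, subject to the cap of 10% of the singular components.
- Store only the truncated left singular vectors, singular values, and right singular vectors.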
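A minimal NumPy sketch of this compression procedure, under stated assumptions, might look like the following. The names `compress_delta` and `tsv_compress` are hypothetical, weights are assumed to be stored in `{layer_name: array}` dictionaries, only 2-D arrays are treated as matrices, and the rank is clipped to the cap when the energy target cannot be met within it.

```python
import numpy as np

def compress_delta(delta, energy=0.99, max_fraction=0.10):
    """Truncated SVD of one layer-task matrix; returns (U_k, s_k, Vt_k)."""
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    total = max(np.sum(s ** 2), np.finfo(float).tiny)     # avoid division by zero
    cumulative = np.cumsum(s ** 2) / total
    k_energy = int(np.searchsorted(cumulative, energy)) + 1
    k_cap = max(1, int(max_fraction * s.size))
    k = min(k_energy, k_cap, s.size)                      # assumption: the cap wins on conflict
    return U[:, :k], s[:k], Vt[:k, :]

def tsv_compress(pretrained, finetuned, **kwargs):
    """Compress every 2-D layer delta; keep other deltas uncompressed."""
    compressed, fallback = {}, {}
    for name, w0 in pretrained.items():
        delta = finetuned[name] - w0
        if delta.ndim == 2:
            compressed[name] = compress_delta(delta, **kwargs)
        else:
            fallback[name] = delta                        # Task Arithmetic fallback
    return compressed, fallback

# Toy example: the fine-tuning update of the weight matrix has rank 2.
rng = np.random.default_rng(0)
w0 = rng.normal(size=(64, 32))
update = rng.normal(size=(64, 2)) @ rng.normal(size=(2, 32))
pretrained = {"fc.weight": w0, "fc.bias": np.zeros(64)}
finetuned = {"fc.weight": w0 + update, "fc.bias": np.ones(64)}

compressed, fallback = tsv_compress(pretrained, finetuned)
U_k, s_k, Vt_k = compressed["fc.weight"]
rel_error = np.linalg.norm((U_k * s_k) @ Vt_k - update) / np.linalg.norm(update)
print(s_k.size, rel_error)
```

Retaining at least 99% of the energy means the relative Frobenius error of the reconstruction is at most $\sqrt{0.01} = 0.1$, which is what `rel_error` measures in the toy example.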
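At inference, the compressed layer-task matrix $\hat{\Delta}_i^{(l)}$ is reconstructed for each layer from its stored rank-$k$ factors as
$$\hat{\Delta}_i^{(l)} = U_k \Sigma_k V_k^\top$$
and the modified layer weights are
$$\theta^{(l)} = \theta_0^{(l)} + \alpha\, \hat{\Delta}_i^{(l)},$$
where $\theta_0^{(l)}$ are the pretrained weights of layer $l$ and $\alpha$ is a scaling coefficient, with $\alpha = 1$ by default (Gargiulo et al., 2024).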
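Storage per layer is reduced from $mn$ parameters for an $m \times n$ layer to $k(m + n + 1)$ for the rank-$k$ factors, a fraction $k(m + n + 1)/(mn)$ of the original. Given $k \le 0.1\min(m, n)$, at most 10% of the singular components of each eligible layer are stored.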
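The reconstruction step and the storage count can be checked with a short sketch. The function names and the `alpha` parameter are illustrative assumptions, and the 768 × 3072 shape (that of a ViT-B MLP projection) is used only as a worked example, not as a figure reported in the source.

```python
import numpy as np

def reconstruct_layer(w0, U_k, s_k, Vt_k, alpha=1.0):
    """Pretrained weights plus the scaled low-rank delta U_k diag(s_k) V_k^T."""
    delta_hat = (U_k * s_k) @ Vt_k        # scales column j of U_k by s_k[j]
    return w0 + alpha * delta_hat

def storage_ratio(m, n, k):
    """Parameters in the rank-k factors divided by parameters in the dense matrix."""
    return k * (m + n + 1) / (m * n)

# Sanity check: with all components kept, reconstruction recovers the fine-tuned weights.
rng = np.random.default_rng(0)
w0 = rng.normal(size=(6, 4))
delta = rng.normal(size=(6, 4))
U, s, Vt = np.linalg.svd(delta, full_matrices=False)
w_full = reconstruct_layer(w0, U, s, Vt)

# Worked storage example for a 768 x 3072 matrix at the 10% component cap.
m, n = 768, 3072
k = int(0.1 * min(m, n))
ratio = storage_ratio(m, n, k)
print(k, ratio)
```

For this shape the cap gives $k = 76$ and a parameter ratio a little over 12%, so keeping 10% of the singular components is not the same as keeping 10% of the parameters. The two coincide only when one dimension is much larger than the other.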
4. Empirical Compression Performance
Empirical results on the ViT-B-32 architecture are summarized as follows:
| Method | 8 tasks | 14 tasks | 20 tasks |
|---|---|---|---|
| Finetuned (100%) | 92.83 (100) | 90.88 (100) | 91.37 (100) |
| TALL-Mask + TIES | 93.13 (100.4) | 90.92 (100) | 91.11 (99.7) |
| TSV-C (Ours) | 92.62 (99.7) | 90.29 (99.3) | 90.64 (99.1) |
Values in parentheses are normalized accuracy (percentage of the fine-tuned models' accuracy). In all three settings, TSV-C uses approximately 10% of the per-task parameter storage while retaining at least 99% of the original accuracy (Gargiulo et al., 2024).
5. Computational Complexity and Practical Considerations
- SVD Complexity: The standard per-layer SVD of an $m \times n$ matrix has complexity $O(mn \min(m, n))$, but can be accelerated to roughly $O(mnk)$ with randomized SVD methods, which are suitable due to the small retained rank $k$.
- Storage: For a weight matrix of size $m \times n$, storing the decomposed form at rank $k$ requires
$$k\,(m + n + 1)$$
parameters per layer, maximizing at $0.1\min(m, n)\,(m + n + 1)$ for $k = 0.1\min(m, n)$.
- Implementation Tips:
  - Batch SVD computations across layers or tasks using GPU libraries, such as `torch.linalg.svd`.
  - Employ randomized or truncated SVD to manage compute cost for large matrices.
  - Store singular factors contiguously as three tensors per layer for efficient reconstruction.
  - Preallocate and reuse workspace buffers to minimize GPU memory fragmentation.
  - For multi-task scenarios with a shared backbone, cache the SVD of the pretrained weights to reduce computation (Gargiulo et al., 2024).
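To illustrate the randomized SVD mentioned above, here is a small self-contained NumPy sketch of the standard range-finder approach (random projection, a few power iterations, then an exact SVD of the small projected matrix). It is a generic textbook variant, not the implementation used by the authors, and the function name and the `oversample` and `n_iter` parameters are assumptions chosen for the example.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, n_iter=2, seed=0):
    """Approximate the top-k SVD of A with a randomized range finder."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    width = min(k + oversample, m, n)                 # sketch width with oversampling
    Q = np.linalg.qr(A @ rng.normal(size=(n, width)))[0]
    for _ in range(n_iter):                           # power iterations sharpen the subspace
        Q = np.linalg.qr(A.T @ Q)[0]
        Q = np.linalg.qr(A @ Q)[0]
    B = Q.T @ A                                       # small (width x n) projected matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

# Toy check on an exactly rank-5 matrix.
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 100))
U_k, s_k, Vt_k = randomized_svd(A, k=5)
s_exact = np.linalg.svd(A, compute_uv=False)[:5]
print(np.max(np.abs(s_k - s_exact)))
```

The dominant cost is the handful of products between `A` and a thin matrix of width close to $k$, which is where the roughly $O(mnk)$ scaling comes from. In PyTorch, `torch.svd_lowrank` offers a comparable built-in routine.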
6. Integration and Use in Model Merging
TSV-Compress is designed for seamless integration into model merging pipelines, where low-rank, compressed task-specific deltas can be reconstructed on demand and combined with the pretrained backbone. This is particularly valuable in settings where many tasks share a backbone, as only the compressed singular factors (Task Singular Vectors) for each delta need to be stored and transmitted. The existence of low-rank layer-task matrices also supports improved model merging methods, such as TSV-Merge, which leverage singular vector interactions to reduce task interference (Gargiulo et al., 2024).
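As a rough illustration of "reconstructed on demand and combined," the sketch below rebuilds each task's low-rank delta from stored factors and adds the deltas to a shared backbone layer. This is plain Task Arithmetic over reconstructed deltas, not TSV-Merge, and the names `reconstruct_delta`, `merge_layer`, and the `alpha` scaling parameter are hypothetical.

```python
import numpy as np

def reconstruct_delta(factors):
    """Rebuild U_k diag(s_k) V_k^T from a stored (U_k, s_k, Vt_k) triple."""
    U_k, s_k, Vt_k = factors
    return (U_k * s_k) @ Vt_k

def merge_layer(w0, task_factors, alpha=1.0):
    """Add the scaled, reconstructed deltas of several tasks to one backbone layer."""
    merged = w0.copy()
    for factors in task_factors:
        merged += alpha * reconstruct_delta(factors)   # one task at a time, on demand
    return merged

# Two toy tasks, each stored as rank-1 factors for a 4 x 3 layer.
rng = np.random.default_rng(0)
w0 = rng.normal(size=(4, 3))
task_a = (rng.normal(size=(4, 1)), np.array([2.0]), rng.normal(size=(1, 3)))
task_b = (rng.normal(size=(4, 1)), np.array([0.5]), rng.normal(size=(1, 3)))

merged = merge_layer(w0, [task_a, task_b])
print(merged.shape)
```

Only the factor triples need to be stored or transmitted per task; the dense deltas exist transiently while each one is added.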
7. Comparative Context and Applications
TSV-Compress offers a compact and accurate alternative to existing compression and masking schemes such as TALL-Mask + TIES. On ViT-B-32 its accuracy is competitive, less than one point below TALL-Mask + TIES in each of the reported settings, and it comes with explicit criteria for energy retention and storage footprint. Its applicability is limited to layers naturally represented by matrices (e.g., fully-connected or convolutional layers), with a fallback to ordinary Task Arithmetic otherwise. The reported results support its use for scalable multi-task adaptation, federated settings, and efficient model deployment (Gargiulo et al., 2024).