Low-Rank Compression of Pretrained Models via Randomized Subspace Iteration

Published 3 Apr 2026 in cs.LG, cs.AI, math.NA, and stat.ML | (2604.02659v1)

Abstract: The massive scale of pretrained models has made efficient compression essential for practical deployment. Low-rank decomposition based on the singular value decomposition (SVD) provides a principled approach for model reduction, but its exact computation is expensive for large weight matrices. Randomized alternatives such as randomized SVD (RSVD) improve efficiency, yet they can suffer from poor approximation quality when the singular value spectrum decays slowly, a regime commonly observed in modern pretrained models. In this work, we address this limitation from both theoretical and empirical perspectives. First, we establish a connection between low-rank approximation error and predictive performance by analyzing softmax perturbations, showing that deviations in class probabilities are controlled by the spectral error of the compressed weights. Second, we demonstrate that RSVD is inadequate, and we propose randomized subspace iteration (RSI) as a more effective alternative. By incorporating multiple power iterations, RSI improves spectral separation and provides a controllable mechanism for enhancing approximation quality. We evaluate our approach on both convolutional networks and transformer-based architectures. Our results show that RSI achieves near-optimal approximation quality while outperforming RSVD in predictive accuracy under aggressive compression, enabling efficient model compression.


Authors (1)

Farhad Pourkamali-Anaraki

Summary

The paper proposes randomized subspace iteration (RSI) for near-optimal low-rank approximation of pretrained model weights and links spectral error directly to prediction reliability.
RSI improves on RSVD by using multiple power iterations, substantially reducing spectral norm error while retaining significant computational speedups over exact SVD.
Empirical results on VGG19 and ViT-B/32 show that RSI enables aggressive compression with minimal accuracy loss, making it well suited to resource-constrained deployments.


Overview and Motivation

The exponential growth in the scale of pretrained neural network architectures, now spanning hundreds of millions to billions of parameters, has amplified the need for effective compression methods. Deployment on resource-constrained platforms, such as mobile and edge devices, demands a significant reduction in storage and memory footprints without compromising predictive utility. While quantization, pruning, and knowledge distillation have been extensively explored, low-rank matrix decomposition provides a theoretically grounded way to reduce the number of parameters in weight matrices, particularly in the linear layers ubiquitous across both convolutional and transformer-based models.

The canonical tool for such decompositions, the truncated singular value decomposition (SVD), offers provable optimality in spectral norm approximation but incurs prohibitive computational costs for large-scale models. Randomized variants, especially randomized SVD (RSVD), present compelling efficiency gains but critically underperform in the presence of slowly decaying singular value spectra—a regime frequently encountered in pretrained neural networks. This paper advances the state of low-rank model compression by: (1) presenting a rigorous theoretical analysis connecting spectral approximation error to probabilistic classification outputs, and (2) introducing and empirically validating randomized subspace iteration (RSI) as a superior alternative to RSVD under challenging spectral conditions.

Theoretical Contributions

A central theoretical contribution is the precise quantification of the effect of low-rank approximation on softmax-based output probabilities. The authors prove that, for a fixed feature extractor $h(x)$ whose output is bounded in norm, the deviation in class probabilities induced by replacing the pretrained weight matrix $W$ with its low-rank approximation $\tilde{W}$ is bounded above by the product of this norm bound and the spectral norm error $\|W - \tilde{W}\|_2$. Explicitly, for all classes $c$ and all inputs $x$,

$|p_c(x) - \tilde{p}_c(x)| \leq R \|W - \tilde{W}\|_2$

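where $p_c(x)$ and $\tilde{p}_c(x)$ denote the softmax probabilities of class $c$ computed from the logits $W h(x)$ and $\tilde{W} h(x)$, respectively, and $R$ bounds $\|h(x)\|_2$. This result formalizes the intuition that spectral approximation error directly controls prediction perturbation, providing a mathematical foundation for reliability assessments in compressed models.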
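The bound can be reached in a few elementary steps. The chain below is a sketch under the assumption (not spelled out above) that the logits are exactly $W h(x)$, with any bias term left uncompressed so that it cancels in the difference:

```latex
\begin{aligned}
|p_c(x) - \tilde{p}_c(x)|
  &\le \bigl\|\operatorname{softmax}(W h(x)) - \operatorname{softmax}(\tilde{W} h(x))\bigr\|_2 \\
  &\le \bigl\|(W - \tilde{W})\, h(x)\bigr\|_2
     && \text{(softmax is 1-Lipschitz in } \ell_2\text{)} \\
  &\le \|W - \tilde{W}\|_2 \, \|h(x)\|_2 \\
  &\le R \, \|W - \tilde{W}\|_2 .
\end{aligned}
```

The Lipschitz step follows from the softmax Jacobian $\operatorname{diag}(p) - pp^\top$: by Gershgorin's theorem each of its eigenvalues is at most $2p_i(1-p_i) \le 1/2$, so the inequality even holds with $R/2$ in place of $R$.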

Additionally, leveraging recent advances in the analysis of randomized matrix algorithms, the authors show that the expected spectral error of RSI decays exponentially with the number of power iterations, converging to the optimal error achieved by exact truncated SVD. By contrast, the accuracy of standard RSVD stagnates in the slow-decay regime unless the target rank is large, which erodes its parameter savings.

Algorithmic Framework: Randomized Subspace Iteration

While RSVD (essentially equivalent to RSI with $q=1$ power iteration) captures the leading singular directions reasonably well when the singular spectrum decays rapidly, it is insufficient when the spectrum is flat, as is typical of deep pretrained models. RSI generalizes RSVD by using $q > 1$ power iterations, which amplify the dominant singular values relative to the trailing ones and thereby suppress the influence of the spectral tail.
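The amplification effect can be checked numerically. Assuming the convention in which $q=1$ recovers RSVD, RSI effectively samples the range of $B = (WW^\top)^{q-1}W$, whose singular values are $\sigma_j^{2q-1}$, so the gap ratio $\sigma_{k+1}/\sigma_k$ is raised to the power $2q-1$. The NumPy sketch below verifies this on a synthetic matrix; the helper name `amplified_spectrum`, the matrix size, and the $1/\sqrt{j}$ spectrum are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def amplified_spectrum(W: np.ndarray, q: int) -> np.ndarray:
    """Singular values of B = (W W^T)^(q-1) W (hypothetical helper).

    Assumes the convention where q = 1 corresponds to plain RSVD.
    """
    B = W.copy()
    for _ in range(q - 1):
        B = W @ (W.T @ B)  # one more multiplication by W W^T
    return np.linalg.svd(B, compute_uv=False)

rng = np.random.default_rng(0)
m, n = 60, 40
# Build W = U diag(sigma) V^T with a slowly decaying spectrum sigma_j = 1/sqrt(j).
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 1.0 / np.sqrt(np.arange(1, n + 1))
W = (U * sigma) @ V.T

k = 10
for q in (1, 2, 3):
    s = amplified_spectrum(W, q)
    # The amplified singular values equal sigma_j^(2q-1) up to round-off.
    assert np.allclose(s, sigma ** (2 * q - 1))
    print(f"q={q}: sigma_(k+1)/sigma_k = {s[k] / s[k - 1]:.3f}")
```

With this spectrum the printed ratio equals $(10/11)^{(2q-1)/2}$, falling from about 0.95 at $q=1$ to about 0.79 at $q=3$; a smaller ratio means the top-$k$ subspace is easier to separate from the tail.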

The algorithmic procedure is as follows:

Draw a random matrix to probe the row space.
Apply $W$ and $W^\top$ alternately over $q$ passes, orthonormalizing the intermediate bases.
Project $W$ onto the resulting subspace, compute the SVD of the small projected matrix, and reconstruct the rank-$k$ approximation of $W$, where $k$ is the target rank.
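The three steps can be sketched in NumPy as follows. This is a minimal illustration, not the paper's code: it assumes a Gaussian test matrix, an oversampling parameter `p` (not mentioned above), QR factorization for the intermediate orthonormalization, and the convention that `q=1` is plain RSVD. The function name `rsi` and its signature are hypothetical.

```python
import numpy as np

def rsi(W: np.ndarray, k: int, q: int = 2, p: int = 10, seed: int = 0):
    """Rank-k approximation of W by randomized subspace iteration (sketch).

    q counts passes over W; q = 1 reduces to plain RSVD.
    Returns (U_k, s_k, Vt_k) with W ~= U_k @ diag(s_k) @ Vt_k.
    """
    m, n = W.shape
    ell = min(k + p, m, n)                 # sketch width with oversampling
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((n, ell))  # step 1: Gaussian test matrix
    Q, _ = np.linalg.qr(W @ Omega)         # orthonormal basis for range(W Omega)
    for _ in range(q - 1):                 # step 2: alternate W^T and W
        Z, _ = np.linalg.qr(W.T @ Q)
        Q, _ = np.linalg.qr(W @ Z)
    B = Q.T @ W                            # step 3: project onto the subspace
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub                             # lift left factors back to R^m
    return U[:, :k], s[:k], Vt[:k, :]

# Usage: a synthetic matrix with slowly decaying singular values 1/sqrt(j).
rng = np.random.default_rng(1)
m, n, k = 300, 200, 20
U0, _ = np.linalg.qr(rng.standard_normal((m, n)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 1.0 / np.sqrt(np.arange(1, n + 1))
W = (U0 * sigma) @ V0.T

for q in (1, 2, 4):
    U, s, Vt = rsi(W, k, q=q)
    # Normalized spectral error; 1.0 is the optimum attained by truncated SVD.
    err = np.linalg.norm(W - (U * s) @ Vt, 2) / sigma[k]
    print(f"q={q}: normalized spectral error = {err:.3f}")
```

Reusing the same seed across `q` isolates the effect of the extra passes; on a slowly decaying spectrum like this one the normalized error should move toward 1.0 as `q` grows, which mirrors the single-layer behavior described in the results below.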

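The computation consists mainly of matrix multiplications with $W$ and $W^\top$, so for fixed $k$ and $q$ its cost grows roughly in proportion to the number of entries of $W$ (about $O(qmnk)$ for an $m \times n$ matrix, versus $O(mn\min(m,n))$ for a full SVD). These multiplications are highly parallelizable, yielding substantial practical acceleration on GPU platforms.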

Empirical Results

Experiments span both a canonical convolutional architecture (VGG19) and a vision transformer (ViT-B/32), focusing on single-layer low-rank decompositions and complete end-to-end compression. Key findings include:

Single-Layer Approximation: Even a small increase in the number of power iterations beyond RSVD substantially reduces the normalized spectral error, bringing it from the RSVD level close to the SVD optimality baseline, especially in high-dimensional layers with slow spectral decay.
Computational Efficiency: RSI achieves a substantial speedup relative to exact SVD for large linear layers, with further gains as the target rank $k$ decreases. It preserves this advantage even for moderate-sized transformer layers.
End-to-End Model Compression: Compressing all linear layers with RSI achieves aggressive compression ratios for both VGG19 and ViT-B/32 with only a minimal drop in Top-1 and Top-5 accuracy, provided the iteration count $q$ is adequate. Under aggressive compression (small target rank), RSI outperforms RSVD in classification accuracy by a substantial margin, demonstrating its robustness in realistic deployment scenarios.
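As a back-of-the-envelope aid, storing rank-$k$ factors of an $m \times n$ weight matrix takes $k(m+n)$ numbers instead of $mn$, so the factorization only saves memory when $k < mn/(m+n)$. The sketch below evaluates this for two real layer shapes, the ViT-B/32 MLP projection (768 × 3072) and the first fully connected layer of VGG19 (25088 × 4096). It assumes "compression ratio" means compressed over original parameter count and ignores biases; the paper's accounting may differ, and the helper names and the rank 64 are hypothetical.

```python
def factored_params(m: int, n: int, k: int) -> int:
    """Parameters in the factors U (m x k) and V (k x n), ignoring biases."""
    return k * (m + n)

def compression_ratio(m: int, n: int, k: int) -> float:
    """Compressed / original parameter count for one m x n weight matrix."""
    return factored_params(m, n, k) / (m * n)

def break_even_rank(m: int, n: int) -> int:
    """Largest k for which k * (m + n) < m * n, i.e. the factors are smaller."""
    return (m * n - 1) // (m + n)

layers = {
    "ViT-B/32 MLP projection": (768, 3072),
    "VGG19 first FC layer": (25088, 4096),
}
for name, (m, n) in layers.items():
    print(f"{name}: break-even rank = {break_even_rank(m, n)}, "
          f"ratio at k=64 = {compression_ratio(m, n, 64):.4f}")
```

For the 768 × 3072 layer, rank 64 stores $64 \cdot 3840 = 245{,}760$ numbers instead of $2{,}359{,}296$, a ratio of $1/9.6 \approx 0.104$, and any rank above 614 would use more memory than the dense matrix.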

No retraining or fine-tuning is performed, so the results reflect the impact of low-rank compression alone on model performance.

Implications and Future Directions

The theoretical and empirical results establish RSI as a practical and principled tool for compressing large pretrained neural networks, particularly in the regime where singular value spectra do not decay sharply. This is likely to be the prevalent scenario in state-of-the-art vision models and large language models (LLMs). The explicit spectral perturbation bounds for classification tasks provide a template for formal certification of compressed model reliability. Moreover, RSI composes naturally with low-rank adaptation methods such as LoRA, suggesting hybrid strategies in which backbone compression and parameter-efficient adaptation are applied jointly for maximal memory and compute savings.
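One possible shape for such a hybrid is sketched below purely as an illustration; the paper does not specify this design. It keeps frozen rank-$k$ factors $U, V$ (for example, RSI output with the singular values folded into one factor) and adds a trainable LoRA-style update $BA$ of rank $r$ with scaling $\alpha/r$, so the layer computes $U(Vx) + \frac{\alpha}{r}B(Ax)$. The class name `LowRankPlusLoRA`, the initialization scale, and the values of `r` and `alpha` are hypothetical.

```python
import numpy as np

class LowRankPlusLoRA:
    """Frozen rank-k backbone factors plus a LoRA-style adapter (hypothetical)."""

    def __init__(self, U: np.ndarray, V: np.ndarray, r: int = 4,
                 alpha: float = 8.0, seed: int = 0):
        assert U.shape[1] == V.shape[0], "inner ranks of U and V must match"
        m, n = U.shape[0], V.shape[1]
        rng = np.random.default_rng(seed)
        self.U, self.V = U, V                        # frozen: W_tilde = U @ V
        self.A = 0.01 * rng.standard_normal((r, n))  # trainable, small random init
        self.B = np.zeros((m, r))                    # trainable, zero init
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (n, batch); the dense m x n matrix is never formed.
        return self.U @ (self.V @ x) + self.scale * (self.B @ (self.A @ x))

    def trainable_params(self) -> int:
        return self.A.size + self.B.size

# Usage with random stand-ins for compressed factors.
rng = np.random.default_rng(0)
m, n, k = 64, 48, 8
U = rng.standard_normal((m, k))
V = rng.standard_normal((k, n))
layer = LowRankPlusLoRA(U, V, r=4)
x = rng.standard_normal((n, 5))
y = layer(x)
# B starts at zero, so the adapter initially leaves the compressed layer unchanged.
assert np.allclose(y, (U @ V) @ x)
```

Zero-initializing `B` is the usual LoRA choice: adaptation starts exactly from the compressed backbone, and only $r(m+n)$ parameters are trained.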

Further research avenues include adaptive, layer-wise selection of target ranks (potentially informed by singular value decay profiles), extension to multi-modal architectures, and integration with quantization or structured pruning for compound compression. Additionally, exploring the deployment of RSI-compressed models within federated and privacy-preserving learning frameworks could address emerging challenges in distributed AI applications.

Conclusion

This work demonstrates that randomized subspace iteration (RSI) can achieve near-optimal low-rank approximations for the linear components of large pretrained models, even when their singular values decay slowly. By explicitly connecting spectral approximation error to prediction confidence and validating the practical efficacy of RSI on deep convolutional and transformer models, the study positions RSI as a valuable component of the contemporary model compression toolkit. Its efficiency, provable reliability, and adaptability to diverse architectures and compression regimes make it a strong candidate for the scalable deployment of pretrained neural networks.
