Low-Rank Matrix Approximation for Neural Network Compression
The paper "Low-Rank Matrix Approximation for Neural Network Compression" introduces an innovative approach to enhancing the efficiency of Deep Neural Networks (DNNs) through a technique called adaptive-rank Singular Value Decomposition (ARSVD). Given the high resource consumption associated with DNNs, particularly in terms of memory and computational demands, the necessity for effective model compression strategies is paramount. The authors propose ARSVD as a method that adapts the rank of weight matrices within fully connected layers, governed by the distribution of energy across the neural network, to maintain performance while significantly reducing the model’s size.
Key Contributions
The paper's main contribution is the development and validation of ARSVD, which diverges from conventional fixed-rank SVD compression. Instead of applying a uniform rank reduction across all layers, ARSVD selects a rank for each layer based on that layer's energy distribution, optimizing the trade-off between compression and accuracy. By retaining only enough singular values to meet an energy threshold, ARSVD keeps accuracy loss minimal and can outperform static compression techniques, as the sketch below illustrates.
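This summary does not spell out the paper's exact selection rule, but a standard reading of "energy thresholds" is to keep the smallest rank whose singular values capture a target fraction of a matrix's total spectral energy. Below is a minimal numpy sketch under that assumption; the function name adaptive_rank_truncate and the threshold tau are illustrative, not taken from the paper.

```python
import numpy as np

def adaptive_rank_truncate(W: np.ndarray, tau: float = 0.95):
    """Truncate W to the smallest rank whose singular values retain
    a fraction tau of the total spectral energy.

    This is a generic energy-threshold heuristic; the paper's exact
    criterion may differ. tau is a hypothetical hyperparameter.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)    # cumulative energy fraction
    r = int(np.searchsorted(energy, tau) + 1)  # smallest r with energy >= tau
    # Factorized form: W ~= (U_r * s_r) @ Vt_r
    A = U[:, :r] * s[:r]  # shape (m, r), singular values folded in
    B = Vt[:r, :]         # shape (r, n)
    return A, B, r

# Example: a 512x256 layer compressed according to its own spectrum
W = np.random.randn(512, 256)
A, B, r = adaptive_rank_truncate(W, tau=0.95)
print(f"selected rank {r}: {W.size} -> {A.size + B.size} parameters")
```

Storing the factors A and B costs r(m + n) parameters instead of mn, so the compression ratio follows directly from the rank that each layer's spectrum selects.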
Experimental Evaluation
The methodology was tested across multiple datasets, including MNIST, CIFAR-10, and CIFAR-100, using a simple Multi-Layer Perceptron (MLP). The experimental results indicate several notable gains:
- Accuracy Enhancement: ARSVD outperformed baseline models, particularly on CIFAR-10 and CIFAR-100. Accuracy increased by 9.18 percentage points on CIFAR-10 and by 11.27 percentage points on CIFAR-100. These figures show that the method compresses models without hurting classification performance.
- F1 Score Improvements: MNIST saw negligible change, consistent with the dataset's simplicity, while both CIFAR datasets showed significant F1-score gains after compression, reflecting the technique's value on more complex data.
- Runtime Efficiency: The paper reports a considerable reduction in runtime, attributable to ARSVD's pruning of unnecessary parameters and the resulting leaner gradient updates. The compressed factorization yields faster inference and lower computational overhead (see the sketch following this list).
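To see where the runtime savings come from: once a weight matrix W of shape (m, n) is replaced by factors A (m x r) and B (r x n), a matrix-vector product costs r(m + n) multiply-adds instead of mn, which is a win whenever r < mn / (m + n). A small sketch, reusing the hypothetical factors from the example above:

```python
import numpy as np

def dense_forward(W, x):
    # Original layer: m*n multiply-adds per input vector.
    return W @ x

def lowrank_forward(A, B, x):
    # Factorized layer: r*(m + n) multiply-adds per input vector.
    return A @ (B @ x)

m, n, r = 512, 256, 40
# Stand-in factors; in practice A and B come from the SVD truncation above.
A, B = np.random.randn(m, r), np.random.randn(r, n)
x = np.random.randn(n)
y = lowrank_forward(A, B, x)

print("dense cost:   ", m * n)        # 131072 multiply-adds
print("low-rank cost:", r * (m + n))  # 30720 multiply-adds
```

For this 512x256 layer the break-even rank is mn / (m + n), roughly 171, so any adaptively selected rank below that reduces both storage and inference cost.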
Implications and Future Directions
ARSVD's results point to applications in settings constrained by computation and memory, such as mobile and embedded systems, where it could make capable DNN solutions practical in resource-limited environments.
Future work may refine the adaptive rank selection process and test ARSVD on other architectures, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). Exploring mechanisms for real-time rank adaptation under dynamic workloads could further broaden its applicability. As the field progresses, this approach offers a useful reference point for adaptive model compression and motivates further work on efficiency in deep learning systems.