Compression of Deep Neural Networks Using High-dimensional Statistical Approaches
Deep Neural Networks (DNNs) have grown exponentially in size and complexity, and with them their computational and storage demands. This growth has sparked strong interest in efficient compression techniques that enable deployment in resource-constrained environments. Building on recent advances in the neural tangent kernel (NTK) and high-dimensional statistics, the paper “Lossless” Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach by Lingyu Gu et al. introduces a compression approach that stands out for how well it preserves the original network's performance.
Theoretical Foundation and Results
The paper leverages an asymptotic spectral equivalence between the Conjugate Kernel (CK) and NTK matrices of fully-connected deep neural networks, established in the regime where both the data dimension and the sample size are large. This spectral equivalence underpins the possibility of compressing DNNs without significant loss in performance, as it implies that a suitably designed sparse and quantized network can exhibit the same NTK eigenspectrum as its dense counterpart.
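To make these objects concrete, the sketch below computes the empirical CK and NTK Gram matrices of a single-hidden-layer ReLU network in NumPy. The width, data model, and ReLU activation are illustrative choices of ours, not the paper's multi-layer setting; the point is simply to show how the two Gram matrices are formed and how, for this shallow model, the NTK decomposes into the CK plus a derivative term.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: n samples in dimension d0, one hidden layer of width d1.
n, d0, d1 = 512, 256, 4096
X = rng.standard_normal((n, d0)) / np.sqrt(d0)   # data rows of roughly unit norm
W = rng.standard_normal((d1, d0))                # first-layer weights
a = rng.standard_normal(d1)                      # output-layer weights


def relu(z):
    return np.maximum(z, 0.0)


def drelu(z):
    return (z > 0).astype(z.dtype)


Z = X @ W.T                 # pre-activations, shape (n, d1)
S = relu(Z)                 # post-activations

# Conjugate Kernel (CK) Gram matrix: inner products of hidden-layer features.
CK = S @ S.T / d1

# Empirical NTK of f(x) = a^T relu(W x) / sqrt(d1): Gram matrix of gradients
# with respect to both W and a. For this shallow model it decomposes as
# NTK = CK + (X X^T) * G, where G collects the a_i-weighted ReLU derivatives.
D = drelu(Z) * a            # D[k, i] = a_i * relu'(w_i . x_k)
NTK = CK + (X @ X.T) * (D @ D.T) / d1

print("top CK  eigenvalues:", np.linalg.eigvalsh(CK)[-3:])
print("top NTK eigenvalues:", np.linalg.eigvalsh(NTK)[-3:])
```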
The authors present two main theorems, supported by rigorous mathematical exposition. Theorem 1 establishes the asymptotic spectral equivalence for CK matrices, showing that their spectral behavior is independent of the (random) weight distribution, provided it has zero mean and unit variance, and depends on the activation function only through a few scalar parameters. Theorem 2 extends this analysis to NTK matrices, yielding an analogous spectral equivalence and providing the cornerstone of the proposed DNN compression approach.
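As an illustrative check of the universality statement in Theorem 1 (a toy setup of our own, not the paper's experiments), one can compare the CK eigenvalue spectrum obtained with standard Gaussian weights against the one obtained with sparse ternary weights rescaled to have zero mean and unit variance; in the high-dimensional regime the two spectra should be close.

```python
import numpy as np

rng = np.random.default_rng(1)

n, d0, d1 = 512, 256, 4096
X = rng.standard_normal((n, d0)) / np.sqrt(d0)


def ck_spectrum(W, act=np.tanh):
    """Eigenvalues of the CK Gram matrix (1/d1) * act(X W^T) act(X W^T)^T."""
    S = act(X @ W.T)
    return np.linalg.eigvalsh(S @ S.T / d1)


# Gaussian weights: zero mean, unit variance.
W_gauss = rng.standard_normal((d1, d0))

# Sparse ternary weights in {-1/sqrt(p), 0, +1/sqrt(p)}: entries are nonzero
# with probability p and rescaled so they also have zero mean, unit variance.
p = 0.2
mask = rng.random((d1, d0)) < p
signs = rng.choice([-1.0, 1.0], size=(d1, d0))
W_tern = mask * signs / np.sqrt(p)

eig_g = ck_spectrum(W_gauss)
eig_t = ck_spectrum(W_tern)

# The two eigenvalue distributions should be close; compare a few summaries.
print("largest eigenvalues:", eig_g[-3:], eig_t[-3:])
print("bulk mean / std    :", eig_g.mean(), eig_t.mean(), eig_g.std(), eig_t.std())
```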
Compression Scheme
Building on these theoretical findings, the paper proposes a compression scheme that achieves what the authors call “lossless” compression. The method designs a network with sparsified and ternarized weights and activations so that the compressed network retains the original NTK eigenspectrum. Empirical validation on synthetic data as well as real-world datasets such as MNIST and CIFAR-10 highlights the scheme's effectiveness. Notably, the experiments report substantial savings in both computation and storage (up to a factor of 10 in memory) with virtually no detrimental impact on performance.
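For intuition, the snippet below sketches one generic way to sparsify and ternarize a weight matrix (magnitude-based thresholding with a single per-matrix scale) and estimates the resulting memory saving. This is a minimal illustration of sparse ternary quantization, not the paper's construction, which chooses the sparsification and quantization so that the original NTK eigenspectrum is retained.

```python
import numpy as np


def ternarize(W, sparsity=0.7):
    """Magnitude-based ternarization: zero out the smallest-magnitude entries
    and map the rest to +/- a single per-matrix scale.

    A generic sketch of sparse ternary quantization, not the theory-driven
    construction used in the paper.
    """
    thresh = np.quantile(np.abs(W), sparsity)    # keep top (1 - sparsity) by magnitude
    mask = np.abs(W) > thresh
    scale = np.abs(W[mask]).mean() if mask.any() else 0.0
    return scale * np.sign(W) * mask, mask


rng = np.random.default_rng(2)
W = rng.standard_normal((1024, 1024)).astype(np.float32)
W_t, mask = ternarize(W, sparsity=0.7)

dense_bytes = W.size * 4                         # float32 weights
ternary_bytes = W.size * 2 / 8 + 4               # ~2 bits per weight + one scale
print(f"kept {mask.mean():.0%} of weights, "
      f"~{dense_bytes / ternary_bytes:.0f}x smaller")
```

In this naive packing the saving comes purely from storing 2-bit ternary codes instead of 32-bit floats; the memory figure reported in the paper additionally reflects its specific sparsity and quantization choices.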
Practical Implications and Future Directions
The implications of this research are manifold. For practitioners, especially those working within the constraints of low-power and low-memory devices, this strategy presents a viable path to deploying advanced deep learning models without the usual trade-off in accuracy. Theoretically, the paper deepens our understanding of the mechanisms that govern the behavior of DNNs in high-dimensional settings, which could pave the way for further innovations in model efficiency.
Looking ahead, extending this framework to convolutional and more complex neural network architectures emerges as a natural progression. Additionally, examining the convergence and generalization properties of networks compressed with this scheme in finite-width settings would yield valuable insights, potentially broadening the applicability of these techniques.
In summary, the “Lossless” Compression of Deep Neural Networks paper marks a significant step forward in DNN model compression, balancing rigorously derived theoretical results with practical applicability. The proposed compression scheme not only holds promise for immediate use in resource-constrained environments but also provides an intriguing foundation for future explorations in efficient AI.