- The paper introduces the TT-layer, which uses the Tensor Train decomposition to compress fully-connected neural network layers, shrinking the parameter count of a single layer by a factor of up to 200,000.
- It demonstrates that networks with TT-layers, dubbed TensorNets, match or closely approach the accuracy of their uncompressed counterparts on benchmarks such as MNIST, CIFAR-10, and ImageNet.
- The approach enables scalable and resource-efficient neural networks, paving the way for deployment on devices with limited hardware capabilities.
Tensorizing Neural Networks: An Expert Overview
The paper "Tensorizing Neural Networks" by Alexander Novikov, Dmitry Podoprikhin, Anton Osokin, and Dmitry Vetrov addresses a critical limitation of deep neural networks: their substantial computational and memory demands. The authors propose an innovative approach to mitigate these issues by leveraging the Tensor Train (TT) decomposition format to compress the weight matrices of fully-connected layers, while preserving their expressive power.
Key Contributions
The authors' primary contribution is the introduction of the TT-layer, a fully-connected layer where the weight matrix is stored in the TT-format. This compression approach significantly reduces the number of parameters without sacrificing predictive accuracy. Here are the core elements of this work:
- TT-format Application: The TT-format is applied to the dense weight matrices of fully-connected layers, reducing the parameter count of a single layer by a factor of up to 200,000 and compressing the whole network by a factor of up to 7 (see the sketch after this list for how the parameterization works).
- Compatibility with Existing Algorithms: The TT-layer remains compatible with standard training procedures, including back-propagation, because the authors derive the equations needed to compute gradients with respect to the TT-cores and the layer input.
- Experimental Validation: Comprehensive experiments demonstrate that TensorNet, a network that incorporates one or more TT-layers, matches or closely approaches the performance of its uncompressed counterparts on datasets such as MNIST, CIFAR-10, and ImageNet, while vastly reducing the parameter count and computational requirements of its fully-connected layers.
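To make the TT parameterization concrete, here is a minimal NumPy sketch (my own illustration, not the authors' code) of how a weight matrix stored as TT-cores acts on a vector: the input is reshaped into a tensor and contracted with one core at a time, so the full matrix is never materialized. The mode sizes, TT-ranks, and random cores are arbitrary choices for the example.

```python
import numpy as np

# Mode sizes: the dense matrix is M x N with M = prod(m_modes), N = prod(n_modes).
m_modes, n_modes = (4, 8, 8), (5, 5, 4)          # M = 256, N = 100 (toy sizes)
ranks = (1, 3, 3, 1)                              # TT-ranks, with r_0 = r_d = 1

rng = np.random.default_rng(0)
# Core k has shape (r_{k-1}, m_k, n_k, r_k).
cores = [rng.standard_normal((ranks[k], m_modes[k], n_modes[k], ranks[k + 1]))
         for k in range(len(m_modes))]

def tt_matvec(cores, x):
    """Compute y = W @ x where W is encoded by TT-cores; W itself is never formed."""
    z = x.reshape(1, 1, -1)                       # (rank, row modes done, column modes left)
    for core in cores:
        r_prev, m_k, n_k, r_next = core.shape
        rows_done, cols_left = z.shape[1], z.shape[2] // n_k
        z = z.reshape(r_prev, rows_done, n_k, cols_left)
        # Contract the core with z over the shared TT-rank and the k-th column mode.
        z = np.einsum('amnb,apnq->bpmq', core, z)
        z = z.reshape(r_next, rows_done * m_k, cols_left)
    return z.reshape(-1)                          # vector of length M

# Sanity check against the explicitly materialized matrix (feasible only at toy sizes).
W = np.einsum('aijb,bklc,cmnd->ikmjln', *cores).reshape(np.prod(m_modes), np.prod(n_modes))
x = rng.standard_normal(np.prod(n_modes))
assert np.allclose(W @ x, tt_matvec(cores, x))

dense_params = W.size                             # 256 * 100 = 25,600
tt_params = sum(c.size for c in cores)            # 60 + 360 + 96 = 516
print(dense_params, tt_params)
```

Because every step is an ordinary tensor contraction, gradients with respect to the cores and the input can be propagated through the layer, which is what keeps the TT-layer compatible with back-propagation.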
Numerical Results
The authors report strong numerical results across various benchmarks. For instance, they achieve a compression factor of 194,622 for the weight matrix of the largest fully-connected layer of the vgg-19 network, shrinking it from 25088 × 4096 ≈ 1.0 × 10^8 parameters to just 528. On the CIFAR-10 dataset, a TensorNet with TT-layers achieved a test error of 24.39% while reducing the parameter count of the fully-connected layers by a factor of 11.9.
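The headline compression figure is easy to reproduce with simple arithmetic. The mode factorization and TT-rank choice below are consistent with the reported 528 parameters, but they are an illustrative reconstruction rather than a configuration confirmed by the paper.

```python
# One factorization of the 25088 x 4096 matrix that yields 528 TT parameters.
m_modes = (2, 7, 8, 8, 7, 4)        # product = 25088
n_modes = (4, 4, 4, 4, 4, 4)        # product = 4096
ranks = (1, 2, 2, 2, 2, 2, 1)       # TT-ranks, r_0 = r_6 = 1

dense = 25088 * 4096                                          # 102,760,448
tt = sum(ranks[k] * m_modes[k] * n_modes[k] * ranks[k + 1]
         for k in range(6))                                   # 528
print(dense, tt, dense / tt)                                  # ratio ≈ 194,622
```

The ratio 102,760,448 / 528 ≈ 194,622 matches the reported compression factor.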
Theoretical and Practical Implications
Theoretical Insights
- Exploration of Redundancy: The paper reaffirms the high redundancy present in conventional neural network parameters and demonstrates that a TT-layer can significantly compress these parameters while maintaining the model's expressive capability.
- Scalable Architectures: With the TT-format, it becomes feasible to construct much wider layers than previously possible, potentially enhancing the model's expressive power without a proportional increase in parameters.
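As a rough illustration of this scalability argument (mode size, depth, and rank values are arbitrary), the sketch below compares the parameter count of a dense N × N layer with a TT parameterization as the layer widens: the dense count grows quadratically in N, while the TT count grows only linearly in the number of cores for fixed mode size and ranks.

```python
# Parameter count of a square N x N layer, N = n^d, with all mode sizes equal to n
# and all interior TT-ranks equal to r (boundary ranks are 1). Values are illustrative.
def tt_params(n, d, r):
    ranks = [1] + [r] * (d - 1) + [1]
    return sum(ranks[k] * n * n * ranks[k + 1] for k in range(d))

n, r = 4, 8
for d in (4, 6, 8, 10):
    N = n ** d
    print(f"N = {N:>10,}  dense = {N * N:>22,}  TT = {tt_params(n, d, r):>6,}")
```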
Practical Implications
- Resource Efficiency: The reduction in parameters translates to lower memory usage, making it feasible to deploy neural networks on devices with limited hardware capabilities, such as mobile devices.
- Faster Inference: When the TT-ranks are small, the TT-layer's matrix-by-vector product operates directly on the compact cores and can be cheaper than a dense multiplication, enabling faster inference and strengthening the case for real-time applications.
Future Directions
The paper opens several avenues for future research and development. One immediate direction is to apply the TT-format to the input and output layers as well, so that no dense weight matrix constrains the achievable layer dimensions. This could allow for neural networks with billions of hidden units, offering unprecedented levels of model capacity.
Moreover, exploring hybrid architectures that combine TT-layers with other forms of neural network compression techniques, such as low-rank approximations and quantization, could yield further efficiencies. Investigating the potential of TT-layers in different contexts, such as recurrent neural networks or transformers, could also prove beneficial.
Conclusion
The authors present a compelling case for tensorizing fully-connected layers within neural networks using the Tensor Train format. The TT-layer offers a promising solution to the computational and memory challenges that currently limit the scalability and deployment of deep learning models. This work not only provides a significant step forward in model compression but also paves the way for future innovations in the design and implementation of neural networks.