Compressing Neural Networks with the Hashing Trick (1504.04788v1)

Published 19 Apr 2015 in cs.LG and cs.NE

Abstract: As deep nets are increasingly used in applications suited for mobile devices, a fundamental dilemma becomes apparent: the trend in deep learning is to grow models to absorb ever-increasing data set sizes; however mobile devices are designed with very little memory and cannot store such large models. We present a novel network architecture, HashedNets, that exploits inherent redundancy in neural networks to achieve drastic reductions in model sizes. HashedNets uses a low-cost hash function to randomly group connection weights into hash buckets, and all connections within the same hash bucket share a single parameter value. These parameters are tuned to adjust to the HashedNets weight sharing architecture with standard backprop during training. Our hashing procedure introduces no additional memory overhead, and we demonstrate on several benchmark data sets that HashedNets shrink the storage requirements of neural networks substantially while mostly preserving generalization performance.

Citations (1,171)

Summary

  • The paper introduces HashedNets to reduce memory usage by applying random weight sharing through feature hashing, achieving significant compression with minimal loss in accuracy.
  • Experimental results on benchmarks like MNIST show that HashedNets deliver up to a 1/64 compression factor while outperforming traditional methods.
  • The paper highlights the potential for deploying deep learning models on memory-constrained devices, paving the way for efficient mobile and embedded applications.

Compressing Neural Networks with the Hashing Trick

Neural networks have achieved impressive results across various fields such as image classification, speech recognition, and autonomous driving. However, the increasing size and complexity of these networks present challenges, especially when deploying models on devices with limited memory and computational resources. The paper "Compressing Neural Networks with the Hashing Trick" by Wenlin Chen et al. addresses this issue by proposing a novel approach called HashedNets.

Overview

The authors introduce HashedNets as a method for significantly reducing the memory footprint of neural networks without a substantial loss in accuracy. The approach leverages feature hashing, a technique commonly used in handling high-dimensional sparse data, to achieve random weight sharing within the neural network. This structure allows networks to maintain a large number of "virtual" weights that do not require dedicated storage, thus fitting into the constrained memory spaces of mobile and embedded devices.

Methodology

Random Weight Sharing

The authors propose reducing the number of unique weights in a neural network through random weight sharing. Instead of giving each connection between layers its own distinct weight, HashedNets use a hash function to assign multiple connections to the same stored weight value. Only this small set of shared weights needs to be stored, while during computation the network behaves as if it had a much larger, "virtual" weight matrix.

The hash function, implemented with the xxHash algorithm for efficiency, maps each index of the virtual weight matrix to a position in a much smaller array of stored weights. The forward pass looks weights up through this mapping, and during training backpropagation is adapted so that the gradients of all connections sharing a bucket accumulate into that single stored parameter.
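
To make the weight-sharing scheme concrete, here is a minimal NumPy sketch of a single hashed fully connected layer. It is an illustration under stated assumptions rather than the authors' implementation: hashlib's MD5 stands in for xxHash, the ReLU activation and all layer sizes are arbitrary, and the bucket and sign lookups are cached as full arrays only so the code reads clearly (the paper computes them on the fly precisely to avoid that memory cost).

```python
import hashlib

import numpy as np


def bucket(i, j, seed, K):
    """Map virtual index (i, j) to one of K stored-weight buckets.
    hashlib stands in here for the xxHash function used in the paper."""
    digest = hashlib.md5(f"{seed}:{i}:{j}".encode()).digest()
    return int.from_bytes(digest[:8], "little") % K


def sign(i, j, seed):
    """Second, independent hash into {-1, +1} (the sign hash)."""
    digest = hashlib.md5(f"sign:{seed}:{i}:{j}".encode()).digest()
    return 1 if digest[0] % 2 == 0 else -1


class HashedLayer:
    """Fully connected layer whose n_out x n_in virtual weight matrix is
    backed by only K stored parameters (illustrative sketch only)."""

    def __init__(self, n_in, n_out, K, seed=0):
        self.w = 0.01 * np.random.randn(K)  # the only weights kept in memory
        # Bucket indices and signs are cached here purely for readability;
        # the paper recomputes them on the fly to avoid this O(n_in*n_out) cost.
        self.idx = np.array([[bucket(i, j, seed, K) for j in range(n_in)]
                             for i in range(n_out)])
        self.sgn = np.array([[sign(i, j, seed) for j in range(n_in)]
                             for i in range(n_out)])

    def forward(self, a):
        # Materializing V is also just for clarity; in training, gradients
        # w.r.t. self.w accumulate over every connection hashed to the bucket.
        V = self.w[self.idx] * self.sgn
        return np.maximum(0.0, V @ a)  # ReLU activation (an arbitrary choice)


# 784 inputs and 128 virtual hidden units give 100,352 virtual connections,
# stored in 1,568 real weights (a 1/64 compression factor).
layer = HashedLayer(n_in=784, n_out=128, K=1_568)
hidden = layer.forward(np.random.rand(784))
```

During training, the gradient of the loss with respect to each stored weight is simply the sum of the (signed) gradients of all virtual connections hashed into its bucket, so standard backpropagation applies with this one bookkeeping change.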

Feature Hashing Equivalence

The paper formalizes the relationship between random weight sharing and feature hashing: forward propagation through a hashed layer is equivalent to applying the hashing trick to that layer's input activations and then taking an inner product with the layer's small vector of shared parameters.

Key to this method is an additional sign hash function, which makes the hashed inner product an unbiased estimate of the original one despite hash collisions. As a result, although weights are shared, the network's behavior stays consistent with that of a non-hashed counterpart, up to the approximation error introduced by collisions.
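
In notation loosely following the paper (the symbols below are a paraphrase, not a verbatim quote), a virtual weight is a signed lookup into the shared parameter vector, and forward propagation reduces to an inner product between those shared parameters and a hash-aggregated version of the incoming activations:

```latex
% Virtual weights of layer \ell, defined by a bucket hash h^{\ell} and a sign hash \xi:
V^{\ell}_{ij} = \xi(i,j)\, w^{\ell}_{h^{\ell}(i,j)},
\qquad h^{\ell}(i,j) \in \{1,\dots,K^{\ell}\},\quad \xi(i,j) \in \{-1,+1\}.

% Forward propagation through the hashed layer:
a^{\ell+1}_{i}
  = f\!\Big(\sum_{j} V^{\ell}_{ij}\, a^{\ell}_{j}\Big)
  = f\!\Big(\sum_{k=1}^{K^{\ell}} w^{\ell}_{k}
      \underbrace{\sum_{j:\,h^{\ell}(i,j)=k} \xi(i,j)\, a^{\ell}_{j}}_{[\phi_{i}(a^{\ell})]_{k}}\Big)
  = f\!\big(w^{\ell}\cdot \phi_{i}(a^{\ell})\big).

% That is, the hashing trick \phi_{i} is applied to the activations a^{\ell}
% before an inner product with the small stored parameter vector w^{\ell}.
```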

Experimental Results

Extensive experiments on eight benchmark datasets, including MNIST and its variants as well as other binary image classification datasets like CONVEX and RECT, demonstrate the efficacy of HashedNets. Key findings include:

  • Compression Factors: HashedNets maintain competitive accuracy even with significant compression factors (up to 1/64), outperforming other compression methods such as random edge removal and low-rank decompositions.
  • Comparative Analysis: When comparing against standard neural networks of equivalent storage size, HashedNets provide superior performance, particularly noticeable in scenarios where small model sizes are critical.
  • Expansion Effect: With a fixed storage budget, expanding the virtual network architecture (i.e., increasing the number of hidden units) while using HashedNets improves performance, suggesting an increase in the effective expressive power of the network (a back-of-the-envelope illustration follows this list).
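
As a rough back-of-the-envelope illustration of this effect (the layer sizes below are hypothetical, not figures from the paper), a fixed budget of stored weights caps the width of a standard layer but can back a much wider virtual layer under hashing:

```python
# Hypothetical storage budget: 98,000 real weights for one layer over 784 inputs.
budget = 98_000

# Standard fully connected layer: every connection needs its own weight.
standard_hidden_units = budget // 784             # 125 hidden units

# HashedNets layer: the same 98,000 stored weights can back a wider virtual layer.
virtual_hidden_units = 1_000
virtual_connections = 784 * virtual_hidden_units  # 784,000 virtual weights
fraction_stored = budget / virtual_connections    # 0.125, i.e. a 1/8 compression

print(standard_hidden_units, virtual_hidden_units, fraction_stored)
```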

Implications and Future Directions

The implications of HashedNets are significant for the deployment of deep learning models on resource-constrained devices. The approach allows for leveraging larger models' capabilities while adhering to strict memory limitations. This is particularly relevant for mobile and embedded applications, such as real-time speech recognition or onboard vehicle perception systems, where processing power and memory are often bottlenecks.

Future research could explore optimizing HashedNets for execution on GPU architectures. This involves addressing issues related to non-coalesced memory access, which are critical for maximizing computational efficiency on GPUs. Additionally, combining HashedNets with other compression techniques—such as reduced numerical precision—promises further advancements in model compression, potentially enabling even more substantial reductions in memory consumption.

In conclusion, the paper’s contribution, HashedNets, effectively addresses the pressing need for memory-efficient neural network architectures. By utilizing random weight sharing with feature hashing, it achieves substantial compression while maintaining the network's functional integrity and accuracy, paving the way for more accessible and scalable deployment of deep learning models in real-world applications.