
UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition (1804.06508v1)

Published 18 Apr 2018 in cs.NE and cs.LG

Abstract: Convolutional Neural Networks (CNNs) have begun to permeate all corners of electronic society (from voice recognition to scene generation) due to their high accuracy and machine efficiency per operation. At their core, CNN computations are made up of multi-dimensional dot products between weight and input vectors. This paper studies how weight repetition (when the same weight occurs multiple times in or across weight vectors) can be exploited to save energy and improve performance during CNN inference. This generalizes a popular line of work to improve efficiency from CNN weight sparsity, as reducing computation due to repeated zero weights is a special case of reducing computation due to repeated weights. To exploit weight repetition, this paper proposes a new CNN accelerator called the Unique Weight CNN Accelerator (UCNN). UCNN uses weight repetition to reuse CNN sub-computations (e.g., dot products) and to reduce CNN model size when stored in off-chip DRAM, both of which save energy. UCNN further improves performance by exploiting sparsity in weights. We evaluate UCNN with an accelerator-level cycle and energy model and with an RTL implementation of the UCNN processing element. On three contemporary CNNs, UCNN improves throughput-normalized energy consumption by 1.2x-4x, relative to a similarly provisioned baseline accelerator that uses Eyeriss-style sparsity optimizations. At the same time, the UCNN processing element adds only 17-24% area overhead relative to the same baseline.

Authors (6)
  1. Kartik Hegde (4 papers)
  2. Jiyong Yu (2 papers)
  3. Rohit Agrawal (17 papers)
  4. Mengjia Yan (8 papers)
  5. Michael Pellauer (16 papers)
  6. Christopher W. Fletcher (13 papers)
Citations (162)

Summary

Summary of the UCNN Paper on Computational Reuse in DNNs

The paper introduces the Unique Weight CNN Accelerator (UCNN), a novel approach to enhance the efficiency of Convolutional Neural Networks (CNNs) during inference by leveraging weight repetition. The authors focus on reducing computation and energy consumption by exploiting repeated weights, a property largely overlooked by prior deep neural network accelerators.

UCNN tackles the computationally intensive nature of CNNs with architectural mechanisms that generalize the benefits of sparsity, which prior accelerators applied only to zero weights. By recognizing repetition in non-zero weights as well, UCNN introduces dot product factorization and activation group reuse to reduce the number of multiplications and memory accesses required during inference.

Dot product factorization groups input activations that share the same weight value: the grouped activations are summed first, and each group sum is then multiplied by its repeated weight once, reducing the number of multiplications. This requires specialized indirection tables that organize weights and inputs efficiently, and also lets UCNN skip operations involving repeated zero weights.
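
To make the idea concrete, here is a minimal Python sketch of dot product factorization on a flat 1-D weight vector; the function name and data layout are illustrative and stand in for UCNN's hardware indirection tables rather than reproducing them.

```python
# Minimal sketch of dot product factorization: activations that share the same
# weight value are summed first, then each group sum is multiplied by its
# unique weight once. Names and layout are illustrative, not the paper's RTL.
import numpy as np

def factorized_dot(weights, activations):
    """Compute dot(weights, activations) with one multiply per unique non-zero weight."""
    group_sums = {}                          # unique weight -> running sum of activations
    for w, a in zip(weights, activations):
        if w != 0:                           # repeated zero weights are skipped entirely
            group_sums[w] = group_sums.get(w, 0.0) + a
    return sum(w * s for w, s in group_sums.items())

w = np.array([0.5, 0.0, 0.5, -1.0, 0.5, -1.0])
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
assert np.isclose(factorized_dot(w, x), np.dot(w, x))   # 2 multiplies instead of 6
```

The multiplication count drops from the vector length to the number of unique non-zero weights; skipping zeros falls out as the special case that sparsity-only accelerators exploit.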

Activation group reuse extends this concept across filters: a shared indirection table identifies overlapping activation groups so that pre-computed partial sums can be reused by multiple filters, cutting redundant calculations and reducing the overall energy footprint.
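
The sketch below, again in Python with illustrative names, captures the reuse idea under a simplifying assumption: when two filters assign some weight to the same set of activation indices, the group's partial sum is computed once and shared. The dictionary cache stands in for the paper's shared indirection tables and is not the actual microarchitecture.

```python
# Hedged sketch of activation group reuse: if two filters place a weight on the
# same set of activation indices, that group's partial sum is computed once
# and reused across filters.
import numpy as np

def weight_groups(filt):
    """Map each unique non-zero weight to the tuple of activation indices using it."""
    table = {}
    for i, w in enumerate(filt):
        if w != 0:
            table.setdefault(w, []).append(i)
    return {w: tuple(idx) for w, idx in table.items()}

def dot_products_with_reuse(filters, activations):
    cache = {}                               # index tuple -> cached partial (group) sum
    outputs = []
    for filt in filters:
        out = 0.0
        for w, idx in weight_groups(filt).items():
            if idx not in cache:             # reused when another filter shares the group
                cache[idx] = activations[list(idx)].sum()
            out += w * cache[idx]
        outputs.append(out)
    return np.array(outputs), len(cache)

filters = [np.array([0.5, 0.5, -1.0, -1.0]),
           np.array([2.0, 2.0,  0.0,  3.0])]   # shares the (0, 1) group with filter 0
x = np.array([1.0, 2.0, 3.0, 4.0])
outs, num_group_sums = dot_products_with_reuse(filters, x)
assert np.allclose(outs, [np.dot(f, x) for f in filters])
assert num_group_sums == 3                     # 4 groups across the filters, one shared
```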

The evaluation of UCNN showed promising results. On contemporary CNNs such as AlexNet and ResNet-50, UCNN improved throughput-normalized energy consumption by 1.2x to 4x relative to a similarly provisioned baseline accelerator using Eyeriss-style sparsity optimizations. Notably, UCNN remained advantageous even when the networks were heavily quantized, showing that it integrates cleanly with existing compression techniques.

From a practical perspective, UCNN is compatible with a wide array of weight quantization levels, offering flexibility to accelerators targeting CNNs trained with differing precision settings. This interoperability is crucial for adapting to the diverse needs of edge devices and various application scenarios requiring CNN deployment.
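
As a rough illustration of why quantization amplifies the effect UCNN relies on, the snippet below counts unique weight values before and after quantizing a toy layer to eight levels; the uniform quantizer and layer shape are assumptions chosen for brevity, not the scheme evaluated in the paper.

```python
# Illustrative only: quantizing weights to a few levels increases repetition,
# since a layer quantized to 8 levels has at most 8 unique weight values.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 3, 3, 3)).ravel()      # toy conv layer: 1728 weights

levels = np.linspace(w.min(), w.max(), 8)       # 8 uniform quantization levels
w_q = levels[np.abs(w[:, None] - levels[None, :]).argmin(axis=1)]

print("unique weights before:", np.unique(w).size)    # ~1728 (all distinct)
print("unique weights after :", np.unique(w_q).size)  # <= 8, heavy repetition
```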

The theoretical implications are significant as well. UCNN challenges the conventional focus on exploiting zero-weight sparsity by broadening the scope of optimization to repeated non-zero weights. By doing so, it lays a foundation for future research into architectural designs that can capitalize on other intrinsic properties of neural networks beyond traditional sparsity.

Looking ahead, UCNN represents a step towards more generalized methods of computation reuse in neural network architectures. The potential for combining the benefits of weight repetition with established algorithms such as Winograd convolution opens a new avenue for enhancing the efficiency of not only CNNs but also other neural network structures reliant on dot product computations.

In conclusion, UCNN brings forth a compelling architectural paradigm that generalizes sparsity and exploits weight repetition, promising enhancements in efficiency that could pave the way for more energy-efficient and performant neural network applications in both data center and edge environments.