CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-CirculantWeight Matrices (1708.08917v1)

Published 29 Aug 2017 in cs.CV, cs.AI, cs.LG, and stat.ML

Abstract: Large-scale deep neural networks (DNNs) are both compute and memory intensive. As the size of DNNs continues to grow, it is critical to improve their energy efficiency and performance while maintaining accuracy. For DNNs, model size is an important factor affecting performance, scalability, and energy efficiency. Weight pruning achieves good compression ratios but suffers from three drawbacks: 1) the irregular network structure after pruning; 2) the increased training complexity; and 3) the lack of a rigorous guarantee of compression ratio and inference accuracy. To overcome these limitations, this paper proposes CirCNN, a principled approach to represent weights and process neural networks using block-circulant matrices. CirCNN utilizes Fast Fourier Transform (FFT)-based fast multiplication, simultaneously reducing the computational complexity (in both inference and training) from O(n²) to O(n log n) and the storage complexity from O(n²) to O(n), with negligible accuracy loss. Compared to other approaches, CirCNN is distinct due to its mathematical rigor: it can converge to the same effectiveness as DNNs without compression. We propose the CirCNN architecture, a universal DNN inference engine that can be implemented on various hardware/software platforms with a configurable network architecture. To demonstrate the performance and energy efficiency, we test CirCNN on FPGA, ASIC, and embedded processors. Our results show that the CirCNN architecture achieves very high energy efficiency and performance with a small hardware footprint. Based on the FPGA implementation and ASIC synthesis results, CirCNN achieves 6-102X energy efficiency improvements compared with the best state-of-the-art results.

Citations (252)

Summary

  • The paper introduces CirCNN, which compresses DNNs using block-circulant matrices to reduce computational complexity from O(n²) to O(n log n) with minimal accuracy loss.
  • It leverages FFT-based fast multiplication and optimized pipelining to facilitate efficient training and inference across various hardware platforms.
  • CirCNN demonstrates 6-102x energy efficiency improvements over state-of-the-art baselines, making it well suited for resource-constrained applications such as mobile devices and IoT systems.

Analyzing CirCNN: Block-Circulant Matrices for Neural Network Efficiency

The paper "CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices" introduces CirCNN, an innovative method to enhance the computational efficiency and compress the storage of deep neural networks (DNNs) using block-circulant matrices. The key premise of this research is addressing the considerable computational and memory demands associated with large-scale DNNs, aiming to maintain their impressive accuracy while improving scalability, performance, and energy efficiency.

Approach and Methodology

The CirCNN framework represents weights through block-circulant matrices, reducing the computational complexity from O(n²) to O(n log n) and the storage complexity from O(n²) to O(n). Its most notable element is the use of Fast Fourier Transform (FFT)-based fast multiplication, which drastically reduces the operations required in both the inference and training phases while keeping accuracy loss negligible.
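As a concrete illustration (a minimal sketch, not the paper's implementation; it assumes the common convention that a circulant matrix is defined by its first column), the following NumPy snippet checks that the FFT-based circulant matrix-vector product matches the naive O(n²) computation:

```python
import numpy as np

def circulant_matvec_fft(w, x):
    # C(w) @ x, where C(w) is the circulant matrix with first column w.
    # Circulant matvec is a circular convolution, so the FFT evaluates it
    # in O(n log n) instead of O(n^2).
    return np.real(np.fft.ifft(np.fft.fft(w) * np.fft.fft(x)))

def circulant_matvec_naive(w, x):
    # O(n^2) reference: build C(w) explicitly (column k is w rolled by k).
    n = len(w)
    C = np.column_stack([np.roll(w, k) for k in range(n)])
    return C @ x

rng = np.random.default_rng(0)
w, x = rng.standard_normal(8), rng.standard_normal(8)
assert np.allclose(circulant_matvec_fft(w, x), circulant_matvec_naive(w, x))
```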

The paper highlights the disadvantages of traditional weight pruning, such as irregular network structures and increased training complexity, and positions block-circulant matrices as a solution. These matrices preserve a regular network structure, which eases implementation and enables higher throughput, and they rest on rigorous mathematical foundations guaranteeing that the compressed model can converge to the same effectiveness as its uncompressed counterpart.
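To make the representation concrete, here is a hedged sketch (sizes and names are illustrative, and the same first-column circulant convention as above is assumed) of the forward pass of a fully connected layer whose weight matrix consists of k×k circulant blocks, each stored as a single length-k vector, so storage is O(n) rather than O(n²):

```python
import numpy as np

def block_circulant_forward(W, x):
    """Forward pass y = Wx for a (p*k x q*k) weight matrix made of k x k
    circulant blocks. W has shape (p, q, k): W[i, j] is the length-k first
    column defining block (i, j), so storage is O(n) instead of O(n^2)."""
    p, q, k = W.shape
    X = np.fft.fft(x.reshape(q, k), axis=1)   # FFT each input sub-vector once
    Wf = np.fft.fft(W, axis=2)                # FFT each defining vector
    # Accumulate block products in the frequency domain, then take one
    # inverse FFT per output sub-vector: y_i = IFFT(sum_j FFT(w_ij)*FFT(x_j)).
    Y = np.einsum('ijk,jk->ik', Wf, X)
    return np.real(np.fft.ifft(Y, axis=1)).reshape(p * k)

# Tiny check against the explicit dense matrix (hypothetical sizes).
p, q, k = 2, 3, 4
rng = np.random.default_rng(1)
W = rng.standard_normal((p, q, k))
x = rng.standard_normal(q * k)
dense = np.block([[np.column_stack([np.roll(W[i, j], s) for s in range(k)])
                   for j in range(q)] for i in range(p)])
assert np.allclose(block_circulant_forward(W, x), dense @ x)
```

Note the design choice: each input sub-vector is transformed once and block products are accumulated in the frequency domain, so only one inverse FFT is needed per output sub-vector.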

Architectural Overview

CirCNN proposes a universal DNN inference architecture adaptable across various platforms, including FPGA, ASIC, and embedded systems. It cleverly exploits the recursive nature of FFT to devise a small-footprint architecture, focusing on three primary optimizations:

  1. FFT/IFFT Computing Block: As the core computational unit, it capitalizes on FFT's recursive properties for efficient size-n transformations that support diverse network layers, configurations, and learning models (a minimal recursive FFT sketch follows this list).
  2. Pipelining and Parallelism: The architecture is designed to optimize inter- and intra-level pipelining, enhancing performance with strategic trade-offs between resource utilization and energy consumption.
  3. Platform-Specific Optimization: The design is tailored to the distinct memory and computational constraints of each platform, ensuring high energy efficiency without compromising performance.
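To illustrate the recursive property that the FFT/IFFT computing block exploits (a didactic sketch only, not the paper's hardware design), a radix-2 Cooley-Tukey FFT reduces a size-n transform to two size-n/2 transforms plus O(n) combining work, which is the structure that lets one small compute block be reused across transform sizes:

```python
import numpy as np

def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT (len(x) must be a power of two).
    A size-n transform decomposes into two size-n/2 transforms plus O(n)
    combining work."""
    n = len(x)
    if n == 1:
        return x.astype(complex)
    even = fft_radix2(x[0::2])   # transform of even-indexed samples
    odd = fft_radix2(x[1::2])    # transform of odd-indexed samples
    twiddle = np.exp(-2j * np.pi * np.arange(n // 2) / n)
    return np.concatenate([even + twiddle * odd, even - twiddle * odd])

x = np.random.default_rng(2).standard_normal(16)
assert np.allclose(fft_radix2(x), np.fft.fft(x))
```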

Evaluation and Implications

The evaluation demonstrates significant improvements in energy efficiency and performance across the tested platforms, with 6-102x energy efficiency improvements over the best state-of-the-art results. This underscores CirCNN's potential not just for accelerating inference during deployment but also for speeding up training, broadening its utility across numerous applications, including mobile devices and IoT systems.

The paper also discusses broader implications: the gains in energy efficiency could enable real-time DNN applications and make deployment practical on smaller, lower-power hardware. The methodology may spearhead future development of compact, high-performance neural network processors, which are essential for resource-constrained environments.

Conclusion

The integration of block-circulant matrices into DNN frameworks, as presented in CirCNN, advances the field by reconciling efficiency with preserved accuracy. This approach could fundamentally change how deep learning is applied in practice, particularly where hardware resources are at a premium, and it paves the way for further research into structured-matrix techniques in neural networks. The paper exemplifies a robust blend of theoretical soundness and practical applicability, offering a new paradigm for DNN acceleration and compression.