Efficient Processing of Deep Neural Networks: A Tutorial and Survey (1703.09039v2)

Published 27 Mar 2017 in cs.CV

Abstract: Deep neural networks (DNNs) are currently widely used for many AI applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey about the recent advances towards the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic co-designs, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the trade-offs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.

Efficient Processing of Deep Neural Networks: A Tutorial and Survey

The article "Efficient Processing of Deep Neural Networks: A Tutorial and Survey," authored by Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel Emer, provides a thorough examination of the complexities and strategies involved in the computational optimization of Deep Neural Networks (DNNs). The discussion spans an exploration into the design considerations and advancements in hardware platforms, data flow architectures, and co-design strategies which aim to implement DNNs more efficiently.

Overview of DNN Architectural Strategies

The article begins by emphasizing the broad applicability of DNNs in fields such as computer vision, speech recognition, and robotics. DNNs have achieved notable accuracy in these domains because they extract hierarchical features from raw input data through statistical learning over large amounts of data. However, this state-of-the-art performance often incurs significant computational cost, necessitating approaches that maximize efficiency without compromising accuracy.

Hardware Platforms and Architectural Innovations

A significant portion of the survey discusses hardware architectures that facilitate efficient DNN processing. Traditional temporal architectures such as CPUs and GPUs employ SIMD (single instruction, multiple data) and SIMT (single instruction, multiple threads) techniques to parallelize multiply-accumulate (MAC) operations, typically transforming convolutional computations into matrix multiplications. Software libraries such as Intel MKL and Nvidia cuDNN optimize these computations with tiling strategies tailored to the memory hierarchy.
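
As a concrete illustration of that transformation, here is a minimal NumPy sketch of the im2col approach (all shapes and variable names are chosen here for illustration, not taken from the paper): input patches are unrolled into the columns of a matrix so that an entire convolutional layer becomes one large matrix multiplication, which maps naturally onto SIMD/SIMT MAC units.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unroll a (C, H, W) input into a (C*kh*kw, out_h*out_w) matrix (stride 1, no padding)."""
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = x[:, i:i + kh, j:j + kw].ravel()
    return cols

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))       # one input: C=3 channels, 8x8 feature map
w = rng.standard_normal((4, 3, 3, 3))    # M=4 filters, each of shape (C=3, 3, 3)

cols = im2col(x, 3, 3)                   # (27, 36): every 3x3x3 patch as a column
y = w.reshape(4, -1) @ cols              # the whole conv layer as a single GEMM
y = y.reshape(4, 6, 6)                   # back to (M, out_h, out_w) output maps
```

The price is that `cols` duplicates overlapping input data, which is one reason the tiling strategies mentioned above matter: they keep such intermediate matrices resident in the caches.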

Spatial architectures, on the other hand, leverage dataflows to enhance local data reuse. The survey categorizes dataflows into weight stationary (WS), output stationary (OS), no local reuse (NLR), and row stationary (RS), each named for which data (if any) is kept fixed in a processing element's local storage in order to minimize energy-intensive DRAM accesses. Notably, the RS dataflow is highlighted for its superior energy efficiency across a variety of neural network layer shapes.
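
To make the "stationary" terminology concrete, the following sketch arranges a 1-D convolution output-stationary; the `psum` variable stands in for a processing element's (PE's) local accumulator. This is a behavioral sketch of the concept under simplified assumptions, not a model of any specific accelerator in the survey.

```python
import numpy as np

def conv1d_output_stationary(x, w):
    """1-D convolution with each partial sum held stationary until complete."""
    out = np.zeros(len(x) - len(w) + 1)
    for o in range(len(out)):          # conceptually: one PE per output element
        psum = 0.0                     # partial sum pinned in the PE's register file
        for k in range(len(w)):        # inputs and weights stream past the PE
            psum += x[o + k] * w[k]    # every MAC updates the same local psum
        out[o] = psum                  # a single writeback when the psum is final
    return out

# Swapping the loops so `k` is outermost would instead pin w[k] in each PE
# (weight stationary); row stationary keeps an entire 1-D row of the
# computation local so inputs, weights, and partial sums are all reused.
print(conv1d_output_stationary(np.arange(8.0), np.ones(3)))
```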

Co-design of DNN Models and Hardware

Recent research has moved toward co-designing DNN algorithms and their hardware implementations to achieve the best performance and energy savings. The survey elaborates on several key techniques within this domain:

  • Precision Reduction: Techniques such as dynamic fixed-point representation allow DNNs to compute at reduced precision (e.g., 8-bit integers) without significant loss of accuracy. More aggressive approaches, such as binary and ternary networks, compute with even fewer bits, though often at some cost in accuracy.
  • Network Pruning: Removing redundant weights significantly reduces both the size and the computational demand of a DNN; the remaining weights are then fine-tuned to maintain accuracy. (Both precision reduction and pruning are sketched in the code after this list.)
  • Activation Statistics Exploitation: Identifying and leveraging sparsity in activation maps yields notable energy savings by skipping computations on zero-valued activations.
  • Architectural Optimizations: Compact network architectures and tensor decomposition reduce the number of operations and the model size without significantly degrading accuracy.
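
Below is a minimal sketch of the first two techniques, with function names and the sparsity target chosen here purely for illustration: uniform symmetric 8-bit quantization of a weight tensor, and magnitude-based pruning. Production flows typically pick scales per layer and retrain after pruning, as the survey notes.

```python
import numpy as np

def quantize_int8(w):
    """Uniform symmetric quantization to int8; returns integer codes and the scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def prune_by_magnitude(w, sparsity=0.7):
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero."""
    k = int(sparsity * w.size)
    threshold = np.sort(np.abs(w).ravel())[k]
    return np.where(np.abs(w) < threshold, 0.0, w)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))

q, s = quantize_int8(w)
w_hat = q.astype(np.float32) * s           # dequantized approximation of w
print("max quantization error:", np.abs(w - w_hat).max())

w_sparse = prune_by_magnitude(w)
print("fraction zero after pruning:", np.mean(w_sparse == 0))
```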

Near-Data Processing Techniques

The integration of compute capabilities closer to data storage or collection points is reviewed as a method to reduce data transfer costs. This includes leveraging mixed-signal circuits and advanced memory technologies such as:

  • 3-D Stacked Memory: Technologies such as High Bandwidth Memory (HBM) and the Hybrid Memory Cube (HMC) offer higher bandwidth and lower energy costs by integrating memory closer to processing units.
  • In-Memory Computing: Utilizing non-volatile resistive memories, such as memristors, to perform computation directly in the analog domain (a toy numerical model follows this list).
  • Sensor-Integrated Computation: Embedding computation within sensors to preprocess data locally, thereby reducing the data that needs to be transmitted to central processing units.
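
The in-memory-computing idea admits a very small numerical model, shown below with made-up device values: weights are programmed as conductances G, inputs are applied as voltages V, and Ohm's law together with Kirchhoff's current law make each bitline current a dot product, i.e., an analog matrix-vector multiply whose results an ADC would then digitize.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.uniform(1e-6, 1e-4, size=(4, 8))   # conductances in siemens; one row per output
V = rng.uniform(0.0, 0.5, size=8)          # input voltages applied to the wordlines

I = G @ V                                   # bitline currents: the MACs happen in analog
print(I)                                    # an ADC then converts these currents to digital
```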

Future Implications and Benchmarking Challenges

While remarkable progress has been made, the paper acknowledges ongoing challenges and future directions in optimizing DNN computation. Developing standard benchmarking metrics that jointly evaluate accuracy, energy efficiency, throughput, and hardware cost is critical for advancing the state of the art: only then can designs be compared fairly and the best strategies identified for specific application domains.
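
As a toy example of why such composite metrics matter, the following back-of-the-envelope calculation (all numbers invented for illustration) converts a raw throughput figure into application-level metrics, inferences per second and inferences per joule, which say more about a design than peak MACs/s alone.

```python
macs_per_inference = 724e6       # MAC count of an illustrative CNN workload
measured_gmacs_per_s = 120.0     # sustained (not peak) throughput, GMAC/s
power_watts = 2.5                # board power under load

inferences_per_s = measured_gmacs_per_s * 1e9 / macs_per_inference
inferences_per_joule = inferences_per_s / power_watts
print(f"{inferences_per_s:.1f} inf/s, {inferences_per_joule:.1f} inf/J")
```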

Conclusion

The survey by Sze and colleagues underscores the importance of interdisciplinary efforts in advancing DNN efficiency. By bridging the gap between algorithmic innovations and hardware capabilities, the field can continue to extend the reach of DNN applications, ensuring they run effectively within the constraints of real-world environments. The ongoing advancements in this space promise to make DNNs even more ubiquitous and integral to a myriad of AI-driven applications in the future.

Authors (4)
  1. Vivienne Sze (34 papers)
  2. Yu-Hsin Chen (18 papers)
  3. Tien-Ju Yang (16 papers)
  4. Joel Emer (8 papers)
Citations (2,833)