SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing (1611.05939v2)

Published 18 Nov 2016 in cs.CV

Abstract: With the recent advance of the Internet of Things (IoT), it has become very attractive to implement deep convolutional neural networks (DCNNs) on embedded/portable systems. Presently, executing software-based DCNNs requires high-performance server clusters in practice, restricting their widespread deployment on mobile devices. To overcome this issue, considerable research effort has gone into developing highly-parallel, application-specific DCNN hardware using GPGPUs, FPGAs, and ASICs. Stochastic Computing (SC), which uses a bit-stream to represent a number within [-1, 1] by counting the number of ones in the stream, has high potential for implementing DCNNs with high scalability and an ultra-low hardware footprint. Since multiplications and additions can be calculated using AND gates and multiplexers in SC, significant reductions in power/energy and hardware footprint can be achieved compared to conventional binary arithmetic implementations. The tremendous savings in power (energy) and hardware resources open up an immense design space for enhancing the scalability and robustness of hardware DCNNs. This paper presents the first comprehensive design and optimization framework for SC-based DCNNs (SC-DCNNs). We first present optimal designs of the function blocks that perform the basic operations, i.e., inner product, pooling, and activation function. We then propose optimal designs for four types of combinations of basic function blocks, named feature extraction blocks, which are in charge of extracting features from input feature maps. In addition, weight storage methods are investigated to reduce the area and power/energy consumption of storing weights. Finally, the whole SC-DCNN implementation is optimized, with feature extraction blocks carefully selected, to minimize area and power/energy consumption while maintaining a high network accuracy level.

Citations (189)

Summary

  • The paper demonstrates that SC-DCNN significantly reduces hardware footprint and energy usage while maintaining high network accuracy.
  • It details the design and joint optimization of the core function blocks: inner product, pooling, and activation function.
  • Experimental evaluations report a throughput of 781,250 images per second and an energy efficiency of 510,734 images per joule, with an accuracy loss of less than 1.5%.

SC-DCNN: A Framework for SC-based Deep Convolutional Neural Networks

The paper "SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing" provides a meticulous exploration into the application of Stochastic Computing (SC) as a promising approach to implement Deep Convolutional Neural Networks (DCNNs) with reduced hardware resources and energy consumption. The authors present a comprehensive design and optimization strategy aimed at deploying DCNNs on embedded and mobile IoT devices.

SC employs a bit-stream to represent a numerical value through the probability of a one occurring in the stream, offering the potential to perform operations like addition and multiplication with minimal hardware, using simple logic such as AND gates and multiplexers. The authors introduce SC-DCNN, a framework that systematically integrates SC with DCNNs to achieve substantial reductions in hardware footprint and energy consumption, and to improve scalability, while maintaining comparable network accuracy.
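
As a concrete illustration, this arithmetic can be simulated in a few lines of Python. The following is a minimal sketch, not the paper's hardware design: it assumes the simpler unipolar encoding over [0, 1], where multiplication is a bitwise AND and a multiplexer performs scaled addition. The bipolar encoding over [-1, 1] that the abstract refers to replaces the AND with an XNOR.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4096  # bit-stream length; longer streams reduce stochastic error

def encode(x, n=N):
    """Unipolar SC encoding: x in [0, 1] becomes a stream with P(bit = 1) = x."""
    return rng.random(n) < x

def decode(stream):
    """Recover the value by counting ones in the stream."""
    return stream.mean()

a, b = 0.8, 0.5

# Multiplication is a bitwise AND of two independent streams,
# since P(a_i AND b_i) = P(a_i) * P(b_i).
product = np.logical_and(encode(a), encode(b))
print(decode(product))   # ~ 0.40

# Scaled addition is a multiplexer driven by a 50/50 select stream,
# computing (a + b) / 2 so the result stays inside [0, 1].
select = encode(0.5)
total = np.where(select, encode(a), encode(b))
print(decode(total))     # ~ 0.65
```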

The SC-DCNN framework is structured in a bottom-up manner. The paper first develops and optimizes the basic function blocks indispensable to DCNN architectures: inner product computations, pooling mechanisms, and activation functions. For each block, the framework proposes designs that reduce power and hardware cost while maintaining performance, through careful joint optimization.
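
For instance, a MUX-based inner-product block, one of the design styles the paper evaluates, can be simulated as follows. This sketch assumes the bipolar encoding over [-1, 1], where multiplication is an XNOR gate and an n-to-1 multiplexer adds streams with an inherent 1/n scaling; the stream length and toy inputs are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8192  # bit-stream length

def encode_bipolar(x, n=N):
    """Bipolar SC encoding: x in [-1, 1] becomes a stream with P(1) = (x + 1) / 2."""
    return rng.random(n) < (x + 1) / 2

def decode_bipolar(stream):
    return 2 * stream.mean() - 1

def sc_inner_product(xs, ws, n=N):
    """MUX-based SC inner product (simulation only).

    Each product x_i * w_i is an XNOR of two bipolar streams; an
    n-to-1 multiplexer with a uniform random select line adds them,
    scaled by 1/len(xs) so the output stays inside [-1, 1].
    """
    products = np.stack([
        np.logical_not(np.logical_xor(encode_bipolar(x), encode_bipolar(w)))
        for x, w in zip(xs, ws)
    ])
    select = rng.integers(0, len(xs), size=n)   # uniform select stream
    mux_out = products[select, np.arange(n)]    # one product bit per clock
    return decode_bipolar(mux_out)              # represents sum(x_i * w_i) / len(xs)

xs = [0.5, -0.3, 0.9, 0.1]
ws = [0.4, 0.8, -0.2, 0.6]
print(sc_inner_product(xs, ws) * len(xs))   # ~ -0.16 (undoing the 1/n scaling)
print(sum(x * w for x, w in zip(xs, ws)))   # exact: -0.16
```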

One significant feature of the framework is the introduction of four distinct feature extraction block designs. These are optimized combinations of the basic function blocks, adapted to different network configurations to ensure high accuracy. The feature extraction blocks, which form the network's backbone by connecting the function blocks, are analyzed and optimized to balance trade-offs among hardware cost, latency, power, and accuracy.
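
Continuing the simulation above (and reusing its encode_bipolar, decode_bipolar, and rng helpers), one can sketch how such blocks compose. The snippet below pairs mux-based average pooling with an FSM-based tanh activation, commonly called Stanh in the SC literature; the state count k is an illustrative choice, not a value taken from the paper.

```python
def sc_avg_pool(streams):
    """Mux-based average pooling: selecting uniformly among the input
    streams yields a stream representing their mean."""
    streams = np.stack(streams)
    m, n = streams.shape
    select = rng.integers(0, m, size=n)
    return streams[select, np.arange(n)]

def stanh(stream, k=8):
    """FSM-based stochastic tanh ("Stanh"): a k-state saturating counter
    walks up on 1s and down on 0s, and outputs 1 while the state sits
    in the upper half. Approximates tanh(k/2 * x) of the bipolar input x."""
    state = k // 2
    out = np.empty_like(stream)
    for i, bit in enumerate(stream):
        state = min(state + 1, k - 1) if bit else max(state - 1, 0)
        out[i] = state >= k // 2
    return out

# Pool four bipolar inputs, then activate: approximates tanh(4 * mean(xs)).
xs = [0.6, 0.2, -0.1, 0.5]
pooled = sc_avg_pool([encode_bipolar(x) for x in xs])
print(decode_bipolar(stanh(pooled)))    # compare with np.tanh(4 * np.mean(xs))
```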

Moreover, weight storage is addressed with a focus on efficient memory utilization. The authors propose several schemes for storing weights in SRAM that reduce area and power consumption without compromising network accuracy, including filter-aware SRAM sharing, efficient weight storage methods, and layer-wise weight precision; together these techniques yield a dramatic reduction in resource usage.
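
The accounting behind layer-wise weight precision is easy to sketch. In the snippet below, the layer shapes follow standard LeNet5, but the per-layer bit widths are hypothetical placeholders rather than the values the paper selects.

```python
# Illustrative SRAM accounting for layer-wise weight precision.
layers = {
    "conv1": {"weights": 5 * 5 * 1 * 6,   "bits": 7},
    "conv2": {"weights": 5 * 5 * 6 * 16,  "bits": 7},
    "fc1":   {"weights": 400 * 120,       "bits": 6},
    "fc2":   {"weights": 120 * 84,        "bits": 6},
    "fc3":   {"weights": 84 * 10,         "bits": 6},
}

baseline = sum(l["weights"] * 32 for l in layers.values())          # fp32 baseline
reduced  = sum(l["weights"] * l["bits"] for l in layers.values())   # layer-wise precision
print(f"weight SRAM shrinks to {reduced / baseline:.1%} of the fp32 footprint")
```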

Experimental evaluations substantiate the framework's efficacy by deploying LeNet5, a benchmark DCNN, with SC-DCNN. The evaluations show that SC-DCNN achieves a throughput of 781,250 images per second and an energy efficiency of 510,734 images per joule, striking improvements over existing general-purpose and application-specific architectures. The experiments also demonstrate that SC-DCNN delivers these performance gains while keeping accuracy degradation below 1.5% relative to software implementations.

In conclusion, this paper lays a foundation for employing stochastic computing in AI, particularly for edge devices requiring efficient DCNN implementations, and shows that SC-DCNN is especially well suited to scenarios with stringent resource constraints. Future developments in stochastic computing and its integration with machine learning models could lead to significant advances in AI, especially for IoT applications where power efficiency and hardware size are crucial. The SC-DCNN framework offers a compelling alternative to current DCNN implementations and encourages further research into efficient neural network design.