Efficient yet Accurate End-to-End SC Accelerator Design (2401.15332v1)

Published 27 Jan 2024 in cs.AR

Abstract: Providing end-to-end stochastic computing (SC) neural network acceleration for state-of-the-art (SOTA) models has become increasingly challenging, as accuracy must be pursued without sacrificing efficiency. End-to-end SC circuits must also flexibly support the different types and sizes of operations that these models require. In this paper, we summarize our recent research on end-to-end SC neural network acceleration. We introduce an accurate end-to-end SC accelerator based on deterministic coding and a sorting network. In addition, we propose an SC-friendly model that combines low-precision data paths with high-precision residuals. We apply approximate computing techniques to optimize SC nonlinear adders and present new SC designs for the arithmetic operations required by SOTA models. Overall, our circuit and model co-optimization enables further significant improvements in circuit efficiency, flexibility, and compatibility. The results demonstrate that the proposed end-to-end SC architecture delivers accurate and efficient neural network acceleration while flexibly accommodating model requirements, showcasing the potential of SC for neural network acceleration.
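
To make the sorting-network idea mentioned in the abstract concrete, the following Python sketch shows one common way such structures are used in parallel SC: a Batcher odd-even merge sort whose comparators degenerate to AND/OR gate pairs on single bits, yielding a saturating ("nonlinear") adder over unipolar unary bitstreams. This is a minimal illustration under assumed conventions (unipolar unary encoding, saturating addition as the nonlinearity), not the paper's actual circuit; the names encode_unary and saturating_add are illustrative only.

    # Minimal sketch (assumed conventions, not the paper's design):
    # values in [0, 1] are encoded as unary bitstreams of length L,
    # and a bit-level sorting network implements a saturating adder.

    def encode_unary(x, length):
        """Deterministic unipolar encoding: round(x * length) leading ones."""
        ones = round(x * length)
        return [1] * ones + [0] * (length - ones)

    def decode(bits):
        """The represented value is the fraction of ones in the stream."""
        return sum(bits) / len(bits)

    def compare_exchange(bits, i, j):
        # On single bits, min(a, b) = a AND b and max(a, b) = a OR b,
        # so each comparator costs only two gates in hardware.
        bits[i], bits[j] = bits[i] & bits[j], bits[i] | bits[j]

    def oddeven_merge_sort(bits):
        """In-place Batcher odd-even merge sort (length must be a power of two)."""
        n = len(bits)
        p = 1
        while p < n:
            k = p
            while k >= 1:
                for j in range(k % p, n - k, 2 * k):
                    for i in range(k):
                        if (i + j) // (2 * p) == (i + j + k) // (2 * p):
                            compare_exchange(bits, i + j, i + j + k)
                k //= 2
            p *= 2

    def saturating_add(stream_a, stream_b):
        """Sort the concatenated streams and keep the largest half: min(a + b, 1)."""
        merged = stream_a + stream_b
        oddeven_merge_sort(merged)               # ascending: zeros first, ones last
        return merged[len(merged) // 2:][::-1]   # top L bits, ones-first order

    if __name__ == "__main__":
        L = 8
        a, b = encode_unary(0.375, L), encode_unary(0.5, L)
        s = saturating_add(a, b)
        print(decode(a), "+", decode(b), "->", decode(s))   # 0.375 + 0.5 -> 0.875

Counting the ones across the full sorted vector instead of keeping only the top half would recover the exact (unsaturated) sum; the choice between the two is where the accuracy/efficiency trade-off of such nonlinear adders shows up.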

