
SF-MMCN: Low-Power Sever Flow Multi-Mode Diffusion Model Accelerator

Published 8 Mar 2024 in cs.AR and cs.CV | arXiv:2403.10542v2

Abstract: Generative AI has become enormously popular in recent years, and traditional accelerators urgently need to cope with models carrying large-scale parameters. The diffusion model's parallel structure, in which multiple layers operate simultaneously, sharply raises the hardware design challenge. Convolutional Neural Network (CNN) accelerators have been designed and developed rapidly, especially for high-speed inference, and CNN models with parallel structures are often deployed. These accelerators require many Processing Elements (PEs) to perform parallel computations, chiefly multiply-and-accumulate (MAC) operations, resulting in high power consumption and a large silicon area. In this work, a Server Flow Multi-Mode CNN Unit (SF-MMCN) is proposed to reduce the number of PEs while improving the operating efficiency of the CNN accelerator; a pipelining technique is introduced into the Server Flow to process parallel computations. The proposed SF-MMCN is implemented in TSMC 90-nm CMOS technology and evaluated on VGG-16, ResNet-18, and U-Net. The evaluation results show that SF-MMCN reduces power consumption by 92% and silicon area by 70% while improving operating efficiency by nearly 81 times. A new figure of merit (FoM), area efficiency (GOPs/mm²), is also introduced to evaluate an accelerator as the ratio of its throughput (GOPs) to its silicon area (mm²); under this FoM, SF-MMCN improves area efficiency by a factor of 18.42.
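The area-efficiency FoM from the abstract is a simple ratio of throughput to die area. A minimal sketch of the arithmetic is shown below; the function name and all numeric inputs are hypothetical illustrations, not figures from the paper, chosen only so that the resulting improvement factor matches the reported 18.42x.

```python
def area_efficiency(throughput_gops: float, area_mm2: float) -> float:
    """Area-efficiency FoM: throughput (GOPs) divided by silicon area (mm^2)."""
    return throughput_gops / area_mm2

# Hypothetical numbers (NOT taken from the paper), purely to show the arithmetic:
baseline = area_efficiency(100.0, 50.0)   # 2.0 GOPs/mm^2
improved = area_efficiency(36.84, 1.0)    # 36.84 GOPs/mm^2

print(improved / baseline)                # 18.42x improvement, matching the reported factor
```

The point of this FoM is that it penalizes designs that buy throughput with silicon: two accelerators with the same GOPs rating are distinguished by how much die area each needs to deliver it.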



Authors (3)
