
DeltaKWS: A 65nm 36nJ/Decision Bio-inspired Temporal-Sparsity-Aware Digital Keyword Spotting IC with 0.6V Near-Threshold SRAM (2405.03905v2)

Published 6 May 2024 in cs.AR, cs.CV, cs.SD, and eess.AS

Abstract: This paper introduces DeltaKWS, to the best of our knowledge, the first $\Delta$RNN-enabled fine-grained temporal sparsity-aware KWS IC for voice-controlled devices. The 65 nm prototype chip features a number of techniques to enhance performance, area, and power efficiencies, specifically: 1) a bio-inspired delta-gated recurrent neural network ($\Delta$RNN) classifier leveraging temporal similarities between neighboring feature vectors extracted from input frames and network hidden states, eliminating unnecessary operations and memory accesses; 2) an IIR BPF-based FEx that leverages mixed-precision quantization, low-cost computing structure and channel selection; 3) a 24 kB 0.6 V near-$V_\text{TH}$ weight SRAM that achieves 6.6$\times$ lower read power than the foundry-provided SRAM. From chip measurement results, we show that the DeltaKWS achieves an 11/12-class GSCD accuracy of 90.5%/89.5% respectively and energy consumption of 36 nJ/decision in 65 nm CMOS process. At 87% temporal sparsity, computing latency and energy/inference are reduced by 2.4$\times$/3.4$\times$, respectively. The IIR BPF-based FEx, $\Delta$RNN accelerator, and 24 kB near-$V_\text{TH}$ SRAM blocks occupy 0.084 mm$^{2}$, 0.319 mm$^{2}$, and 0.381 mm$^{2}$ respectively (0.78 mm$^{2}$ in total).
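
The temporal-sparsity idea in the abstract can be illustrated with a small software sketch. This is not the chip's implementation: the function names, the threshold value, and the toy data are assumptions made for illustration only. The sketch propagates an input element only when it has changed by more than a threshold since it was last propagated, and then skips the multiply-accumulates for every element whose delta is zero.

```python
def delta_encode(frames, threshold):
    """Return thresholded deltas and the fraction of skipped (zero) entries.

    `threshold` is a hypothetical tuning parameter, not a value from the paper.
    """
    reference = [0.0] * len(frames[0])  # last value actually propagated
    deltas, skipped, total = [], 0, 0
    for frame in frames:
        row = []
        for i, value in enumerate(frame):
            change = value - reference[i]
            if abs(change) > threshold:
                reference[i] = value  # update only when the change is propagated
                row.append(change)
            else:
                row.append(0.0)       # small change: treated as "no change"
                skipped += 1
            total += 1
        deltas.append(row)
    return deltas, skipped / total


def delta_matvec(weights, delta, accumulator):
    """Add weights @ delta into accumulator, skipping zero-delta columns.

    Returns the number of multiply-accumulates actually performed.
    """
    macs = 0
    for j, d in enumerate(delta):
        if d == 0.0:
            continue  # no weight fetch and no arithmetic for this column
        for i, row in enumerate(weights):
            accumulator[i] += row[j] * d
            macs += 1
    return macs


frames = [[1.0, 2.0], [1.05, 2.0], [1.5, 2.02]]
deltas, sparsity = delta_encode(frames, threshold=0.1)
acc = [0.0, 0.0]
macs = delta_matvec([[1.0, 2.0], [3.0, 4.0]], deltas[2], acc)
print(sparsity, macs)
```

In this toy run half of the delta entries are zero, and the last frame needs two multiply-accumulates instead of four. Skipped columns also need no weight reads, which is the link between temporal sparsity and the reduced memory accesses the abstract mentions.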

