
Sub 8-Bit Quantization of Streaming Keyword Spotting Models for Embedded Chipsets (2207.06920v2)

Published 13 Jul 2022 in cs.SD, cs.LG, and eess.AS

Abstract: We propose a novel two-stage, sub 8-bit quantization-aware training algorithm for all components of a 250K-parameter feedforward, streaming, state-free keyword spotting model. In the first stage, we adapt a recently proposed quantization technique that applies a non-linear tanh(.) transformation to dense layer weights. In the second stage, we use linear quantization methods on the rest of the network, including the other parameters (bias, gain, batchnorm), inputs, and activations. We conduct large-scale experiments, training on 26,000 hours of de-identified production, far-field and near-field audio data (evaluating on 4,000 hours of data). We organize our results in two embedded chipset settings: a) with the commodity ARM NEON instruction set and 8-bit containers, we present accuracy, CPU, and memory results using sub 8-bit weights (4, 5, 8-bit) and 8-bit quantization of the rest of the network; b) with off-the-shelf neural network accelerators, for a range of weight bit widths (1 and 5-bit), we present accuracy results and project the reduction in memory utilization. In both configurations, our results show that the proposed algorithm can achieve: a) parity with a full floating-point model's operating point on a detection error tradeoff (DET) curve, in terms of false detection rate (FDR) at a given false rejection rate (FRR); b) a significant reduction in compute and memory, yielding up to a 3 times improvement in CPU consumption and a more than 4 times improvement in memory consumption.
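The abstract names the two quantizers but gives no formulas, so the following is only a minimal sketch of what each stage could look like. It assumes a DoReFa-style tanh weight quantizer for the first stage and a symmetric per-tensor linear quantizer for the second. The function names and the exact normalization are illustrative assumptions, not the authors' implementation, and the sketch covers only the forward quantization step (no straight-through gradient or training loop).

```python
import math


def quantize_unit(x, bits):
    """Round x in [0, 1] to the nearest of 2**bits evenly spaced levels."""
    levels = (1 << bits) - 1
    return round(x * levels) / levels


def tanh_quantize_weights(weights, bits):
    """Assumed stage-1 quantizer (DoReFa-style).

    Squash weights with tanh, normalize to [0, 1], quantize to 2**bits
    levels, then map back to [-1, 1].
    """
    squashed = [math.tanh(w) for w in weights]
    max_abs = max(abs(s) for s in squashed) or 1.0  # avoid divide-by-zero
    quantized = []
    for s in squashed:
        unit = s / (2.0 * max_abs) + 0.5  # now in [0, 1]
        quantized.append(2.0 * quantize_unit(unit, bits) - 1.0)
    return quantized


def linear_quantize(values, bits):
    """Assumed stage-2 quantizer: symmetric linear quantization.

    Returns integer codes in [-qmax, qmax] and the scale such that
    value is approximately code * scale.
    """
    qmax = (1 << (bits - 1)) - 1
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / qmax
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


if __name__ == "__main__":
    dense_weights = [-2.0, -0.3, 0.0, 0.7, 2.0]
    print(tanh_quantize_weights(dense_weights, bits=4))
    print(linear_quantize([-1.0, 0.5, 1.0], bits=8))
```

With `bits=1` the tanh quantizer collapses every weight to -1 or +1, which corresponds to the most aggressive weight width mentioned for the accelerator setting.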

Authors (7)
  1. Lu Zeng (8 papers)
  2. Sree Hari Krishnan Parthasarathi (9 papers)
  3. Yuzong Liu (12 papers)
  4. Alex Escott (3 papers)
  5. Santosh Kumar Cheekatmalla (4 papers)
  6. Nikko Strom (10 papers)
  7. Shiv Vitaladevuni (7 papers)
Citations (5)
