Shrinking Bigfoot: Reducing wav2vec 2.0 footprint (2103.15760v2)

Published 29 Mar 2021 in cs.CL

Abstract: Wav2vec 2.0 is a state-of-the-art speech recognition model which maps speech audio waveforms into latent representations. The largest version of wav2vec 2.0 contains 317 million parameters. Hence, the inference latency of wav2vec 2.0 will be a bottleneck in production, leading to high costs and a significant environmental footprint. To improve wav2vec's applicability to a production setting, we explore multiple model compression methods borrowed from the domain of LLMs. Using a teacher-student approach, we distilled the knowledge from the original wav2vec 2.0 model into a student model, which is 2 times faster and 4.8 times smaller than the original model. This increase in performance is accomplished with only a 7% degradation in word error rate (WER). Our quantized model is 3.6 times smaller than the original model, with only a 0.1% degradation in WER. To the best of our knowledge, this is the first work that compresses wav2vec 2.0.
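
To make the two compression routes in the abstract concrete, below is a minimal sketch, not the authors' exact recipe: a frame-level teacher-student distillation loss and post-training dynamic quantization of Linear layers to int8. The temperature, loss form, checkpoint name, and use of the transformers library are illustrative assumptions; the paper's student architecture and training setup are not reproduced here.

```python
# Illustrative sketch only; hyperparameters and model checkpoint are assumptions.
import torch
import torch.nn.functional as F
from transformers import Wav2Vec2ForCTC


def frame_level_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student output
    distributions at each frame, a common teacher-student objective."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # The t**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t ** 2)


# Post-training dynamic quantization: replace Linear layers with int8
# equivalents, shrinking the model without retraining.
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-960h-lv60-self")
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```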

Authors (7)
  1. Zilun Peng (4 papers)
  2. Akshay Budhkar (4 papers)
  3. Ilana Tuil (1 paper)
  4. Jason Levy (1 paper)
  5. Parinaz Sobhani (6 papers)
  6. Raphael Cohen (7 papers)
  7. Jumana Nassour (2 papers)
Citations (32)
