
Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences (2309.12712v1)

Published 22 Sep 2023 in eess.AS, cs.LG, and cs.SD

Abstract: Recent progress in Automatic Speech Recognition (ASR) has been coupled with a substantial increase in model sizes, which may now reach billions of parameters, leading to slow inference even on adapted hardware. In this context, ASR models exist in various sizes, with different inference costs leading to different performance levels. Based on the observation that smaller models perform optimally on large parts of testing corpora, we propose to train a decision module that, given an audio sample, selects the smallest sufficient model yielding a good transcription. We apply our approach to two Whisper models of different sizes. By keeping the decision process computationally efficient, we build a decision module that provides substantial computational savings with limited performance drops.
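
The idea can be illustrated with a minimal routing sketch, which is not the authors' code. It assumes the openai-whisper package and uses illustrative stand-ins throughout: the "base" and "large" checkpoints as the small and big models, a tiny linear classifier over log-Mel features as the decision module (the paper's actual module and its training procedure are not reproduced here), and an arbitrary 0.5 routing threshold.

```python
# Minimal sketch of sample-dependent model selection (not the authors' exact method).
# Assumptions: the openai-whisper package, "base"/"large" as the small/big models,
# and a stand-in decision head that would be trained separately to predict
# whether the small model is sufficient for a given audio sample.
import torch
import whisper  # pip install openai-whisper

small = whisper.load_model("base")   # cheap model, used whenever it suffices
large = whisper.load_model("large")  # expensive model, reserved for hard audio

# Hypothetical decision module: a tiny classifier over the 80 x 3000 log-Mel
# spectrogram of a 30-second window (kept cheap so routing cost stays negligible).
decision_head = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(80 * 3000, 1),
)

def transcribe(path: str, threshold: float = 0.5) -> str:
    """Route the sample to the smallest model the decision head trusts."""
    audio = whisper.pad_or_trim(whisper.load_audio(path))
    mel = whisper.log_mel_spectrogram(audio).unsqueeze(0)  # shape (1, 80, 3000)
    use_small = torch.sigmoid(decision_head(mel)).item() >= threshold
    model = small if use_small else large
    return model.transcribe(path)["text"]
```

As in the abstract, the point of such a design is that the decision step is far cheaper than the big model, so skipping the big model on easy samples yields net computational savings.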

