
Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR Customization (2309.17267v1)

Published 29 Sep 2023 in eess.AS, cs.CL, and cs.SD

Abstract: We present a first large-scale public synthetic dataset for contextual spellchecking customization of automatic speech recognition (ASR), with a focus on diverse rare and out-of-vocabulary (OOV) phrases such as proper names or terms. The proposed approach allows creating millions of realistic examples of corrupted ASR hypotheses and simulating non-trivial biasing lists for the customization task. Furthermore, we propose injecting two types of "hard negatives" into the simulated biasing lists in training examples and describe our procedures for mining them automatically. We report experiments with training an open-source customization model on the proposed dataset and show that injecting hard negative biasing phrases decreases word error rate (WER) and the number of false alarms.
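To make the idea of a simulated biasing list with hard negatives concrete, here is a minimal sketch. It is not the paper's mining procedure, which the abstract does not spell out. It assumes, purely for illustration, that a hard negative is a vocabulary phrase spelled similarly to the target, ranked with Python's standard `difflib`. The function name `build_biasing_list`, its parameters, and the toy vocabulary are all hypothetical.

```python
import difflib
import random


def build_biasing_list(target, vocabulary, n_hard=2, n_random=3, seed=0):
    """Simulate a biasing list: the target phrase, a few look-alike
    'hard negatives', and a few random distractors.

    ASSUMPTION: hard negatives are chosen by character-level similarity
    (difflib ratio). This stands in for the paper's own mining procedures.
    """
    rng = random.Random(seed)
    others = [p for p in vocabulary if p != target]

    # Rank all other phrases by similarity to the target and keep the top n_hard.
    # cutoff=0.0 disables the default similarity threshold.
    hard = difflib.get_close_matches(target, others, n=n_hard, cutoff=0.0)

    # Fill the rest of the list with randomly sampled unrelated phrases.
    rest = [p for p in others if p not in hard]
    distractors = rng.sample(rest, min(n_random, len(rest)))

    biasing = [target] + hard + distractors
    rng.shuffle(biasing)  # the target's position should carry no signal
    return biasing, hard


# Toy vocabulary (hypothetical); real lists would be mined from a large corpus.
vocab = ["nvidia", "invidia", "nvidea", "wikipedia",
         "conformer", "quartznet", "whisper"]
biasing, hard = build_biasing_list("nvidia", vocab)
print(biasing)
print(hard)
```

In this toy setup the look-alikes "invidia" and "nvidea" end up in the list next to the true phrase "nvidia", so a model trained on such examples has to learn not to replace text with a merely similar candidate. This is the intuition behind the reduction in false alarms that the abstract reports.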



Authors (1)