
Boosting Norwegian Automatic Speech Recognition (2307.01672v1)

Published 4 Jul 2023 in cs.CL

Abstract: In this paper, we present several baselines for automatic speech recognition (ASR) models for the two official written languages in Norway: Bokmål and Nynorsk. We compare the performance of models of varying sizes and pre-training approaches on multiple Norwegian speech datasets. Additionally, we measure the performance of these models against previous state-of-the-art ASR models, as well as on out-of-domain datasets. We improve the state of the art on the Norwegian Parliamentary Speech Corpus (NPSC) from a word error rate (WER) of 17.10% to 7.60%, with models achieving 5.81% for Bokmål and 11.54% for Nynorsk. We also discuss the challenges and potential solutions for further improving ASR models for Norwegian.
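
For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The abstract does not describe the text normalization or scoring tool the authors used, so the function names and the example sentences here are illustrative assumptions, not the paper's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on old_wer."""
    return (old_wer - new_wer) / old_wer


# Hypothetical sentences (not from the paper): one substitution and one
# deletion against a five-word reference.
print(word_error_rate("jeg snakker norsk i dag", "jeg snakket norsk dag"))

# Relative improvement implied by the NPSC figures quoted in the abstract.
print(round(relative_reduction(17.10, 7.60), 4))
```

Under this definition, the move from 17.10% to 7.60% WER on NPSC corresponds to a relative error reduction of roughly 56%.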

