Developing Acoustic Models for Automatic Speech Recognition in Swedish (2404.16547v1)

Published 25 Apr 2024 in eess.AS, cs.AI, and cs.SD

Abstract: This paper is concerned with automatic continuous speech recognition using trainable systems. The aim of this work is to build acoustic models for spoken Swedish. This is done by employing hidden Markov models whose parameters are trained on the SpeechDat database. Acoustic modeling has been carried out at the phonetic level, which allows general speech recognition applications, even though a simplified task (recognition of digits and natural numbers) has been used for model evaluation. Different kinds of phone models have been tested, including context-independent models and two variants of context-dependent models. Furthermore, many experiments have been done with bigram language models to tune some of the system parameters. System performance has also been examined over various speaker subsets differing in sex, age and dialect. Results are compared with previous similar studies and show a remarkable improvement.
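The abstract contrasts context-independent phone models with two variants of context-dependent models but does not name the variants. The following is a minimal sketch, assuming those variants are the common word-internal and cross-word triphone schemes. It expands a word sequence into model labels under each scheme using HTK-style `l-c+r` naming (also an assumption; the labeling convention is not given here). The lexicon and its transcriptions are hypothetical toy entries, the function names are illustrative, and silence or short-pause models are ignored.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical toy lexicon with simplified SAMPA-like transcriptions
# (illustrative only, not taken from the paper or from SpeechDat).
LEXICON: Dict[str, List[str]] = {
    "ett": ["E", "t"],
    "två": ["t", "v", "o:"],
    "tre": ["t", "r", "e:"],
}


def label(left: Optional[str], centre: str, right: Optional[str]) -> str:
    """Name a phone model in HTK-style 'l-c+r' form; missing context is omitted."""
    name = centre
    if left is not None:
        name = f"{left}-{name}"
    if right is not None:
        name = f"{name}+{right}"
    return name


def monophones(words: Sequence[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Context-independent models: one label per phone, no context at all."""
    return [p for w in words for p in lexicon[w]]


def word_internal_triphones(words: Sequence[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Context is taken only from phones inside the same word."""
    out: List[str] = []
    for w in words:
        phones = lexicon[w]
        for i, p in enumerate(phones):
            left = phones[i - 1] if i > 0 else None
            right = phones[i + 1] if i < len(phones) - 1 else None
            out.append(label(left, p, right))
    return out


def cross_word_triphones(words: Sequence[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Context may span word boundaries; only the utterance edges lose context."""
    phones = monophones(words, lexicon)
    out: List[str] = []
    for i, p in enumerate(phones):
        left = phones[i - 1] if i > 0 else None
        right = phones[i + 1] if i < len(phones) - 1 else None
        out.append(label(left, p, right))
    return out


if __name__ == "__main__":
    utterance = ["ett", "två"]
    print(monophones(utterance, LEXICON))
    print(word_internal_triphones(utterance, LEXICON))
    print(cross_word_triphones(utterance, LEXICON))
```

For the input `["ett", "två"]`, word-internal expansion produces `E-t` followed by `t+v`, so phones at a word boundary keep only within-word context, whereas cross-word expansion produces `E-t+t` and `t-t+v`. Cross-word schemes therefore give rise to many more distinct models, which is why such systems typically rely on parameter tying to keep training tractable.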
