
Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems (2003.09024v1)

Published 19 Mar 2020 in cs.CL and cs.LG

Abstract: Out-of-vocabulary (OOV) words are a problem for any speech recognition system: hybrid systems are usually built to recognize a fixed set of words and can rarely include every word that will be encountered once the system is in use. One popular approach to covering OOVs is to use subword units rather than words. Such a system can potentially recognize any previously unseen word, provided the word can be constructed from the available subword units, but it can also recognize non-existent words. The other popular approach is to modify the HMM part of the system so that it can be easily and effectively expanded with a custom set of words we want to add. In this paper we explore existing methods for this second approach at both the graph construction and search method levels. We also present novel vocabulary expansion techniques that solve some common internal subroutine problems in recognition graph processing.
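
The subword idea in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the `segment` helper and the `UNITS` inventory are hypothetical and chosen only for illustration. It checks whether a word can be assembled from a fixed set of subword units, which also shows why non-existent words become recognizable.

```python
from typing import Optional


def segment(word: str, units: set[str]) -> Optional[list[str]]:
    """Return one segmentation of `word` into items of `units`, or None."""
    # best[i] holds a segmentation of word[:i], or None if none was found.
    best: list[Optional[list[str]]] = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            prefix = best[start]
            if prefix is not None and word[start:end] in units:
                best[end] = prefix + [word[start:end]]
                break
    return best[len(word)]


# Hypothetical subword inventory (an assumption, not taken from the paper).
UNITS = {"re", "cog", "ni", "tion", "un", "seen", "s"}

print(segment("recognition", UNITS))  # a real word covered by the units
print(segment("untion", UNITS))       # a non-existent word, also covered
print(segment("hello", UNITS))        # not coverable by these units
```

With this inventory, "recognition" and the made-up "untion" both decompose into units, while "hello" does not, mirroring the trade-off the abstract describes.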

Authors (7)
  1. Nikolay Malkovsky (2 papers)
  2. Vladimir Bataev (14 papers)
  3. Dmitrii Sviridkin (1 paper)
  4. Natalia Kizhaeva (1 paper)
  5. Aleksandr Laptev (14 papers)
  6. Ildar Valiev (1 paper)
  7. Oleg Petrov (4 papers)
Citations (1)