
Approaches to Improving Recognition of Underrepresented Named Entities in Hybrid ASR Systems (2005.08742v1)

Published 18 May 2020 in eess.AS, cs.CL, and cs.SD

Abstract: In this paper, we present a series of complementary approaches to improve the recognition of underrepresented named entities (NE) in hybrid ASR systems without compromising overall word error rate performance. The underrepresented words correspond to rare or out-of-vocabulary (OOV) words in the training data and therefore cannot be modeled reliably. We begin with a graphemic lexicon, which removes the need for phonetic models in hybrid ASR; we study it under different settings and demonstrate its effectiveness in dealing with underrepresented NEs. Next, we study the impact of a neural language model (LM) with letter-based features derived to handle infrequent words. After that, we enrich the representations of underrepresented NEs in a pretrained neural LM by borrowing the embedding representations of well-represented words, which yields a significant performance improvement on underrepresented NE recognition. Finally, we boost the likelihood scores of utterances containing NEs in the word lattices rescored by neural LMs and gain a further performance improvement. The combination of these approaches improves NE recognition by up to 42% relative.
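The sketch below illustrates the embedding-borrowing idea mentioned in the abstract: a rare or OOV named entity inherits the (averaged) input embedding of one or more frequent "donor" words in a pretrained neural LM. It is a minimal illustration, not the authors' implementation; the function name, the donor mapping, and all variable names are hypothetical.

```python
# Minimal sketch of borrowing embeddings for underrepresented NE words.
# Assumptions (not from the paper): embeddings is a (V, D) matrix of
# pretrained LM input embeddings, vocab maps word -> row index, and
# donor_map pairs each rare NE word with frequent donor words.
import numpy as np

def borrow_embeddings(embeddings, vocab, donor_map):
    """Overwrite embedding rows of rare NE words with donor averages."""
    out = embeddings.copy()
    for rare_word, donors in donor_map.items():
        donor_rows = [vocab[w] for w in donors if w in vocab]
        if rare_word in vocab and donor_rows:
            # Average the donor vectors; a single donor reduces to a copy.
            out[vocab[rare_word]] = embeddings[donor_rows].mean(axis=0)
    return out

# Toy usage with random embeddings and a hypothetical rare NE.
rng = np.random.default_rng(0)
vocab = {"john": 0, "smith": 1, "kazhymukan": 2}
emb = rng.normal(size=(3, 4))
new_emb = borrow_embeddings(emb, vocab, {"kazhymukan": ["john", "smith"]})
```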

Authors (6)
  1. Tingzhi Mao (5 papers)
  2. Yerbolat Khassanov (19 papers)
  3. Van Tung Pham (13 papers)
  4. Haihua Xu (23 papers)
  5. Hao Huang (155 papers)
  6. Eng Siong Chng (112 papers)
Citations (3)
