
Towards Personalization of CTC Speech Recognition Models with Contextual Adapters and Adaptive Boosting (2210.09510v3)

Published 18 Oct 2022 in cs.CL, cs.SD, and eess.AS

Abstract: End-to-end speech recognition models trained using joint Connectionist Temporal Classification (CTC)-Attention loss have gained popularity recently. In these models, a non-autoregressive CTC decoder is often used at inference time due to its speed and simplicity. However, such models are hard to personalize because of their conditional independence assumption, which prevents output tokens from previous time steps from influencing future predictions. To tackle this, we propose a novel two-way approach that first biases the encoder with attention over a predefined list of rare long-tail and out-of-vocabulary (OOV) words, and then uses dynamic boosting and a phone alignment network during decoding to further bias the subword predictions. We evaluate our approach on the open-source VoxPopuli and in-house medical datasets to showcase a 60% improvement in F1 score on domain-specific rare words over a strong CTC baseline.
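
As a rough illustration of the two ideas in the abstract, the PyTorch sketch below (not the authors' code) shows (1) a contextual adapter that cross-attends from encoder frames to embeddings of a user-supplied bias list, and (2) a constant log-score bonus applied to bias subwords during greedy CTC decoding. The module names, the fixed bonus (the paper uses dynamic boosting with a phone alignment network), and the blank-id-0 convention are all assumptions for this sketch.

```python
# Hypothetical sketch of encoder biasing + decode-time boosting;
# not the paper's implementation.
import torch
import torch.nn as nn


class ContextualAdapter(nn.Module):
    """Biases encoder frames toward a list of rare/OOV words via cross-attention."""

    def __init__(self, enc_dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(enc_dim, num_heads, batch_first=True)

    def forward(self, enc_out: torch.Tensor, bias_emb: torch.Tensor) -> torch.Tensor:
        # enc_out:  (B, T, D) acoustic encoder frames
        # bias_emb: (B, N, D) embeddings of N bias words (plus a "no-bias" slot)
        biased, _ = self.attn(query=enc_out, key=bias_emb, value=bias_emb)
        return enc_out + biased  # residual: the base model is left intact


def boosted_greedy_ctc(log_probs: torch.Tensor,
                       boost_ids: set[int],
                       boost: float = 2.0) -> list[int]:
    # log_probs: (T, V) CTC log-probabilities for one utterance.
    # Add a fixed bonus to subwords from the bias list before the argmax.
    # The paper's boosting is dynamic; this constant is only illustrative.
    bonus = torch.zeros(log_probs.size(-1))
    bonus[list(boost_ids)] = boost
    best = (log_probs + bonus).argmax(dim=-1)  # (T,)
    # Standard CTC collapse: merge repeated tokens, then drop blanks (id 0 assumed).
    out, prev = [], None
    for t in best.tolist():
        if t != prev and t != 0:
            out.append(t)
        prev = t
    return out
```

The residual connection in the adapter means the biasing can be trained on top of a frozen base model, while the decode-time boost needs no retraining at all; the two mechanisms are complementary, which matches the "two-way" framing in the abstract.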

Authors (6)
  1. Saket Dingliwal (22 papers)
  2. Monica Sunkara (20 papers)
  3. Sravan Bodapati (31 papers)
  4. Srikanth Ronanki (23 papers)
  5. Jeff Farris (3 papers)
  6. Katrin Kirchhoff (36 papers)