Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting (2406.12611v1)

Published 18 Jun 2024 in cs.SD, cs.CL, and eess.AS

Abstract: End-to-end multilingual speech recognition models handle multiple languages through a single model, often incorporating language identification to automatically detect the language of incoming speech. Since the language is often known in advance, these models can behave as language-specific models by using the language information as a prompt, which is particularly beneficial for attention-based encoder-decoder architectures. However, the Connectionist Temporal Classification (CTC) approach, which enhances recognition via joint decoding and multi-task training, does not normally incorporate language prompts because its output tokens are conditionally independent. To overcome this, we introduce an encoder prompting technique within the self-conditioned CTC framework, enabling language-specific adaptation of the CTC model in a zero-shot manner. Our method reduces errors by 28% on average, and by 41% on low-resource languages.
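
To make the idea concrete, below is a minimal PyTorch sketch of how a language prompt could be injected into one block of a self-conditioned CTC encoder: the intermediate CTC posterior is fed back into the encoder (self-conditioning), and a learned language embedding biases the hidden states before that intermediate prediction. All names here (`SelfConditionedEncoderLayerWithPrompt`, `lang_embed`, the layer sizes) are illustrative assumptions for exposition, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class SelfConditionedEncoderLayerWithPrompt(nn.Module):
    """Hypothetical sketch of one self-conditioned CTC encoder block
    with a language prompt. Names and structure are illustrative,
    not the paper's code."""

    def __init__(self, d_model: int, vocab_size: int, num_langs: int):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.ctc_head = nn.Linear(d_model, vocab_size)    # intermediate CTC prediction
        self.feedback = nn.Linear(vocab_size, d_model)    # project posterior back to features
        self.lang_embed = nn.Embedding(num_langs, d_model)  # language prompt embedding

    def forward(self, x: torch.Tensor, lang_id: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model); lang_id: (batch,)
        h = self.block(x)
        # Bias hidden states with the language prompt before the
        # intermediate CTC prediction, steering it toward the known language.
        h_prompted = h + self.lang_embed(lang_id).unsqueeze(1)
        posterior = self.ctc_head(h_prompted).softmax(dim=-1)
        # Self-conditioning: feed the intermediate posterior back into the features.
        return h + self.feedback(posterior)

# Example usage (shapes only):
# layer = SelfConditionedEncoderLayerWithPrompt(d_model=256, vocab_size=500, num_langs=8)
# out = layer(torch.randn(2, 100, 256), torch.tensor([0, 3]))
```

Because the prompt is added at inference time on top of an already-trained self-conditioned CTC encoder, no retraining is required, which is consistent with the zero-shot adaptation described in the abstract.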

Authors (5)
  1. Yosuke Kashiwagi (29 papers)
  2. Hayato Futami (24 papers)
  3. Emiru Tsunoo (34 papers)
  4. Siddhant Arora (50 papers)
  5. Shinji Watanabe (416 papers)