Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer (2306.04076v1)

Published 7 Jun 2023 in cs.CL, cs.SD, and eess.AS

Abstract: Domain adaptation using a text-only corpus is challenging in end-to-end (E2E) speech recognition. Adaptation by synthesizing audio from text through TTS is resource-consuming. We present a method to learn Unified Speech-Text Representation in Conformer Transducer (USTR-CT) to enable fast domain adaptation using a text-only corpus. Unlike the previous textogram method, our work introduces an extra text encoder to learn text representation; it is removed during inference, so online deployment requires no modification. To improve the efficiency of adaptation, single-step and multi-step adaptations are also explored. Experiments on adapting LibriSpeech to SPGISpeech show that the proposed method reduces the word error rate (WER) on the target domain by 44% relative, which is better than the TTS and textogram methods. The proposed method can also be combined with internal language model estimation (ILME) to further improve performance.
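
The "44% relative" figure in the abstract is a relative reduction, meaning the drop in WER divided by the baseline WER, not a drop of 44 percentage points. A minimal sketch of that arithmetic follows; the function name and the WER values are hypothetical and chosen only for illustration, not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the fraction by which WER drops relative to the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical WER values in percent (NOT results reported in the paper).
baseline = 25.0  # assumed WER before adaptation
adapted = 14.0   # assumed WER after text-only adaptation

reduction = relative_wer_reduction(baseline, adapted)
print(f"Relative WER reduction: {reduction:.0%}")
```

With these assumed values the absolute drop is 11 points, while the relative reduction is 11 / 25.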

Authors (5)
  1. Lu Huang (30 papers)
  2. Boyu Li (59 papers)
  3. Jun Zhang (1008 papers)
  4. Lu Lu (189 papers)
  5. Zejun Ma (78 papers)
Citations (2)