
LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers (2211.02809v3)

Published 5 Nov 2022 in cs.CL, cs.SD, and eess.AS

Abstract: Automatic speech recognition (ASR) and speech translation (ST) can both use neural transducers as the model structure. It is thus possible to use a single transducer model to perform both tasks. In real-world applications, such joint ASR and ST models may need to be streaming and should not require source language identification (i.e., they should be language-agnostic). In this paper, we propose LAMASSU, a streaming language-agnostic multilingual speech recognition and translation model using neural transducers. Based on the transducer model structure, we propose four methods: a unified joint and prediction network for multilingual output, a clustered multilingual encoder, target language identification for the encoder, and connectionist temporal classification regularization. Experimental results show that LAMASSU not only drastically reduces the model size but also reaches the performance of monolingual ASR and bilingual ST models.

Authors (8)
  1. Peidong Wang (33 papers)
  2. Eric Sun (14 papers)
  3. Jian Xue (30 papers)
  4. Yu Wu (196 papers)
  5. Long Zhou (57 papers)
  6. Yashesh Gaur (43 papers)
  7. Shujie Liu (101 papers)
  8. Jinyu Li (164 papers)
Citations (6)