
A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability (2211.02499v2)

Published 4 Nov 2022 in cs.CL, cs.AI, cs.SD, and eess.AS

Abstract: In this paper, we introduce our work of building a Streaming Multilingual Speech Model (SM2), which can transcribe or translate multiple spoken languages into text in the target language. The backbone of SM2 is Transformer Transducer, which has high streaming capability. Instead of human-labeled speech translation (ST) data, SM2 models are trained using weakly supervised data generated by converting the transcriptions in speech recognition corpora with a machine translation service. With 351 thousand hours of anonymized speech training data from 25 languages, SM2 models achieve comparable or even better ST quality than some recent popular large-scale non-streaming speech models. More importantly, we show that SM2 has truly zero-shot capability when expanding to new target languages, yielding high-quality ST results for {source-speech, target-text} pairs that are not seen during training.
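The weakly supervised data generation the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the `translate` function stands in for the machine translation service, and the corpus entries and language codes are invented for the example.

```python
def translate(text: str, src_lang: str, tgt_lang: str) -> str:
    """Hypothetical stand-in for the MT service; a toy lookup for illustration."""
    toy_mt = {("hello world", "en", "de"): "hallo welt"}
    return toy_mt.get((text.lower(), src_lang, tgt_lang), text)

def build_weak_st_pairs(asr_corpus, tgt_lang):
    """Convert {audio, transcript} ASR examples into weakly supervised
    {source-speech, target-text} speech translation pairs by machine-translating
    each transcription into the target language."""
    st_pairs = []
    for ex in asr_corpus:
        tgt_text = translate(ex["transcript"], ex["lang"], tgt_lang)
        st_pairs.append({"audio": ex["audio"], "target_text": tgt_text})
    return st_pairs

# Illustrative single-utterance ASR corpus
corpus = [{"audio": "utt1.wav", "lang": "en", "transcript": "hello world"}]
pairs = build_weak_st_pairs(corpus, tgt_lang="de")
print(pairs[0]["target_text"])  # -> "hallo welt"
```

Because the target text comes from MT rather than human annotation, this scales to many {source language, target language} combinations, which is what enables the zero-shot expansion to target languages never paired with a given source speech during training.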

Authors (4)
  1. Jian Xue (30 papers)
  2. Peidong Wang (33 papers)
  3. Jinyu Li (164 papers)
  4. Eric Sun (14 papers)
Citations (9)
