
Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation (2310.14806v1)

Published 23 Oct 2023 in cs.CL, cs.SD, and eess.AS

Abstract: The growing need for instant spoken language transcription and translation is driven by increased global communication and cross-lingual interactions. This has made offering translations in multiple languages essential for user applications. Traditional approaches to automatic speech recognition (ASR) and speech translation (ST) have often relied on separate systems, leading to inefficiencies in computational resources, and increased synchronization complexity in real time. In this paper, we propose a streaming Transformer-Transducer (T-T) model able to jointly produce many-to-one and one-to-many transcription and translation using a single decoder. We introduce a novel method for joint token-level serialized output training based on timestamp information to effectively produce ASR and ST outputs in the streaming setting. Experiments on {it,es,de}->en prove the effectiveness of our approach, enabling the generation of one-to-many joint outputs with a single decoder for the first time.
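As a rough illustration of what timestamp-based token-level serialization could look like, the sketch below merges a transcription stream and a translation stream into one target sequence ordered by emission time. It is one possible reading of the abstract, not the authors' implementation: the function name `serialize_by_timestamp`, the `<asr>` and `<st>` tag tokens, the `(token, time)` input format, and the example timestamps are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical input format: (token text, emission time in seconds).
Token = Tuple[str, float]


def serialize_by_timestamp(
    asr: List[Token],
    st: List[Token],
    asr_tag: str = "<asr>",  # assumed tag marking a switch to transcription
    st_tag: str = "<st>",    # assumed tag marking a switch to translation
) -> List[str]:
    """Interleave two token streams into one sequence ordered by time.

    Each stream is assumed to be sorted by time already. Ties between
    streams are broken in favour of the ASR token.
    """
    # Attach a stream id (0 = ASR, 1 = ST) and the original index so that
    # sorting is deterministic and keeps the order within each stream.
    tagged = [(t, 0, i, tok) for i, (tok, t) in enumerate(asr)]
    tagged += [(t, 1, i, tok) for i, (tok, t) in enumerate(st)]
    tagged.sort()

    output: List[str] = []
    current_stream = None
    for _, stream, _, tok in tagged:
        # Emit a tag only when the output switches from one stream to the other.
        if stream != current_stream:
            output.append(asr_tag if stream == 0 else st_tag)
            current_stream = stream
        output.append(tok)
    return output


# Made-up timestamps for an Italian utterance and its English translation.
asr_tokens = [("ciao", 0.2), ("mondo", 0.6)]
st_tokens = [("hello", 0.4), ("world", 0.9)]
serialized = serialize_by_timestamp(asr_tokens, st_tokens)
print(serialized)
```

A single decoder trained on a target of this kind would emit both outputs in one pass, and the tags let a consumer split the sequence back into transcription and translation. How the timestamps are obtained is outside the scope of this sketch.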

Authors (7)
  1. Sara Papi
  2. Peidong Wang
  3. Junkun Chen
  4. Jian Xue
  5. Naoyuki Kanda
  6. Jinyu Li
  7. Yashesh Gaur
Citations (2)