
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation (2311.00697v1)

Published 1 Nov 2023 in cs.CL and eess.AS

Abstract: Conventional speech-to-text translation (ST) systems are trained on single-speaker utterances, and they may not generalize to real-life scenarios where the audio contains conversations by multiple speakers. In this paper, we tackle single-channel multi-speaker conversational ST with an end-to-end, multi-task model, named Speaker-Turn Aware Conversational Speech Translation, that combines automatic speech recognition, speech translation and speaker turn detection using special tokens in a serialized labeling format. We run experiments on the Fisher-CALLHOME corpus, which we adapted by merging the two single-speaker channels into one multi-speaker channel, thus representing the more realistic and challenging scenario with multi-speaker turns and cross-talk. Experimental results across single- and multi-speaker conditions, and against conventional ST systems, show that our model outperforms the reference systems on the multi-speaker condition while attaining comparable performance on the single-speaker condition. We release scripts for data processing and model training.
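The two data-preparation ideas in the abstract, merging two single-speaker channels into one and serializing multi-speaker text with special tokens, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `<turn>` token name, the `Segment` fields, and the simple averaging mix are hypothetical and are not taken from the paper or its released scripts.

```python
from dataclasses import dataclass

# Hypothetical speaker-turn token; the paper's actual token inventory may differ.
TURN = "<turn>"


@dataclass
class Segment:
    speaker: str   # channel or speaker label, e.g. "A" or "B"
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str      # transcript or translation of the segment


def serialize(segments):
    """Order segments by start time and insert TURN at each speaker change."""
    ordered = sorted(segments, key=lambda s: s.start)
    pieces = []
    prev_speaker = None
    for seg in ordered:
        if prev_speaker is not None and seg.speaker != prev_speaker:
            pieces.append(TURN)
        pieces.append(seg.text)
        prev_speaker = seg.speaker
    return " ".join(pieces)


def mix_channels(a, b):
    """Merge two mono sample lists into one channel by zero-padding and averaging."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return [(x + y) / 2 for x, y in zip(a, b)]


if __name__ == "__main__":
    segs = [
        Segment("A", 0.0, 1.0, "hola"),
        Segment("B", 1.2, 2.0, "qué tal"),
        Segment("A", 2.1, 3.0, "bien"),
    ]
    print(serialize(segs))
    print(mix_channels([1.0, 0.0], [0.0, 1.0, 1.0]))
```

Sorting by start time is one simple way to linearize overlapping speech into a single target string; how the paper orders segments under cross-talk is not stated in the abstract.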

Authors (8)
  1. Juan Zuluaga-Gomez (27 papers)
  2. Zhaocheng Huang (3 papers)
  3. Xing Niu (28 papers)
  4. Rohit Paturi (9 papers)
  5. Sundararajan Srinivasan (16 papers)
  6. Prashant Mathur (21 papers)
  7. Brian Thompson (24 papers)
  8. Marcello Federico (38 papers)
Citations (2)
