
Fluent and Low-latency Simultaneous Speech-to-Speech Translation with Self-adaptive Training (2010.10048v2)

Published 20 Oct 2020 in cs.CL and cs.AI

Abstract: Simultaneous speech-to-speech translation is widely useful but extremely challenging, since it needs to generate target-language speech concurrently with the source-language speech, with only a few seconds delay. In addition, it needs to continuously translate a stream of sentences, but all recent solutions merely focus on the single-sentence scenario. As a result, current approaches accumulate latencies progressively when the speaker talks faster, and introduce unnatural pauses when the speaker talks slower. To overcome these issues, we propose Self-Adaptive Translation (SAT) which flexibly adjusts the length of translations to accommodate different source speech rates. At similar levels of translation quality (as measured by BLEU), our method generates more fluent target speech (as measured by the naturalness metric MOS) with substantially lower latency than the baseline, in both Zh <-> En directions.
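The abstract does not detail how SAT adjusts translation length, but the core idea it states (shorter translations for fast speakers to stop latency accumulating, longer ones for slow speakers to avoid unnatural pauses) can be sketched as a simple length-budget heuristic. All function names, constants, and the clamping scheme below are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch of the idea behind Self-Adaptive Translation (SAT):
# scale the target-translation length budget to the source speech rate so
# target speech neither lags behind fast speakers nor pauses after slow ones.
# The constants and clamping bounds are assumptions for illustration only.

def target_length_budget(src_tokens: int,
                         src_duration_s: float,
                         tts_rate_tokens_per_s: float = 3.0,
                         min_ratio: float = 0.6,
                         max_ratio: float = 1.4) -> int:
    """Return a token budget for translating one source segment."""
    # Target speech must fit roughly in the source segment's duration,
    # since it is produced concurrently with the source speech.
    budget = src_duration_s * tts_rate_tokens_per_s
    # Clamp the length ratio so translation quality is not sacrificed.
    ratio = max(min_ratio, min(max_ratio, budget / max(src_tokens, 1)))
    return round(src_tokens * ratio)

# Fast speaker (20 tokens in 4 s): budget shrinks, preventing latency buildup.
print(target_length_budget(src_tokens=20, src_duration_s=4.0))  # -> 12
# Slow speaker (10 tokens in 5 s): budget grows, avoiding unnatural pauses.
print(target_length_budget(src_tokens=10, src_duration_s=5.0))  # -> 14
```

In practice the paper trains the model itself to produce length-flexible translations; this heuristic only conveys the rate-matching intuition.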

Authors (7)
  1. Renjie Zheng (29 papers)
  2. Mingbo Ma (32 papers)
  3. Baigong Zheng (19 papers)
  4. Kaibo Liu (17 papers)
  5. Jiahong Yuan (12 papers)
  6. Kenneth Church (21 papers)
  7. Liang Huang (108 papers)
Citations (14)