
Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation (2205.08993v1)

Published 18 May 2022 in cs.CL and eess.AS

Abstract: Direct speech-to-speech translation (S2ST) has attracted increasing attention recently. The task is very challenging due to data scarcity and the complex speech-to-speech mapping. In this paper, we report our recent achievements in S2ST. First, we build an S2ST Transformer baseline which outperforms the original Translatotron. Second, we utilize external data via pseudo-labeling and obtain a new state-of-the-art result on the Fisher English-to-Spanish test set. Specifically, we exploit the pseudo data with a combination of popular techniques whose application to S2ST is non-trivial. Moreover, we evaluate our approach on both a syntactically similar (Spanish-English) and a distant (English-Chinese) language pair. Our implementation is available at https://github.com/fengpeng-yue/speech-to-speech-translation.
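
The abstract does not spell out the pseudo-labeling recipe. A common instantiation of the idea, sketched below under that assumption, is to run unlabeled source speech through a cascaded ASR → MT → TTS pipeline and treat the synthesized output as a training target, then mix those synthetic pairs with the real parallel data when training the direct S2ST model. All names here (asr, mt, tts, make_pseudo_pairs, build_training_set) are hypothetical illustrations, not functions from the paper's repository.

```python
# Minimal sketch of pseudo-labeling for direct S2ST (not the authors' code).
# Assumption: pretrained ASR, MT, and TTS models are available as callables,
# and the pseudo-labeled pairs are simply concatenated with the real data.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class S2STPair:
    source_audio: list   # source-language waveform samples (placeholder type)
    target_audio: list   # target-language waveform samples (placeholder type)
    is_pseudo: bool      # True if the target side was synthesized


def make_pseudo_pairs(
    unlabeled_audio: List[list],
    asr: Callable,   # source speech -> source text   (assumed pretrained model)
    mt: Callable,    # source text   -> target text   (assumed pretrained model)
    tts: Callable,   # target text   -> target speech (assumed pretrained model)
) -> List[S2STPair]:
    """Run the cascade over unlabeled source audio to fabricate target speech."""
    pairs = []
    for audio in unlabeled_audio:
        transcript = asr(audio)
        translation = mt(transcript)
        synthetic_target = tts(translation)
        pairs.append(S2STPair(audio, synthetic_target, is_pseudo=True))
    return pairs


def build_training_set(
    real: List[S2STPair],
    pseudo: List[S2STPair],
    pseudo_fraction: float = 1.0,
) -> List[S2STPair]:
    """Mix real and pseudo-labeled pairs; keeping only a fraction of the
    pseudo data is one simple way to limit the label noise it introduces."""
    keep = int(len(pseudo) * pseudo_fraction)
    return real + pseudo[:keep]
```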

Authors (6)
  1. Qianqian Dong (19 papers)
  2. Fengpeng Yue (4 papers)
  3. Tom Ko (31 papers)
  4. Mingxuan Wang (83 papers)
  5. Qibing Bai (6 papers)
  6. Yu Zhang (1400 papers)
Citations (16)