
Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation (2205.08993v1)

Published 18 May 2022 in cs.CL and eess.AS

Abstract: Direct speech-to-speech translation (S2ST) has attracted increasing attention recently. The task is very challenging due to data scarcity and the complex speech-to-speech mapping. In this paper, we report our recent achievements in S2ST. First, we build an S2ST Transformer baseline which outperforms the original Translatotron. Second, we utilize external data via pseudo-labeling and obtain a new state-of-the-art result on the Fisher English-to-Spanish test set. Specifically, we exploit the pseudo data with a combination of popular techniques whose application to S2ST is non-trivial. Moreover, we evaluate our approach on both a syntactically similar (Spanish-English) and a distant (English-Chinese) language pair. Our implementation is available at https://github.com/fengpeng-yue/speech-to-speech-translation.
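
The abstract does not spell out the pseudo-labeling recipe. A common instantiation of the idea, sketched below under that assumption, is to run unlabeled source speech through a cascaded ASR → MT → TTS pipeline and treat the synthesized output as a training target, then mix those synthetic pairs with the real parallel data when training the direct S2ST model. All names here (asr, mt, tts, make_pseudo_pairs, build_training_set) are hypothetical illustrations, not functions from the paper's repository.

```python
# Minimal sketch of pseudo-labeling for direct S2ST (not the authors' code).
# Assumption: pretrained ASR, MT, and TTS models are available as callables,
# and the pseudo-labeled pairs are simply concatenated with the real data.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class S2STPair:
    source_audio: list   # source-language waveform samples (placeholder type)
    target_audio: list   # target-language waveform samples (placeholder type)
    is_pseudo: bool      # True if the target side was synthesized


def make_pseudo_pairs(
    unlabeled_audio: List[list],
    asr: Callable,   # source speech -> source text   (assumed pretrained model)
    mt: Callable,    # source text   -> target text   (assumed pretrained model)
    tts: Callable,   # target text   -> target speech (assumed pretrained model)
) -> List[S2STPair]:
    """Run the cascade over unlabeled source audio to fabricate target speech."""
    pairs = []
    for audio in unlabeled_audio:
        transcript = asr(audio)
        translation = mt(transcript)
        synthetic_target = tts(translation)
        pairs.append(S2STPair(audio, synthetic_target, is_pseudo=True))
    return pairs


def build_training_set(
    real: List[S2STPair],
    pseudo: List[S2STPair],
    pseudo_fraction: float = 1.0,
) -> List[S2STPair]:
    """Mix real and pseudo-labeled pairs; keeping only a fraction of the
    pseudo data is one simple way to limit the label noise it introduces."""
    keep = int(len(pseudo) * pseudo_fraction)
    return real + pseudo[:keep]
```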

Authors (6)
  1. Qianqian Dong (19 papers)
  2. Fengpeng Yue (4 papers)
  3. Tom Ko (31 papers)
  4. Mingxuan Wang (83 papers)
  5. Qibing Bai (6 papers)
  6. Yu Zhang (1400 papers)
Citations (16)