Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation (2210.17027v1)

Published 31 Oct 2022 in cs.SD, cs.CL, and eess.AS

Abstract: Direct speech-to-speech translation (S2ST) is an attractive research topic with many advantages compared to cascaded S2ST. However, direct S2ST suffers from a data scarcity problem, because corpora pairing speech in the source language with speech in the target language are very rare. To address this issue, we propose in this paper a Speech2S model, which is jointly pre-trained with unpaired speech and bilingual text data for direct speech-to-speech translation tasks. By effectively leveraging the paired text data, Speech2S is capable of modeling the cross-lingual speech conversion from the source to the target language. We verify the performance of the proposed Speech2S on the Europarl-ST and VoxPopuli datasets. Experimental results demonstrate that Speech2S gains an improvement of about 5 BLEU points over encoder-only pre-training models, and achieves competitive or even better performance than existing state-of-the-art models.
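The core idea in the abstract — training one model on both unpaired speech and bilingual text so a shared decoder learns the cross-lingual mapping without paired S2ST data — can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy 0/1 losses, and the weighting scheme are all assumptions made purely for illustration.

```python
# Hypothetical sketch of a joint pre-training objective: a self-supervised
# loss on unpaired speech plus a supervised translation loss on bilingual
# text, combined into one training signal for shared model components.

def masked_unit_loss(true_units, predicted_units):
    # Toy self-supervised loss on unpaired speech: fraction of discrete
    # speech units the model predicts incorrectly at masked positions.
    wrong = sum(t != p for t, p in zip(true_units, predicted_units))
    return wrong / len(true_units)

def translation_loss(target_tokens, predicted_tokens):
    # Toy supervised loss on bilingual text: fraction of target-language
    # tokens the shared decoder gets wrong.
    wrong = sum(t != p for t, p in zip(target_tokens, predicted_tokens))
    return wrong / len(target_tokens)

def joint_pretraining_loss(speech_batch, text_batch, alpha=0.5):
    # Joint objective: a weighted sum of the two losses, so the shared
    # encoder/decoder sees gradients from both modalities at once.
    l_speech = masked_unit_loss(*speech_batch)
    l_text = translation_loss(*text_batch)
    return alpha * l_speech + (1 - alpha) * l_text
```

In a real system the losses would be cross-entropy over model logits and the batches would alternate or interleave between modalities; the sketch only shows how one scalar objective can couple the two data sources.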

Authors (8)
  1. Kun Wei (23 papers)
  2. Long Zhou (57 papers)
  3. Ziqiang Zhang (11 papers)
  4. Liping Chen (21 papers)
  5. Shujie Liu (101 papers)
  6. Lei He (121 papers)
  7. Jinyu Li (164 papers)
  8. Furu Wei (291 papers)
Citations (12)
