
Self-Supervised Representations Improve End-to-End Speech Translation (2006.12124v2)

Published 22 Jun 2020 in eess.AS, cs.CL, and cs.SD

Abstract: End-to-end speech-to-text translation can provide a simpler and smaller system but faces the challenge of data scarcity. Pre-training methods can leverage unlabeled data and have been shown to be effective in data-scarce settings. In this work, we explore whether self-supervised pre-trained speech representations can benefit the speech translation task in both high- and low-resource settings, whether they transfer well to other languages, and whether they can be effectively combined with other common methods that help improve low-resource end-to-end speech translation, such as using a pre-trained high-resource speech recognition system. We demonstrate that self-supervised pre-trained features consistently improve translation performance, and that cross-lingual transfer allows the approach to extend to a variety of languages with little or no tuning.
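The core idea the abstract describes is replacing conventional filterbank inputs with features from a self-supervised pre-trained speech model, which are then fed to an end-to-end translation model. A minimal sketch of that setup is below; it uses torchaudio's wav2vec 2.0 bundle as a stand-in for the paper's self-supervised features, and the `SpeechTranslator` module, its layer sizes, and the dummy inputs are illustrative assumptions, not the authors' implementation.

```python
# Sketch: frozen self-supervised speech features feeding a small
# Transformer speech-translation model. torchaudio's wav2vec 2.0
# bundle stands in for the paper's self-supervised representations.
import torch
import torch.nn as nn
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE       # self-supervised pre-trained model
feature_extractor = bundle.get_model().eval()     # kept frozen; only the ST model trains

class SpeechTranslator(nn.Module):
    """Illustrative encoder-decoder consuming pre-extracted speech features."""
    def __init__(self, feat_dim=768, vocab_size=10000, d_model=256):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)  # map SSL features to model dim
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=3, num_decoder_layers=3,
            batch_first=True,
        )
        self.tgt_embed = nn.Embedding(vocab_size, d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, feats, tgt_tokens):
        src = self.proj(feats)
        tgt = self.tgt_embed(tgt_tokens)
        mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        dec = self.transformer(src, tgt, tgt_mask=mask)
        return self.out(dec)

# Usage: extract frozen features from raw audio, then run the translator.
waveform = torch.randn(1, 16000)                  # 1 s of 16 kHz audio (dummy)
with torch.no_grad():
    feats, _ = feature_extractor.extract_features(waveform)
logits = SpeechTranslator()(feats[-1], torch.zeros(1, 5, dtype=torch.long))
```

Freezing the feature extractor mirrors the data-scarcity motivation: the self-supervised model is trained once on unlabeled audio, while only the comparatively small translation model needs scarce parallel speech-translation data.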

Authors (4)
  1. Anne Wu (11 papers)
  2. Changhan Wang (46 papers)
  3. Juan Pino (51 papers)
  4. Jiatao Gu (84 papers)
Citations (39)
