
WACO: Word-Aligned Contrastive Learning for Speech Translation (2212.09359v3)

Published 19 Dec 2022 in cs.CL, cs.SD, and eess.AS

Abstract: End-to-end Speech Translation (E2E ST) aims to directly translate source speech into target text. Existing ST methods perform poorly when only an extremely small amount of parallel speech-text data is available for training. We observe that an ST model's performance closely correlates with the embedding similarity between its speech and source-transcript representations. In this paper, we propose Word-Aligned COntrastive learning (WACO), a simple and effective method for extremely low-resource speech-to-text translation. Our key idea is to bridge word-level representations of the speech and text modalities via contrastive learning. We evaluate WACO and other methods on the MuST-C dataset, a widely used ST benchmark, and on the low-resource Maltese-English direction from IWSLT 2023. Our experiments demonstrate that WACO outperforms the best baseline by more than 9 BLEU points with only 1 hour of parallel ST data. Code is available at https://github.com/owaski/WACO.
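
The abstract's core mechanism, pulling together word-level embeddings from the speech and text modalities with a contrastive objective, can be illustrated with a short sketch. The sketch below is one plausible reading under stated assumptions (mean pooling over word spans, an InfoNCE loss with in-batch negatives, temperature 0.1); the span format and pooling here are illustrative, not the authors' exact implementation, which lives in the linked repository.

```python
# A minimal PyTorch sketch of word-aligned contrastive learning as the
# abstract describes it: pool frame-level speech features and token-level
# text features into per-word embeddings, then pull matching words together
# with an InfoNCE loss. Mean pooling, the span format, and the temperature
# are assumptions for illustration, not the paper's exact recipe.
import torch
import torch.nn.functional as F

def pool_words(features, spans):
    """Mean-pool features over each word's [start, end) span."""
    return torch.stack([features[s:e].mean(dim=0) for s, e in spans])

def word_aligned_contrastive_loss(speech_frames, text_tokens,
                                  speech_spans, text_spans, temperature=0.1):
    """Each speech word should be most similar to its own transcript word
    among all words in the batch (InfoNCE with in-batch negatives)."""
    s = F.normalize(pool_words(speech_frames, speech_spans), dim=-1)
    t = F.normalize(pool_words(text_tokens, text_spans), dim=-1)
    logits = s @ t.T / temperature      # pairwise cosine similarities
    targets = torch.arange(s.size(0))   # diagonal holds the positive pairs
    return F.cross_entropy(logits, targets)

# Toy example: 3 words spanning 50 speech frames and 5 subword tokens.
speech = torch.randn(50, 256, requires_grad=True)
text = torch.randn(5, 256, requires_grad=True)
loss = word_aligned_contrastive_loss(
    speech, text,
    speech_spans=[(0, 15), (15, 32), (32, 50)],  # e.g. from a forced aligner
    text_spans=[(0, 2), (2, 3), (3, 5)],         # subword-to-word grouping
)
loss.backward()
```

The diagonal of the similarity matrix holds the matched speech-text word pairs, so minimizing cross-entropy against identity targets directly increases the cross-modal embedding similarity that the abstract reports as correlating with ST performance.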

Authors (3)
  1. Siqi Ouyang (15 papers)
  2. Rong Ye (20 papers)
  3. Lei Li (1293 papers)
Citations (22)