
AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation (2305.15403v1)

Published 24 May 2023 in cs.CL, cs.SD, and eess.AS

Abstract: Direct speech-to-speech translation (S2ST) aims to convert speech in one language into speech in another, and has made significant progress to date. Despite this success, current S2ST models still degrade noticeably in noisy environments and cannot translate visual speech (i.e., the movement of the lips and teeth). In this work, we present AV-TranSpeech, the first audio-visual speech-to-speech translation (AV-S2ST) model that does not rely on intermediate text. AV-TranSpeech complements the audio stream with visual information to improve system robustness, and opens up a range of practical applications such as dictation and dubbing archival films. To mitigate the scarcity of parallel AV-S2ST data, we 1) explore self-supervised pre-training on unlabeled audio-visual data to learn contextual representations, and 2) introduce cross-modal distillation from S2ST models trained on audio-only corpora to further reduce the amount of visual data required. Experimental results on two language pairs demonstrate that AV-TranSpeech outperforms audio-only models under all settings, regardless of the type of noise. With low-resource audio-visual data (10h, 30h), cross-modal distillation yields an average improvement of 7.6 BLEU over the baselines. Audio samples are available at https://AV-TranSpeech.github.io
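The abstract does not state the exact distillation objective. As a minimal sketch, assuming an audio-only S2ST teacher and an audio-visual student that both predict the same sequence of discrete target units, one common formulation of cross-modal distillation is the per-position KL divergence from the teacher's output distribution to the student's. The function names, the toy unit vocabulary, and the temperature parameter below are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over one position's unit logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: Sequence[Sequence[float]],
    student_logits: Sequence[Sequence[float]],
    temperature: float = 1.0,
) -> float:
    """Mean per-position KL(teacher || student) over discrete-unit predictions.

    Assumption: teacher_logits come from a hypothetical audio-only S2ST model,
    student_logits from a hypothetical audio-visual model on the same clip,
    aligned position by position over the same unit vocabulary.
    """
    if not teacher_logits or len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must cover the same non-empty target positions")
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        p = softmax(t_row, temperature)  # soft targets from the audio-only teacher
        q = softmax(s_row, temperature)  # audio-visual student predictions
        # KL divergence; terms with p == 0 contribute nothing
        total += sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return total / len(teacher_logits)


if __name__ == "__main__":
    teacher = [[2.0, 0.5, -1.0], [0.0, 3.0, 0.1]]
    student = [[1.5, 0.7, -0.5], [0.2, 2.5, 0.0]]
    print(distillation_loss(teacher, student, temperature=2.0))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as they diverge, so minimizing it pushes the audio-visual model toward the behavior of the audio-only model, which can be trained on much larger audio-only corpora.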

Authors (11)
  1. Rongjie Huang (62 papers)
  2. Huadai Liu (14 papers)
  3. Xize Cheng (29 papers)
  4. Yi Ren (215 papers)
  5. Linjun Li (43 papers)
  6. Zhenhui Ye (25 papers)
  7. Lichao Zhang (17 papers)
  8. Jinglin Liu (38 papers)
  9. Xiang Yin (99 papers)
  10. Zhou Zhao (218 papers)
  11. JinZheng He (22 papers)
Citations (7)