
M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation (2207.00952v1)

Published 3 Jul 2022 in cs.CL, cs.LG, cs.SD, and eess.AS

Abstract: End-to-end speech-to-text translation models are often initialized with a pre-trained speech encoder and a pre-trained text decoder. This leads to a significant training gap between pre-training and fine-tuning, largely due to the modality differences between speech outputs from the encoder and text inputs to the decoder. In this work, we aim to bridge the modality gap between speech and text to improve translation quality. We propose M-Adapter, a novel Transformer-based module, to adapt speech representations to text. While shrinking the speech sequence, M-Adapter produces features desired for speech-to-text translation by modelling global and local dependencies of a speech sequence. Our experimental results show that our model outperforms a strong baseline by up to 1 BLEU point on the MuST-C En→De dataset. Our code is available at https://github.com/mingzi151/w2v2-st.
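For intuition, here is a minimal PyTorch sketch of what such a modality adapter could look like. The module name MAdapterSketch, the layer choices (a strided 1D convolution for sequence shrinking and local context, multi-head self-attention for global dependencies), and all hyperparameters are illustrative assumptions rather than the authors' exact architecture; see the linked repository for the real implementation.

```python
import torch
import torch.nn as nn

class MAdapterSketch(nn.Module):
    """Hypothetical modality adapter: shrinks a speech feature sequence
    with a strided convolution (local dependencies) and refines it with
    self-attention (global dependencies). Not the paper's exact module."""

    def __init__(self, d_model: int = 768, n_heads: int = 8, stride: int = 2):
        super().__init__()
        # Strided conv halves the sequence length (for stride=2) while
        # mixing local context across neighbouring speech frames.
        self.shrink = nn.Conv1d(d_model, d_model, kernel_size=3,
                                stride=stride, padding=1)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model) outputs of a pre-trained speech encoder.
        y = self.shrink(x.transpose(1, 2)).transpose(1, 2)  # (B, T/stride, D)
        y = self.norm1(y)
        attn_out, _ = self.attn(y, y, y)   # global dependencies over frames
        y = y + attn_out
        y = y + self.ffn(self.norm2(y))    # position-wise refinement
        return y

# Usage: 50 speech-encoder frames are shrunk to 25 adapted frames that
# can be fed to a pre-trained text decoder.
adapter = MAdapterSketch()
speech = torch.randn(2, 50, 768)
print(adapter(speech).shape)  # torch.Size([2, 25, 768])
```

The key design point the abstract describes is that shrinking and representation learning happen jointly: the adapter both reduces the length mismatch between speech and text sequences and reshapes the features toward what the text decoder expects.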

Authors (4)
  1. Jinming Zhao (26 papers)
  2. Hao Yang (328 papers)
  3. Ehsan Shareghi (54 papers)
  4. Gholamreza Haffari (141 papers)
Citations (19)
GitHub: https://github.com/mingzi151/w2v2-st