Fine-tuning on Clean Data for End-to-End Speech Translation: FBK @ IWSLT 2018 (1810.07652v1)

Published 16 Oct 2018 in eess.AS, cs.CL, cs.LG, cs.SD, and stat.ML

Abstract: This paper describes FBK's submission to the end-to-end English-German speech translation task at IWSLT 2018. Our system relies on a state-of-the-art model based on LSTMs and CNNs, where the CNNs are used to reduce the temporal dimension of the audio input, which is in general much higher than that of machine translation input. Our model was trained only on the audio-to-text parallel data released for the task, and fine-tuned on cleaned subsets of the original training corpus. The addition of weight normalization and label smoothing improved the baseline system by 1.0 BLEU point on our validation set. The final submission also featured checkpoint averaging within a training run and ensemble decoding of models trained during multiple runs. On test data, our best single model obtained a BLEU score of 9.7, while the ensemble obtained a BLEU score of 10.24.
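
The central architectural idea in the abstract is using convolutions to shrink the time axis of the audio features before the recurrent encoder, since a speech utterance has far more frames than its translation has tokens. Below is a minimal PyTorch sketch of that mechanism, assuming 40-bin log-Mel input and two stride-2 convolutions; the channel counts and kernel sizes are illustrative placeholders, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class ConvTimeReduction(nn.Module):
    """Illustrative audio front-end: two stride-2 convolutions cut the
    number of time steps by ~4x before the LSTM encoder (hyperparameters
    are assumptions, not the FBK system's exact settings)."""

    def __init__(self, n_mels: int = 40, channels: int = 16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # Feature size after flattening channels x reduced-mel dims.
        self.out_dim = channels * ((n_mels + 3) // 4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_mels) -> add a channel axis for Conv2d.
        x = self.conv(x.unsqueeze(1))  # (batch, channels, time/4, mels/4)
        b, c, t, f = x.shape
        # Flatten back to a sequence the LSTM encoder can consume.
        return x.permute(0, 2, 1, 3).reshape(b, t, c * f)

feats = torch.randn(8, 1000, 40)       # 8 utterances, 1000 frames each
enc_in = ConvTimeReduction()(feats)
print(enc_in.shape)                    # torch.Size([8, 250, 160])
```

The two training refinements the abstract credits with the +1.0 BLEU gain map onto standard PyTorch utilities: weight normalization via `torch.nn.utils.weight_norm` on a layer, and label smoothing via `nn.CrossEntropyLoss(label_smoothing=0.1)` (the 0.1 value is an assumption for illustration). Checkpoint averaging, also mentioned, amounts to element-wise averaging the parameter tensors of the last few saved checkpoints before decoding.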

Authors (5)
  1. Mattia Antonino Di Gangi (11 papers)
  2. Roberto Dessì (12 papers)
  3. Roldano Cattoni (5 papers)
  4. Matteo Negri (93 papers)
  5. Marco Turchi (51 papers)
Citations (10)