The HW-TSC's Offline Speech Translation Systems for IWSLT 2021 Evaluation (2108.03845v1)
Published 9 Aug 2021 in cs.CL
Abstract: This paper describes our participation in the IWSLT 2021 offline speech translation task. Our system is built as a cascade consisting of a speaker diarization module, an Automatic Speech Recognition (ASR) module, and a Machine Translation (MT) module. We use the LIUM SpkDiarization tool directly as the diarization module. The ASR module uses a modified Transformer encoder and is trained with multi-source training on three ASR datasets from different sources. The MT module is pretrained on the large-scale WMT news translation dataset and fine-tuned on the TED corpus. Our method achieves a BLEU score of 24.6 on the 2021 test set.
- Minghan Wang (23 papers)
- Yuxia Wang (41 papers)
- Chang Su (37 papers)
- Jiaxin Guo (40 papers)
- Yingtao Zhang (19 papers)
- Yujia Liu (27 papers)
- Min Zhang (630 papers)
- Shimin Tao (31 papers)
- Xingshan Zeng (38 papers)
- Liangyou Li (36 papers)
- Hao Yang (328 papers)
- Ying Qin (51 papers)
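
The cascade described in the abstract (diarization, then ASR, then MT) can be illustrated with a minimal sketch. This is not the authors' implementation: the `CascadePipeline` class, the segment format, the `sample_rate` field, and the three toy stage functions are all hypothetical stand-ins, chosen only to show how the output of each stage feeds the next.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical segment format: (start_seconds, end_seconds) within the audio.
Segment = Tuple[float, float]


@dataclass
class CascadePipeline:
    """Chains three stages; each stage is an injected callable (assumption)."""
    diarize: Callable[[List[float]], List[Segment]]
    transcribe: Callable[[List[float]], str]
    translate: Callable[[str], str]
    sample_rate: int = 16000  # assumed value, not taken from the paper

    def run(self, audio: List[float]) -> List[str]:
        translations = []
        # 1) Diarization splits the recording into segments.
        for start, end in self.diarize(audio):
            lo = int(start * self.sample_rate)
            hi = int(end * self.sample_rate)
            # 2) ASR turns each audio segment into source-language text.
            text = self.transcribe(audio[lo:hi])
            # 3) MT turns the transcript into target-language text.
            translations.append(self.translate(text))
        return translations


# Toy stand-ins for the real modules, used only to exercise the data flow.
def toy_diarize(audio: List[float]) -> List[Segment]:
    return [(0.0, 1.0), (1.0, 2.0)]


def toy_transcribe(chunk: List[float]) -> str:
    return f"{len(chunk)} samples"


def toy_translate(text: str) -> str:
    return text.upper()


pipeline = CascadePipeline(toy_diarize, toy_transcribe, toy_translate, sample_rate=4)
result = pipeline.run([0.0] * 8)
print(result)
```

Passing the stages in as callables keeps the sketch independent of any particular toolkit. In a real system, each stand-in would be replaced by a wrapper around the corresponding module.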