MELD-ST: An Emotion-aware Speech Translation Dataset (2405.13233v1)
Published 21 May 2024 in cs.CL
Abstract: Emotion plays a crucial role in human conversation. This paper underscores the significance of considering emotion in speech translation. We present the MELD-ST dataset for the emotion-aware speech translation task, comprising English-to-Japanese and English-to-German language pairs. Each language pair includes about 10,000 utterances annotated with emotion labels from the MELD dataset. Baseline experiments using the SeamlessM4T model on the dataset indicate that fine-tuning with emotion labels can enhance translation performance in some settings, highlighting the need for further research in emotion-aware speech translation systems.