The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge (2404.06079v2)
Published 9 Apr 2024 in eess.AS and cs.AI
Abstract: Discrete speech tokens have become increasingly popular in multiple speech processing fields, including automatic speech recognition (ASR), text-to-speech (TTS), and singing voice synthesis (SVS). In this paper, we describe the systems developed by the SJTU X-LANCE group for the TTS (acoustic + vocoder), SVS, and ASR tracks in the Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge. Notably, we ranked first on the leaderboard in the TTS track, both with the whole training set and with only 1 hour of training data, achieving the highest UTMOS score and the lowest bitrate among all submissions.
- Yiwei Guo
- Chenrun Wang
- Yifan Yang
- Hankun Wang
- Ziyang Ma
- Chenpeng Du
- Shuai Wang
- Hanzheng Li
- Shuai Fan
- Hui Zhang
- Xie Chen
- Kai Yu