The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge (2404.06079v2)

Published 9 Apr 2024 in eess.AS and cs.AI

Abstract: Discrete speech tokens have become increasingly popular in multiple speech processing fields, including automatic speech recognition (ASR), text-to-speech (TTS), and singing voice synthesis (SVS). In this paper, we describe the systems developed by the SJTU X-LANCE group for the TTS (acoustic + vocoder), SVS, and ASR tracks in the Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge. Notably, we ranked first on the leaderboard in the TTS track, both with the whole training set and with only 1h of training data, achieving the highest UTMOS score and the lowest bitrate among all submissions.

Authors (12)
  1. Yiwei Guo (30 papers)
  2. Chenrun Wang (2 papers)
  3. Yifan Yang (578 papers)
  4. Hankun Wang (12 papers)
  5. Ziyang Ma (73 papers)
  6. Chenpeng Du (28 papers)
  7. Shuai Wang (466 papers)
  8. Hanzheng Li (1 paper)
  9. Shuai Fan (17 papers)
  10. Hui Zhang (405 papers)
  11. Xie Chen (166 papers)
  12. Kai Yu (202 papers)
Citations (1)
