
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units (2406.07725v1)

Published 11 Jun 2024 in cs.SD and eess.AS

Abstract: Representing speech and audio signals in discrete units has become a compelling alternative to traditional high-dimensional feature vectors. Numerous studies have highlighted the efficacy of discrete units in various applications such as speech compression and restoration, speech recognition, and speech generation. To foster exploration in this domain, we introduce the Interspeech 2024 Challenge, which focuses on new speech processing benchmarks using discrete units. It encompasses three pivotal tasks, namely multilingual automatic speech recognition, text-to-speech, and singing voice synthesis, and aims to assess the potential applicability of discrete units in these tasks. This paper outlines the challenge designs and baseline descriptions. We also collate baseline and selected submission systems, along with preliminary findings, offering valuable contributions to future research in this evolving field.

Authors (10)
  1. Xuankai Chang (61 papers)
  2. Jiatong Shi (82 papers)
  3. Jinchuan Tian (33 papers)
  4. Yuning Wu (20 papers)
  5. Yuxun Tang (13 papers)
  6. Yihan Wu (45 papers)
  7. Shinji Watanabe (419 papers)
  8. Yossi Adi (96 papers)
  9. Xie Chen (166 papers)
  10. Qin Jin (94 papers)
Citations (10)
