SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models (2406.08905v2)

Published 13 Jun 2024 in cs.SD and eess.AS

Abstract: Discrete representations have shown advantages in speech generation tasks, where discrete tokens are derived by discretizing hidden features from self-supervised learning (SSL) pre-trained models. However, applying speech SSL models directly to singing generation runs into domain gaps between speech and singing. Furthermore, singing generation requires a more fine-grained representation than typical speech does. To address these challenges, we introduce SingOMD, a novel method for extracting singing-oriented multi-resolution discrete representations from speech SSL models. Specifically, we first adapt the features from speech SSL models through a resynthesis task and incorporate resampling-based multi-resolution modules to better serve singing generation. These adapted multi-resolution features are then discretized via clustering. Extensive experiments demonstrate the robustness, efficiency, and effectiveness of these representations in singing vocoders and singing voice synthesis.
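The final two stages the abstract describes (resample SSL features to several frame rates, then cluster each stream into discrete tokens) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the SingOMD implementation. It skips the resynthesis-based adaptation step entirely. It uses linear interpolation as a stand-in for the paper's resampling modules and assumes a 50 Hz input frame rate. The clustering is a plain NumPy k-means. The helper names (`resample_frames`, `assign`, `kmeans`, `multi_resolution_tokens`), the chosen resolutions, and the cluster count are all hypothetical.

```python
import numpy as np


def resample_frames(feats: np.ndarray, src_hz: float, tgt_hz: float) -> np.ndarray:
    """Linearly interpolate a (T, D) feature sequence from src_hz to tgt_hz frames/s.

    Assumption: linear interpolation stands in for the paper's resampling modules.
    """
    num_src, dim = feats.shape
    num_tgt = max(1, int(round(num_src / src_hz * tgt_hz)))
    src_t = np.arange(num_src) / src_hz  # timestamps of input frames (seconds)
    tgt_t = np.arange(num_tgt) / tgt_hz  # timestamps of output frames (seconds)
    # Interpolate each feature dimension independently; np.interp clamps at the edges.
    return np.stack([np.interp(tgt_t, src_t, feats[:, d]) for d in range(dim)], axis=1)


def assign(x: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Return the index of the nearest centroid (squared Euclidean) for each row of x."""
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def kmeans(x: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns a (k, D) codebook."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(x, centroids)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def multi_resolution_tokens(feats, src_hz, resolutions_hz, k, seed=0):
    """Map one (T, D) feature sequence to a token sequence per target frame rate.

    Hypothetical helper: in practice each codebook would be fit on a training
    corpus, not on the utterance being tokenized.
    """
    tokens = {}
    for hz in resolutions_hz:
        resampled = resample_frames(feats, src_hz, hz)
        codebook = kmeans(resampled, k, seed=seed)
        tokens[hz] = assign(resampled, codebook)
    return tokens


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((100, 16))  # stand-in for 2 s of 50 Hz SSL features
    toks = multi_resolution_tokens(feats, 50.0, [25.0, 50.0, 100.0], k=8)
    for hz, seq in toks.items():
        print(f"{hz:>5} Hz -> {len(seq)} tokens, first 10: {seq[:10].tolist()}")
```

With 2 s of input, the 25, 50, and 100 Hz streams yield 50, 100, and 200 tokens respectively: coarser streams give shorter sequences, finer streams keep more temporal detail. The random features here only exercise the shapes; real use would substitute (adapted) SSL hidden states and corpus-level codebooks.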

Authors (4)
  1. Yuxun Tang (13 papers)
  2. Yuning Wu (20 papers)
  3. Jiatong Shi (82 papers)
  4. Qin Jin (94 papers)
Citations (3)