SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models (2406.08905v2)

Published 13 Jun 2024 in cs.SD and eess.AS

Abstract: Discrete representation has shown advantages in speech generation tasks, wherein discrete tokens are derived by discretizing hidden features from self-supervised learning (SSL) pre-trained models. However, the direct application of speech SSL models to singing generation encounters domain gaps between speech and singing. Furthermore, singing generation necessitates a more refined representation than typical speech. To address these challenges, we introduce SingOMD, a novel method to extract singing-oriented multi-resolution discrete representations from speech SSL models. Specifically, we first adapt the features from speech SSL through a resynthesis task and incorporate multi-resolution modules based on resampling to better serve singing generation. These adapted multi-resolution features are then discretized via clustering. Extensive experiments demonstrate the robustness, efficiency, and effectiveness of these representations in singing vocoders and singing voice synthesis.
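The last two steps the abstract names, resampling features to several resolutions and discretizing them by clustering, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract says only "resampling" and "clustering", so average pooling over time and a plain k-means are assumptions made here. The function names, the resolution factors (1, 2, 4), the codebook size, and the random stand-in features are all hypothetical. The resynthesis-based adaptation of the SSL features is not shown.

```python
import numpy as np

def resample_frames(feats, factor):
    """Downsample a (T, D) sequence in time by averaging every `factor` frames.

    Assumption: average pooling stands in for the paper's resampling modules.
    """
    T, D = feats.shape
    pad = (-T) % factor
    if pad:
        # Repeat the last frame so T becomes a multiple of `factor`.
        feats = np.concatenate([feats, np.repeat(feats[-1:], pad, axis=0)], axis=0)
    return feats.reshape(-1, factor, D).mean(axis=1)

def assign(x, centroids):
    """Return the index of the nearest centroid for each row of x."""
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice of clustering method)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(x, centroids)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids

def multires_tokens(feats, factors=(1, 2, 4), k=8):
    """Map each resolution factor to a sequence of discrete token ids."""
    tokens = {}
    for factor in factors:
        resampled = resample_frames(feats, factor)
        centroids = kmeans(resampled, k)
        tokens[factor] = assign(resampled, centroids)
    return tokens

# Hypothetical stand-in for adapted SSL features: 100 frames, 16 dimensions.
feats = np.random.default_rng(0).normal(size=(100, 16))
tokens = multires_tokens(feats)
```

In this sketch each resolution gets its own codebook, so a coarser stream yields fewer tokens per second of audio. Whether the paper shares or separates codebooks across resolutions is not stated in the abstract.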



Authors (4)

Yuxun Tang (13 papers)
Yuning Wu (20 papers)
Jiatong Shi (82 papers)
Qin Jin (94 papers)

Citations (3)

