
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (2403.11780v2)

Published 18 Mar 2024 in cs.SD, cs.AI, cs.LG, and eess.AS

Abstract: Recent singing-voice-synthesis (SVS) methods achieve remarkable audio quality and naturalness, yet they cannot explicitly control the style attributes of the synthesized singing. We propose Prompt-Singer, the first SVS method that enables control over singer gender, vocal range, and volume through natural-language prompts. We adopt a model architecture based on a decoder-only transformer with a multi-scale hierarchy, and design a range-melody decoupled pitch representation that enables text-conditioned vocal-range control while preserving melodic accuracy. Furthermore, we explore various experimental settings, including different types of text representations, text-encoder fine-tuning, and the introduction of speech data to alleviate data scarcity, aiming to facilitate further research. Experiments show that our model achieves favorable controllability and audio quality. Audio samples are available at http://prompt-singer.github.io.
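The core idea of a range-melody decoupled pitch representation can be sketched as follows. This is a minimal toy illustration, not the paper's actual formulation: the function names (`decouple_pitch`, `recombine`), the bucket count, and the log2-Hz quantization scheme are all assumptions made for the example. The point it demonstrates is that separating a pitch contour into a coarse "range" token plus a range-invariant melody contour lets a controller change the vocal range (here, the token) without disturbing the melodic intervals.

```python
import numpy as np

def decouple_pitch(f0_hz, n_buckets=8, f0_min=65.0, f0_max=1000.0):
    """Split a per-frame F0 contour (Hz, 0 = unvoiced) into a coarse
    vocal-range token and a range-invariant melody contour.
    Illustrative sketch only; bucket count and scale are assumptions."""
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0 > 0
    log_f0 = np.zeros_like(f0)
    log_f0[voiced] = np.log2(f0[voiced])      # log2-Hz: +1.0 == one octave up
    mean_log = log_f0[voiced].mean()          # singer's average pitch height
    lo, hi = np.log2(f0_min), np.log2(f0_max)
    # Quantize the average pitch into one of n_buckets range tokens.
    bucket = int(np.clip((mean_log - lo) / (hi - lo) * n_buckets,
                         0, n_buckets - 1))
    # Melody = per-frame deviation from the average pitch (intervals only).
    melody = np.where(voiced, log_f0 - mean_log, 0.0)
    return bucket, melody, voiced

def recombine(melody, voiced, bucket, n_buckets=8, f0_min=65.0, f0_max=1000.0):
    """Rebuild an F0 contour from a melody contour and a (possibly
    different) target range token. Changing the token transposes the
    whole contour while keeping every melodic interval intact."""
    lo, hi = np.log2(f0_min), np.log2(f0_max)
    center = lo + (bucket + 0.5) / n_buckets * (hi - lo)  # bucket midpoint
    return np.where(voiced, 2.0 ** (center + melody), 0.0)
```

In this toy setup, feeding `recombine` a different range token than the one returned by `decouple_pitch` transposes the contour to a new register while leaving every frame-to-frame pitch ratio unchanged, which mirrors the property the paper names: text-conditioned vocal-range control with preserved melodic accuracy.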

Authors (9)
  1. Yongqi Wang (24 papers)
  2. Ruofan Hu (6 papers)
  3. Rongjie Huang (62 papers)
  4. Zhiqing Hong (13 papers)
  5. Ruiqi Li (44 papers)
  6. Wenrui Liu (11 papers)
  7. Fuming You (6 papers)
  8. Tao Jin (53 papers)
  9. Zhou Zhao (219 papers)
Citations (5)
