SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition (2402.17645v2)

Published 27 Feb 2024 in cs.SD, cs.AI, cs.CL, and eess.AS

Abstract: Creating lyrics and melodies for the vocal track in a symbolic format, known as song composition, demands expert musical knowledge of melody, an advanced understanding of lyrics, and precise alignment between them. Despite achievements in sub-tasks such as lyric generation, lyric-to-melody, and melody-to-lyric, etc, a unified model for song composition has not yet been achieved. In this paper, we introduce SongComposer, a pioneering step towards a unified song composition model that can readily create symbolic lyrics and melodies following instructions. SongComposer is a music-specialized LLM that, for the first time, integrates the capability of simultaneously composing lyrics and melodies into LLMs by leveraging three key innovations: 1) a flexible tuple format for word-level alignment of lyrics and melodies, 2) an extended tokenizer vocabulary for song notes, with scalar initialization based on musical knowledge to capture rhythm, and 3) a multi-stage pipeline that captures musical structure, starting with motif-level melody patterns and progressing to phrase-level structure for improved coherence. Extensive experiments demonstrate that SongComposer outperforms advanced LLMs, including GPT-4, in tasks such as lyric-to-melody generation, melody-to-lyric generation, song continuation, and text-to-song creation. Moreover, we will release SongCompose, a large-scale dataset for training, containing paired lyrics and melodies in Chinese and English.

References (39)

Citations (14)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition (2402.17645v2)

Summary

Related Papers