TeleMelody: Lyric-to-Melody Generation with a Template-Based Two-Stage Method (2109.09617v2)

Published 20 Sep 2021 in cs.SD, cs.AI, cs.CL, cs.MM, and eess.AS

Abstract: Lyric-to-melody generation is an important task in automatic songwriting. Previous lyric-to-melody generation systems usually adopt end-to-end models that directly generate melodies from lyrics, which suffer from several issues: 1) lack of paired lyric-melody training data; 2) lack of control on generated melodies. In this paper, we develop TeleMelody, a two-stage lyric-to-melody generation system with music template (e.g., tonality, chord progression, rhythm pattern, and cadence) to bridge the gap between lyrics and melodies (i.e., the system consists of a lyric-to-template module and a template-to-melody module). TeleMelody has two advantages. First, it is data efficient. The template-to-melody module is trained in a self-supervised way (i.e., the source template is extracted from the target melody) that does not need any lyric-melody paired data. The lyric-to-template module is made up of some rules and a lyric-to-rhythm model, which is trained with paired lyric-rhythm data that is easier to obtain than paired lyric-melody data. Second, it is controllable. The design of template ensures that the generated melodies can be controlled by adjusting the musical elements in template. Both subjective and objective experimental evaluations demonstrate that TeleMelody generates melodies with higher quality, better controllability, and less requirement on paired lyric-melody data than previous generation systems.

Authors (10)

Zeqian Ju
Peiling Lu
Xu Tan
Rui Wang
Chen Zhang
Songruoyao Wu
Kejun Zhang
Xiangyang Li
Tao Qin
Tie-Yan Liu

Summary

The paper introduces TeleMelody, a template-based two-stage method that splits lyric-to-melody generation into a lyric-to-template stage and a template-to-melody stage, improving melody quality and user control.
The system achieves data efficiency by using a self-supervised template-to-melody module and offers customizable musical elements for precise adjustment.
Objective and subjective evaluations indicate that TeleMelody outperforms end-to-end baselines such as SongMASS on pitch and duration distribution similarity, melody distance, and perceived rhythm naturalness, on both English and Chinese data.

TeleMelody: Lyric-to-Melody Generation with a Template-Based Two-Stage Method

The paper introduces TeleMelody, a system for generating melodies from lyrics. Instead of mapping lyrics to melodies directly, as end-to-end models do, it works in two stages connected by a music template. The design aims to improve both the quality and the controllability of the generated melodies while reducing the need for paired lyric-melody training data.

The primary contribution of this research is the introduction of a music template that serves as an intermediary between lyrics and melodies. The template contains crucial musical elements such as tonality, chord progression, rhythm pattern, and cadence, enabling a decomposition of the generation task into two modules: lyric-to-template and template-to-melody. This separation offers several benefits:

Data Efficiency: The template-to-melody module is trained in a self-supervised manner, with the source template extracted directly from the target melody. It therefore needs no paired lyric-melody data, which is costly to collect. The lyric-to-template module combines rules with a lyric-to-rhythm model trained on paired lyric-rhythm data, which is easier to obtain.
Controllability: Users can adjust the musical elements in the template and thereby steer the generated melody. The system further strengthens control through alignment regularization, which uses musical knowledge to guide the template-melody alignment.
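The two benefits above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Template` class, the note encoding, the `long_note` threshold, and the cadence rule are all hypothetical simplifications, and the tonality and chords are assumed to be supplied by some external analysis step. The sketch only shows the idea that a (template, melody) training pair can be derived from a melody alone, and that editing a template field is how a user would steer generation.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

# Hypothetical note encoding: (MIDI pitch, onset in steps, duration in steps).
Note = Tuple[int, int, int]

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


@dataclass(frozen=True)
class Template:
    """Illustrative container for the four template elements named in the paper."""
    tonality: str        # e.g. "C major"
    chords: List[str]    # chord progression, one symbol per bar
    rhythm: List[int]    # note onset positions, in steps
    cadence: List[str]   # per-note label: "none", "half" or "authentic"


def extract_template(melody: List[Note], tonic: str, mode: str,
                     chords: List[str], long_note: int = 4) -> Template:
    """Derive a template from a melody (simplified, assumed rules).

    A note at least `long_note` steps long is treated as a phrase ending:
    "authentic" if it lands on the tonic, otherwise "half".
    """
    tonic_pc = PITCH_CLASSES.index(tonic)
    rhythm = [onset for _, onset, _ in melody]
    cadence = []
    for pitch, _, duration in melody:
        if duration < long_note:
            cadence.append("none")
        elif pitch % 12 == tonic_pc:
            cadence.append("authentic")
        else:
            cadence.append("half")
    return Template(f"{tonic} {mode}", list(chords), rhythm, cadence)


def make_training_pair(melody: List[Note], tonic: str, mode: str,
                       chords: List[str]) -> Tuple[Template, List[Note]]:
    """Self-supervision: the source template comes from the target melody itself."""
    return extract_template(melody, tonic, mode, chords), melody


melody = [(60, 0, 2), (62, 2, 2), (67, 4, 4), (64, 8, 2), (62, 10, 2), (60, 12, 4)]
template, target = make_training_pair(melody, "C", "major", ["C", "G"])

# Controllability: change one element and leave the rest of the template intact.
edited = replace(template, chords=["Am", "F"])
```

No lyrics appear anywhere in this pair construction, which is why the template-to-melody stage can be trained on melody-only data. At generation time the lyric-to-template stage would supply the template instead.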

The paper reports that TeleMelody outperforms existing end-to-end models such as SongMASS. The objective evaluation compares generated and ground-truth melodies using pitch distribution similarity, duration distribution similarity, and melody distance. TeleMelody improves on the baseline on both the English and the Chinese datasets, which suggests that the approach is robust across languages.
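To make the notion of distribution similarity concrete, here is a small Python sketch of histogram overlap between two sets of values (pitches or durations). This summary does not give the paper's exact metric definitions, so treat the formula as an assumption: it is one common way to compare two empirical distributions, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def distribution(values: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Normalize value counts into an empirical probability distribution."""
    if not values:
        raise ValueError("cannot build a distribution from an empty sequence")
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def histogram_overlap(generated: Sequence[Hashable],
                      reference: Sequence[Hashable]) -> float:
    """Overlapped area of two histograms: 1.0 if identical, 0.0 if disjoint."""
    p = distribution(generated)
    q = distribution(reference)
    return sum(min(p.get(k, 0.0), q.get(k, 0.0)) for k in set(p) | set(q))


generated_pitches = [60, 62, 64, 64]   # MIDI pitches of a generated melody
reference_pitches = [60, 64, 64, 65]   # MIDI pitches of the ground truth
similarity = histogram_overlap(generated_pitches, reference_pitches)
```

The same function applies to note durations. A measure like this ignores note order, which is why an order-sensitive measure such as melody distance is reported alongside it.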

In subjective evaluations, human listeners rated TeleMelody's melodies higher on harmony, rhythm naturalness, structural coherence, and overall quality. These results support the claim that the template-based approach addresses the two main limitations of end-to-end systems: scarce paired data and limited user control.

Methodologically, TeleMelody shows the value of decomposing a complex generation task into simpler components. The musically informed template reduces the dependence on paired data and gives users a direct way to intervene in generation. The same framework could plausibly be adapted to other music generation tasks, such as melody-to-lyric generation or accompaniment generation.

In conclusion, TeleMelody advances automatic songwriting by enabling efficient, user-adjustable melody generation with less reliance on paired lyric-melody data. Future work could use larger datasets, refine the component models, or extend the template to cover more musical styles and emotions.

GitHub

TeleMelody: Lyric-to-Melody Generation with a Template-Based Two-Stage Method - Muzic (3,935 stars)