Evaluating "Conditional LSTM-GAN for Melody Generation from Lyrics"
The paper on "Conditional LSTM-GAN for Melody Generation from Lyrics," authored by Yi Yu, Abhishek Srivastava, and Simon Canales, presents a novel approach to the automatic composition of music, specifically focusing on generating melodies conditioned on song lyrics. This research is pivotal in advancing the field of music generation, bridging gaps between complex natural language processing and computational music understanding.
Summary of Contributions
The paper introduces a conditional Long Short-Term Memory Generative Adversarial Network (LSTM-GAN) designed to generate plausible, harmonious melodies from given lyrics. Key elements of the model include:
- Large-Scale Dataset Creation: The authors address a significant barrier in lyrics-melody alignment research by constructing a dataset of 12,197 MIDI songs, each with lyrics aligned to the melody at the syllable level. This dataset plays a crucial role in training robust machine learning models.
- Architecture of the Conditional LSTM-GAN: The proposed architecture pairs a deep LSTM generator with a deep LSTM discriminator, both conditioned on lyrics, to produce realistic melodies. Integrating LSTMs within a GAN framework lets the model predict melodic sequences while maintaining the alignment between syllables and notes (a minimal sketch of this wiring follows this list).
- Embedding and Evaluation Techniques: The authors train a skip-gram model to encode lyric semantics as word-level and syllable-level embeddings (see the embedding sketch after this list). The paper also proposes quantitative metrics for evaluating the generated melodies on the new dataset, notably Maximum Mean Discrepancy (MMD) between generated and ground-truth note distributions.
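To make the conditioning mechanism concrete, here is a minimal PyTorch sketch of a lyrics-conditioned LSTM generator and discriminator. The class names and hyperparameter values are illustrative assumptions rather than the paper's reported settings; only the (pitch, duration, rest) triplet output follows the paper's note representation.

```python
import torch
import torch.nn as nn

class ConditionalLSTMGenerator(nn.Module):
    """Lyrics-conditioned LSTM generator: a minimal sketch.

    All hyperparameters (noise_dim, lyric_dim, hidden_dim, num_layers)
    are illustrative assumptions, not the paper's exact settings.
    """
    def __init__(self, noise_dim=100, lyric_dim=20, hidden_dim=400,
                 num_layers=2, out_dim=3):
        super().__init__()
        # At each time step the generator sees random noise concatenated
        # with the embedding of the syllable to be sung at that step.
        self.lstm = nn.LSTM(noise_dim + lyric_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        # One triplet per syllable: (MIDI pitch, duration, rest).
        self.head = nn.Linear(hidden_dim, out_dim)

    def forward(self, noise, lyric_emb):
        # noise:     (batch, seq_len, noise_dim)
        # lyric_emb: (batch, seq_len, lyric_dim), e.g. skip-gram vectors
        h, _ = self.lstm(torch.cat([noise, lyric_emb], dim=-1))
        return self.head(h)  # (batch, seq_len, out_dim) note attributes

class ConditionalLSTMDiscriminator(nn.Module):
    """Mirror-image discriminator: scores (melody, lyrics) pairs as real or fake."""
    def __init__(self, note_dim=3, lyric_dim=20, hidden_dim=400, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(note_dim + lyric_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, notes, lyric_emb):
        h, _ = self.lstm(torch.cat([notes, lyric_emb], dim=-1))
        # Score the final time step; the sigmoid is left to the loss
        # (e.g. nn.BCEWithLogitsLoss) for numerical stability.
        return self.head(h[:, -1])
```

Conditioning both networks on the same lyric embeddings is what ties each generated note to its syllable; the discriminator judges melody-lyrics pairs rather than melodies alone.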
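The lyric embeddings can be prototyped with gensim's Word2Vec, where `sg=1` selects the skip-gram objective. The toy corpus, vector size, and window below are placeholder assumptions, not the paper's data or hyperparameters.

```python
from gensim.models import Word2Vec

# Each "sentence" is the syllable sequence of one song's lyrics; this tiny
# corpus is a placeholder, not the paper's 12,197-song dataset.
syllable_corpus = [
    ["lis", "ten", "to", "the", "mu", "sic"],
    ["sing", "a", "long", "with", "the", "mu", "sic"],
]

# sg=1 selects the skip-gram objective; vector_size and window are
# illustrative choices rather than the paper's reported settings.
model = Word2Vec(syllable_corpus, sg=1, vector_size=10, window=4,
                 min_count=1, epochs=100)

vec = model.wv["mu"]  # a 10-dimensional syllable embedding
```

An analogous model trained on word sequences yields the word-level embeddings, which can be concatenated with the syllable vectors to form the conditioning input.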
Numerical Results and Evaluation
The empirical results indicate that the conditional LSTM-GAN generates more tuneful sequences than existing methods. In particular, it achieves lower task-specific error, measured via MMD between the distributions of generated and ground-truth note attributes, than both the MLE baseline and the random baseline. The paper highlights the following (minimal sketches of the MMD and scale-consistency computations follow this list):
- Improved BLEU scores for melody generation, indicating greater sequence overlap between generated and ground-truth melodies for the same lyrics.
- A statistically significant improvement in scale consistency, meaning the generated melodies conform more often to standard musical scales.
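For reference, a minimal (biased, V-statistic) estimate of squared MMD with a Gaussian kernel can be written as follows; the kernel bandwidth and the choice of melody features are assumptions, since the paper's exact evaluation pipeline is not reproduced here.

```python
import numpy as np

def gaussian_mmd2(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD with an RBF kernel.

    x: (n, d) features of generated melodies, y: (m, d) features of real
    ones (e.g. per-song note-attribute vectors); sigma is an assumed
    kernel bandwidth.
    """
    def k(a, b):
        # Pairwise squared Euclidean distances -> Gaussian kernel values.
        d2 = (np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :]
              - 2.0 * a @ b.T)
        return np.exp(-d2 / (2.0 * sigma**2))

    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

# A lower value indicates the generated feature distribution sits closer
# to the ground-truth distribution.
```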
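Scale consistency is typically computed as the fraction of notes that fall in the best-matching standard scale. The simplified sketch below checks only the twelve major scales and may differ in detail from the paper's implementation.

```python
import numpy as np

MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]  # pitch classes of a major scale at root 0

def scale_consistency(midi_pitches):
    """Fraction of notes in the best-fitting major scale (simplified metric)."""
    pcs = np.asarray(midi_pitches) % 12
    best = 0.0
    for root in range(12):  # try every transposition of the major scale
        scale = [(root + s) % 12 for s in MAJOR_STEPS]
        best = max(best, float(np.isin(pcs, scale).mean()))
    return best

print(scale_consistency([60, 62, 64, 65, 67, 69, 71, 72]))  # C major run -> 1.0
```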
Implications and Future Research
The implications of this work are noteworthy for both practical applications and theoretical exploration in AI and music composition. Practically, the model lays the groundwork for advanced AI-assisted tools for musicians and composers. Theoretically, it opens new avenues in multimodal machine learning, demonstrating how generative models can be adapted to complex, semantically driven creative tasks such as composing music from lyrics.
Future avenues for exploration, driven by the foundational work in this paper, may include:
- Enhancing the generative model's capacity for handling more complex structures like polyphonic music.
- Investigating inverse processes, such as generating lyrics from existing melodies.
- Extending this framework to include various cultural and linguistic music traditions, further diversifying and enriching music composition capabilities.
The paper on "Conditional LSTM-GAN for Melody Generation from Lyrics" enriches the intersection of AI and music, offering a substantive contribution to automated music composition and generating viable pathways for subsequent research in the field.