Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional Representation (2407.20955v1)

Published 30 Jul 2024 in cs.SD, cs.AI, and eess.AS

Abstract: Managing the emotional aspect remains a challenge in automatic music generation. Prior works aim to learn various emotions at once, leading to inadequate modeling. This paper explores the disentanglement of emotions in piano performance generation through a two-stage framework. The first stage focuses on valence modeling of lead sheet, and the second stage addresses arousal modeling by introducing performance-level attributes. To further capture features that shape valence, an aspect less explored by previous approaches, we introduce a novel functional representation of symbolic music. This representation aims to capture the emotional impact of major-minor tonality, as well as the interactions among notes, chords, and key signatures. Objective and subjective experiments validate the effectiveness of our framework in both emotional valence and arousal modeling. We further leverage our framework in a novel application of emotional controls, showing a broad potential in emotion-driven music generation.

Authors (3)
  1. Jingyue Huang (7 papers)
  2. Ke Chen (241 papers)
  3. Yi-Hsuan Yang (89 papers)
Citations (1)

Summary

Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional Representation

The paper "Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional Representation" addresses key challenges in the domain of automatic music generation, particularly focusing on the emotional aspect of piano music. The authors propose a two-stage framework to disentangle and model emotional valence and arousal independently, laying a foundation for more nuanced and expressive music generation systems.

Framework and Methodology

The proposed methodology follows a two-stage process. The first stage centers on valence modeling, carried out through lead sheet composition: the system captures the positivity or negativity of the emotion by generating the harmonic and melodic skeleton of the music, first predicting a key event and then generating the lead sheet sequence under the given valence condition.
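
To make the conditioning concrete, the sketch below shows one plausible shape for Stage 1: an emotion (valence) token is prepended to the sequence, a key event is predicted first, and the remaining lead-sheet events are sampled autoregressively. The token names and the `sample_next` interface are illustrative assumptions, not the paper's exact vocabulary or architecture.

```python
# Minimal sketch of valence-conditioned lead-sheet generation (Stage 1).
# Token names and the sampling interface are assumptions for illustration.

def generate_lead_sheet(model, valence, max_tokens=512):
    """Autoregressively sample a valence-conditioned lead-sheet event sequence."""
    # Prepend an emotion-condition token (CTRL-style conditioning).
    tokens = [f"Emotion_{'Positive' if valence == 'high' else 'Negative'}"]
    while len(tokens) < max_tokens:
        next_token = model.sample_next(tokens)   # hypothetical sampling call
        tokens.append(next_token)
        if next_token == "EOS":
            break
    return tokens

# An illustrative output might look like:
# ["Emotion_Positive", "Key_Major", "Bar", "Chord_I_Major", "Note_Degree_1", ...]
```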

The second stage addresses arousal modeling at the performance level, introducing attributes such as tempo, dynamics, and articulation. These attributes govern the energy, or activation, of the music, yielding a more expressive rendering of the emotional content laid out by the lead sheet.
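
The sketch below illustrates how the two stages could chain: the Stage-1 lead sheet becomes the conditioning context for Stage 2, with an arousal token steering performance-level events such as tempo and dynamics. Again, the token names and sampling interface are assumptions, not the paper's exact design.

```python
# Minimal sketch of arousal-conditioned performance rendering (Stage 2).
# Token names and the sampling interface are assumptions for illustration.

def render_performance(stage2_model, lead_sheet_tokens, arousal, max_tokens=2048):
    """Expand a Stage-1 lead sheet into performance events under an arousal condition."""
    # The arousal token plus the lead sheet form the conditioning prompt.
    prompt = [f"Arousal_{'High' if arousal == 'high' else 'Low'}"] + lead_sheet_tokens
    output = list(prompt)
    while len(output) < max_tokens:
        next_token = stage2_model.sample_next(output)   # hypothetical sampling call
        output.append(next_token)
        if next_token == "EOS":
            break
    # The result interleaves note events with performance-level events
    # such as "Tempo_Fast" or "Velocity_96", realizing the target arousal.
    return output
```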

A novel functional representation of symbolic music supports this two-stage framework. The representation captures the interactions among notes, chords, and the key signature, which are critical for modeling tonality, a fundamental element linked with emotional valence. Encoding chords as Roman numerals relative to the key makes the representation adapt across key signatures, enhancing the model's ability to align musical structure with the intended emotional outcome.
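
The following sketch conveys the general idea of a key-relative (functional) chord encoding, assuming a major key and one common Roman-numeral convention; it is not the paper's exact tokenization.

```python
# Minimal sketch of a key-relative chord encoding: an absolute chord root is
# re-expressed as a scale degree (Roman numeral) of the key, so the same
# progression maps to the same tokens in any key.

MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]                  # scale-degree offsets in semitones
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII"]

def chord_to_roman(chord_root_pc, chord_quality, key_tonic_pc):
    """Encode a chord as a Roman numeral relative to a major-key tonic (pitch classes 0-11)."""
    interval = (chord_root_pc - key_tonic_pc) % 12
    if interval in MAJOR_SCALE:
        numeral = ROMAN[MAJOR_SCALE.index(interval)]
    else:
        # Chromatic root: write it as a flattened upper neighbor, one common convention.
        numeral = "b" + ROMAN[MAJOR_SCALE.index((interval + 1) % 12)]
    # Lowercase conventionally marks minor or diminished chord qualities.
    return numeral.lower() if chord_quality in ("min", "dim") else numeral

# In C major (tonic 0): an A-minor chord (root 9) -> "vi"; a G-major chord (root 7) -> "V".
# The same progression in G major yields identical tokens, making the encoding key-invariant.
```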

Key Findings and Results

The paper reports both objective and subjective evaluations to gauge the efficacy of the approach. Key consistency metrics objectively measure how well the generated music matches the conditioned key signatures; this consistency is vital for musical coherence, and the proposed representation showed significant improvements over previous representations such as REMI.
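
A key-consistency score of this kind could be computed as sketched below: detect the key of each generated piece with a Krumhansl-style profile match and compare it to the key the model was conditioned on. This is an assumed reconstruction for illustration, not the paper's evaluation code.

```python
# Sketch of a key-consistency metric using a Krumhansl-Kessler profile match
# (major keys only, unnormalized dot-product score).

from collections import Counter

# Krumhansl-Kessler major-key profile weights, indexed by pitch class relative to the tonic.
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]

def detect_major_key(note_pitches):
    """Return the best-matching major-key tonic (0-11) for a list of MIDI pitches."""
    hist = Counter(p % 12 for p in note_pitches)
    counts = [hist.get(pc, 0) for pc in range(12)]
    scores = []
    for tonic in range(12):
        rotated = counts[tonic:] + counts[:tonic]     # histogram relative to candidate tonic
        scores.append(sum(w * c for w, c in zip(KK_MAJOR, rotated)))
    return max(range(12), key=lambda t: scores[t])

def key_consistency(pieces, conditioned_tonics):
    """Fraction of generated pieces whose detected key matches the conditioning key."""
    hits = sum(detect_major_key(p) == t for p, t in zip(pieces, conditioned_tonics))
    return hits / len(pieces)
```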

Subjectively, the authors conducted listener tests to evaluate the emotional quality of the generated music across the four quadrants of the valence-arousal space. The proposed functional representation and two-stage framework outperformed existing models, notably improving the separation and clarity of valence-driven and arousal-driven emotional cues in the music.

Implications and Future Directions

Practically, this research marks a significant step toward generating music that is both musically coherent and emotionally compelling. Such advances have potential applications in music therapy, AI-driven soundtrack composition, and interactive media where emotional nuance is critical.

Theoretically, the paper underscores the importance of considering functional harmony and key-dependent musical relationships in generative models. This can spur future research aimed at exploring the emotional depth in other musical forms and traditions, potentially expanding the applicability of these methods across genres and cultural contexts.

Future work could further explore the flexibility of emotion-driven music generation, striving for more diverse emotional expressions within any given key. Training on additional large-scale datasets and extending the framework toward real-time applications would also be valuable extensions of this research.

In conclusion, this paper lays out a compelling framework for disentangled emotional modeling in music generation. Its innovative use of functional representation paired with a structured, two-stage process addresses previous limitations and sets a robust foundation for future advancements in emotion-aware AI music systems.
