- The paper introduces a novel CNN-based GAN framework that converts random noise into MIDI note matrices for symbolic music generation.
- It leverages a conditional mechanism to incorporate prior musical context, enabling flexible, bar-by-bar composition.
- Experimental results show that MidiNet produces melodies perceived as more interesting than those of RNN-based models, while offering faster, more parallelizable training.
MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation
MidiNet introduces a convolutional generative adversarial network (GAN) architecture for symbolic-domain music generation. It departs from the prevalent use of recurrent neural networks (RNNs) in music modeling, instead leveraging convolutional neural networks (CNNs), inspired by WaveNet's success in audio-domain generation.
Methodology Overview
MidiNet generates melodies incrementally, one bar at a time. A generator CNN transforms random noise into a matrix of MIDI notes representing a bar, while a discriminator CNN learns to distinguish generated bars from real ones, the two forming a GAN. A novel conditional mechanism, realized as a separate conditioner CNN, encodes prior musical context, such as a chord sequence or the melody of the preceding bar, and injects it into intermediate layers of the generator. This design lets MidiNet compose either from scratch or from predefined conditions; a minimal sketch of the components follows.
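Below is a minimal PyTorch sketch of these components, assuming the paper's piano-roll representation of a bar as a 128-pitch by 16-step matrix and its 13-dimensional chord condition; the layer sizes, class names (`Generator`, `Conditioner`, `Discriminator`), and exact wiring are illustrative assumptions rather than the paper's precise architecture.

```python
import torch
import torch.nn as nn

PITCHES, STEPS = 128, 16   # one bar: 128 MIDI pitches x 16 time steps
Z_DIM = 100                # size of the random noise vector (assumed)


class Conditioner(nn.Module):
    """Encodes the previous bar into feature maps ("2-D conditions")
    that are injected into the generator at a matching resolution."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(PITCHES, 1)),              # -> (16, 1, 16)
            nn.LeakyReLU(0.2),
            nn.Conv2d(16, 16, kernel_size=(1, 2), stride=(1, 2)),    # -> (16, 1, 8)
            nn.LeakyReLU(0.2),
        )

    def forward(self, prev_bar):          # prev_bar: (B, 1, 128, 16)
        return self.net(prev_bar)         # (B, 16, 1, 8)


class Generator(nn.Module):
    """Maps noise plus a 1-D chord condition to a piano-roll bar."""
    def __init__(self, chord_dim=13):     # 12 keys + major/minor flag
        super().__init__()
        self.fc = nn.Linear(Z_DIM + chord_dim, 128 * 8)
        self.deconv = nn.Sequential(
            # 128 channels from the noise path + 16 from the conditioner
            nn.ConvTranspose2d(128 + 16, 64, kernel_size=(1, 2), stride=(1, 2)),
            nn.BatchNorm2d(64),
            nn.ReLU(),                                               # -> (64, 1, 16)
            nn.ConvTranspose2d(64, 1, kernel_size=(PITCHES, 1)),
            nn.Sigmoid(),                                            # -> (1, 128, 16)
        )

    def forward(self, z, chord, prev_features):
        h = self.fc(torch.cat([z, chord], dim=1)).view(-1, 128, 1, 8)
        h = torch.cat([h, prev_features], dim=1)   # inject the 2-D condition
        return self.deconv(h)


class Discriminator(nn.Module):
    """Outputs a real/fake logit for one bar of music."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 14, kernel_size=(PITCHES, 2), stride=(1, 2)),  # -> (14, 1, 8)
            nn.LeakyReLU(0.2),
            nn.Flatten(),
            nn.Linear(14 * 8, 1),
        )

    def forward(self, bar):               # bar: (B, 1, 128, 16)
        return self.net(bar)
```

The key design point is the conditioner: because a feed-forward CNN carries no recurrent state, context from the previous bar must be injected explicitly, here as feature maps concatenated into the generator's intermediate layers.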
Comparative Analysis
The paper compares MidiNet against several existing models, including MelodyRNN and WaveNet. MidiNet distinguishes itself primarily by employing CNNs rather than RNNs, which makes training faster and easier to parallelize. Table 1 of the paper summarizes the core differences, highlighting MidiNet's flexible conditioning options and its adversarial training objective, which models such as MelodyRNN lack.
Experimental Validation
User studies comparing melodies generated by MidiNet and MelodyRNN indicate that while both are judged similarly realistic and pleasant, MidiNet's melodies are more often perceived as interesting. The experiments demonstrate eight-bar melodies generated under various conditioning settings, reinforcing the model's flexibility, and show that MidiNet can be tuned to trade off stability (staying close to the conditioning input) against creative variety, positioning it as a viable alternative to traditional RNN approaches. A hypothetical sampling loop for this bar-by-bar procedure, built on the sketch above, follows.
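To make the bar-by-bar procedure concrete, here is a hypothetical sampling loop over the sketch above; the all-zero priming bar and chord tensor are placeholders, not values from the paper.

```python
# Hypothetical eight-bar sampling loop: each newly generated bar becomes
# the conditioner's input for the next bar.
G, C = Generator(), Conditioner()
G.eval(); C.eval()

prev_bar = torch.zeros(1, 1, PITCHES, STEPS)   # empty priming bar (placeholder)
chords = torch.zeros(8, 13)                    # placeholder chord sequence
bars = []
with torch.no_grad():
    for i in range(8):
        z = torch.randn(1, Z_DIM)
        bar = G(z, chords[i:i + 1], C(prev_bar))   # (1, 1, 128, 16)
        bars.append(bar)
        prev_bar = bar

melody = torch.cat(bars, dim=-1)               # eight bars: (1, 1, 128, 128)
```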
Implications and Future Directions
MidiNet's conditional GAN framework for symbolic music generation is a notable advance in adapting generative models to diverse musical inputs. Conditioning on chords and prior bars opens pathways to more creative and structured compositions; the chord condition itself is a compact 13-dimensional encoding, illustrated below.
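As an illustration of that chord condition, the encoding can be sketched as a one-hot root pitch class plus a major/minor flag; the exact index convention (0 = C) is an assumption of this sketch.

```python
import torch

def chord_vector(root_pitch_class: int, is_major: bool) -> torch.Tensor:
    """13-dim chord condition: one-hot root pitch class (12 dims)
    plus a major/minor flag, per the paper's compact encoding."""
    vec = torch.zeros(13)
    vec[root_pitch_class % 12] = 1.0    # 0 = C, 1 = C#, ..., 11 = B (assumed)
    vec[12] = 1.0 if is_major else 0.0
    return vec

a_minor = chord_vector(9, is_major=False)   # A minor: index 9 set, flag 0
```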
Further research could extend MidiNet to multi-track music generation and incorporate elements such as note velocity and rests, which the current model does not represent. Integrating music theory through reinforcement learning and training on larger datasets could improve the quality and diversity of the generated music. More elaborate conditioning structures are also conceivable, drawing on music information retrieval tasks such as genre and emotion recognition.
By presenting a novel application of CNNs in a GAN configuration, MidiNet broadens the scope of algorithmic composition, offering a more scalable and potentially more dynamic model for symbolic music generation.