- The paper introduces SketchVAE, a novel variational autoencoder that factorizes musical inputs into independent pitch and rhythm dimensions for enhanced user control.
- It adds two modules, SketchInpainter and SketchConnector, that predict latent codes for missing measures from the surrounding context and then refine them with optional user-specified pitch or rhythm sketches.
- Empirical evaluations on an Irish folk music dataset show lower loss and higher pitch/rhythm accuracy than existing methods such as Music InpaintNet, supporting its practical effectiveness.
Overview of "Music SketchNet: Controllable Music Generation via Factorized Representations of Pitch and Rhythm"
The paper "Music SketchNet: Controllable Music Generation via Factorized Representations of Pitch and Rhythm" introduces Music SketchNet, a sophisticated neural network framework for interactive music generation. By enabling users to specify partial musical ideas, the model addresses a crucial facet of automatic music generation: enhanced user control and interaction. Unlike prior work on conditional music generation such as MuseGan, DeepBach, and Music Transformer, Music SketchNet provides a finer level of control by allowing users to input incomplete pitch and rhythm ideas.
Core Contributions
- SketchVAE Architecture: At its core, Music SketchNet employs SketchVAE, a variational autoencoder that factorizes each musical input into separate pitch and rhythm representations, each with its own encoder. This factorization lets users manipulate the two dimensions independently, supporting more intuitive and flexible interaction than related disentanglement models such as EC²-VAE; a minimal sketch of the factorized encoding appears after this list.
- Modules for Contextual Completion: The framework incorporates two further components, SketchInpainter and SketchConnector, that together perform guided music completion. SketchInpainter predicts latent variables for the missing measures from the surrounding context, and SketchConnector, a transformer-based module, refines these predictions by incorporating any pitch or rhythm sketches the user provides (see the pipeline sketch after this list).
- Empirical Evaluation: Tested on an Irish folk music dataset, Music SketchNet outperforms existing methods such as Music InpaintNet both on objective metrics, where it achieves lower loss and higher pitch/rhythm accuracy, and in subjective listening tests. These evaluations underscore its ability to generate coherent melodies that follow user-specified constraints.
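To make the factorization concrete, below is a minimal PyTorch-style sketch of a pitch/rhythm-factorized encoder. Module names, vocabulary sizes, and layer dimensions are illustrative assumptions rather than the authors' exact SketchVAE configuration; the point is simply that each factor gets its own encoder and its own Gaussian latent.

```python
# Minimal sketch of a pitch/rhythm-factorized VAE encoder.
# Vocabulary sizes and dimensions are illustrative assumptions,
# not the exact SketchVAE hyperparameters from the paper.
import torch
import torch.nn as nn

class FactorizedEncoder(nn.Module):
    def __init__(self, pitch_vocab=130, rhythm_vocab=3, hidden=256, z_dim=128):
        super().__init__()
        self.pitch_emb = nn.Embedding(pitch_vocab, hidden)
        self.rhythm_emb = nn.Embedding(rhythm_vocab, hidden)
        self.pitch_gru = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.rhythm_gru = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        # Separate Gaussian posteriors for the two factors.
        self.pitch_mu = nn.Linear(2 * hidden, z_dim)
        self.pitch_logvar = nn.Linear(2 * hidden, z_dim)
        self.rhythm_mu = nn.Linear(2 * hidden, z_dim)
        self.rhythm_logvar = nn.Linear(2 * hidden, z_dim)

    def _reparam(self, mu, logvar):
        # Standard reparameterization trick.
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, pitch_tokens, rhythm_tokens):
        # pitch_tokens: (batch, T) pitch indices for one measure
        # rhythm_tokens: (batch, T) onset/hold/rest indicators for the same measure
        _, hp = self.pitch_gru(self.pitch_emb(pitch_tokens))
        _, hr = self.rhythm_gru(self.rhythm_emb(rhythm_tokens))
        hp = torch.cat([hp[0], hp[1]], dim=-1)  # concatenate GRU directions
        hr = torch.cat([hr[0], hr[1]], dim=-1)
        z_pitch = self._reparam(self.pitch_mu(hp), self.pitch_logvar(hp))
        z_rhythm = self._reparam(self.rhythm_mu(hr), self.rhythm_logvar(hr))
        return z_pitch, z_rhythm
```

Because z_pitch and z_rhythm are separate vectors, a user or a downstream model can alter one while holding the other fixed, which is the kind of control handle the paper builds on.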
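The completion stage can be pictured with a similarly hedged sketch: a recurrent inpainter proposes latents for the missing measures from the known context, and a transformer-based connector revises them, giving priority to any latents derived from user sketches. The shapes, masking convention, and module sizes below are assumptions made for clarity, not the paper's exact design.

```python
# Illustrative inpainting pipeline: predict latents for missing measures,
# overwrite predictions wherever the user supplied a sketch, then refine
# everything with a transformer over the measure sequence.
import torch
import torch.nn as nn

class InpaintConnector(nn.Module):
    def __init__(self, z_dim=256, n_heads=4, n_layers=2):
        super().__init__()
        self.inpainter = nn.GRU(z_dim, z_dim, batch_first=True)
        layer = nn.TransformerEncoderLayer(d_model=z_dim, nhead=n_heads, batch_first=True)
        self.connector = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, context_z, missing_mask, user_z=None):
        # context_z: (batch, n_measures, z_dim) latents, zeroed at missing slots
        # missing_mask: (batch, n_measures, 1), 1 where a measure is missing
        # user_z: optional user-specified latents at (some of) the missing slots
        pred, _ = self.inpainter(context_z)                # initial guesses
        z = torch.where(missing_mask.bool(), pred, context_z)
        if user_z is not None:
            # A user sketch takes priority over the inpainter's guess.
            given = user_z.abs().sum(-1, keepdim=True) > 0
            z = torch.where(given, user_z, z)
        return self.connector(z)                           # context-aware refinement
```

The refined latents would then be passed to the SketchVAE decoder to produce the final notes; that decoding step is omitted here.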
Practical and Theoretical Implications
The development of Music SketchNet holds considerable promise for future advancements in AI-driven music composition. The explicit decoupling of pitch and rhythm not only enhances control during generation but also opens avenues for generative models that handle musical elements beyond monophonic melodies. This could pave the way for applications in polyphonic music generation and for more expressive composition tools for musicians.
From a theoretical perspective, the work extends traditional ideas of latent space manipulation in sequential generative models, demonstrating how factorizing latent representations can lead to improved user interaction mechanisms and more expressive generative outcomes. Future research could explore further decomposition of musical attributes or expand the control modalities employed by users.
Concluding Remarks
This paper outlines a significant innovation in music generation technology, balancing theoretical advances with practical application. By mediating user control through factorized representations, Music SketchNet offers a glimpse into the future of AI-driven creative processes in music. As AI continues to evolve, frameworks like Music SketchNet will be pivotal in shaping how musicians interact with and harness machine learning in their creative work.