
Counterpoint by Convolution (1903.07227v1)

Published 18 Mar 2019 in cs.LG, cs.SD, eess.AS, and stat.ML

Abstract: Machine learning models of music typically break up the task of composition into a chronological process, composing a piece of music in a single pass from beginning to end. On the contrary, human composers write music in a nonlinear fashion, scribbling motifs here and there, often revisiting choices previously made. In order to better approximate this process, we train a convolutional neural network to complete partial musical scores, and explore the use of blocked Gibbs sampling as an analogue to rewriting. Neither the model nor the generative procedure are tied to a particular causal direction of composition. Our model is an instance of orderless NADE (Uria et al., 2014), which allows more direct ancestral sampling. However, we find that Gibbs sampling greatly improves sample quality, which we demonstrate to be due to some conditional distributions being poorly modeled. Moreover, we show that even the cheap approximate blocked Gibbs procedure from Yao et al. (2014) yields better samples than ancestral sampling, based on both log-likelihood and human evaluation.

Citations (143)

Summary

  • The paper introduces Coconet, a CNN that leverages convolutional modeling to generate polyphonic music with effective counterpoint synthesis.
  • It employs orderless NADE and blocked Gibbs sampling to enhance sample quality and capture complex musical structures beyond traditional sequential approaches.
  • Quantitative and qualitative evaluations demonstrate that Coconet outperforms existing models in perceptual quality and task versatility, opening new avenues for AI-driven composition.

Analysis of "Counterpoint by Convolution"

The paper "Counterpoint by Convolution" explores an innovative approach to generating polyphonic music with a convolutional neural network (CNN), specifically targeting counterpoint in music composition. It offers a compelling examination of alternative methods for computational music generation, focusing on the flexibility of convolutional architectures for score-completion tasks. The authors present Coconet, a deep convolutional model designed to reconstruct incomplete musical scores, challenging the sequential models traditionally used in music generation.

Summary of Key Contributions

  1. CNN for Polyphonic Music Generation: The authors propose CNNs for modeling music because of their ability to capture local structure with translation invariance in both time and pitch. Coconet, the proposed model, bypasses traditional sequence models (such as RNNs or HMMs) that operate in a unidirectional, temporal fashion, aiming instead to reflect the human compositional process, which often revisits and refines earlier musical decisions.
  2. Orderless NADE and Blocked Gibbs Sampling: Coconet is structured as an instance of the orderless Neural Autoregressive Distribution Estimator (NADE), which trains a single model to predict missing notes under arbitrary orderings. Notably, although orderless NADE permits direct ancestral sampling, the paper shows that Gibbs sampling significantly improves sample quality, tracing the gap to certain conditional distributions being poorly modeled. It further demonstrates that even the cheap approximate blocked Gibbs procedure of Yao et al. (2014) yields better samples than ancestral sampling.
  3. Quantitative and Qualitative Evaluations: The authors provide a comprehensive evaluation of their model, comparing likelihood performance across datasets and temporal resolutions, an uncommon depth of analysis in music generation studies. Coconet not only outperforms existing generative models in perceptual quality but also adapts effectively to varied musical tasks, as shown in human evaluations conducted via Mechanical Turk.
  4. Human-Like Musical Task Versatility: Emphasizing its adaptability, Coconet is capable of performing diverse generative tasks such as bridging musical fragments and temporal extrapolation. The model's versatility is a significant departure from prior systems, which often require substantial architectural or procedural changes to handle different tasks.
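The orderless NADE training step described above can be sketched as random masking of a piano-roll score. This is an illustrative sketch only: the dimensions, the masking granularity, and the function names are assumptions, and the CNN itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative piano-roll dimensions (assumed, not taken from the paper):
# T time steps x P pitches x I voices, with binary note-on indicators.
T, P, I = 32, 46, 4

def nade_masked_batch(score, rng):
    """Sample one orderless-NADE training example: hide a random
    subset of (time, voice) cells and keep the rest as context."""
    n_hidden = rng.integers(1, T * I + 1)          # uniform over mask sizes
    hidden = rng.permutation(T * I)[:n_hidden]
    mask = np.ones((T, I), dtype=bool)             # True = visible
    mask[np.unravel_index(hidden, (T, I))] = False
    visible = score * mask[:, None, :]             # zero out hidden cells
    return visible, mask

score = rng.integers(0, 2, size=(T, P, I))
visible, mask = nade_masked_batch(score, rng)
# A real training step would feed (visible, mask) to the CNN and take
# cross-entropy on the hidden cells only, rescaled to account for the
# number of hidden cells, as in the orderless NADE objective.
```

Because the masked subset is resampled every step, a single network learns all the conditionals needed to complete a score in any order.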

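The approximate blocked Gibbs procedure of Yao et al. (2014) can likewise be sketched: start from noise, repeatedly hide a block of cells and resample them in a single pass from the model's conditionals, annealing the block size toward zero. The sketch below substitutes a placeholder for the trained network, so it illustrates only the sampling loop; all names and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
T, P, I = 32, 46, 4   # illustrative: time steps x pitches x voices

def dummy_model(visible, mask, rng):
    """Placeholder for the trained network: returns per-cell pitch
    probabilities. A real model would condition on the visible notes."""
    probs = rng.random((T, P, I))
    return probs / probs.sum(axis=1, keepdims=True)

def blocked_gibbs(model, n_steps=50, rng=rng):
    """Approximate blocked Gibbs sampling (after Yao et al., 2014):
    hide a random block of cells, resample them in one pass, and
    anneal the masking probability toward a small floor."""
    # Random one-hot initialization: one pitch per (time, voice) cell.
    pitches = rng.integers(0, P, size=(T, I))
    score = np.eye(P)[pitches].transpose(0, 2, 1)  # shape (T, P, I)
    for step in range(n_steps):
        p_hide = max(0.05, 1.0 - step / n_steps)   # annealed block size
        mask = rng.random((T, I)) >= p_hide        # True = keep
        probs = model(score * mask[:, None, :], mask, rng)
        for t in range(T):                         # resample hidden cells
            for i in range(I):
                if not mask[t, i]:
                    pitch = rng.choice(P, p=probs[t, :, i])
                    score[t, :, i] = 0.0
                    score[t, pitch, i] = 1.0
    return score

sample = blocked_gibbs(dummy_model)
```

Early iterations rewrite most of the score at once, while later iterations make small local revisions, which is the "rewriting" analogy the paper draws with human composition.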
Implications and Future Directions

The use of CNNs for counterpoint introduces a new paradigm for generating polyphonic music. Freed from strictly sequential processing, and relying on convolution to capture the locality and hierarchical nature of musical structure, this approach opens novel pathways for broader and more nuanced applications in music information retrieval and AI-driven composition tools.

The results highlighted in this paper suggest that future research could focus on expanding these methods to encompass more complex musical scenarios requiring deeper architectural adjustments or real-time interactivity. Furthermore, investigating other probabilistic sampling techniques or hybrid models might yield additional insights into capturing and mimicking the intricacies of human composition processes.

In conclusion, "Counterpoint by Convolution" provides a substantial contribution to the field of algorithmic music composition, paving the way for subsequent inquiries into convolutional architectures and their potential to revolutionize music generation tasks. The implications of this paper extend beyond theoretical interest, offering practical advancements in how AI can assist composers as creative collaborators.
