- The paper introduces NES-MDB, a dataset capturing thousands of NES audio scores with detailed expressive performance attributes.
- It details a methodology for separately modeling composition and performance using multi-instrument scores and autoregressive LSTM formulations.
- The work advances AI music generation by bridging the gap between composition and expressive dynamics for more human-like renditions.
The growing interest in automated music generation raises challenges that cannot be addressed by focusing on composition alone; expressive performance characteristics are equally vital. The paper introduces the Nintendo Entertainment System Music Database (NES-MDB), a corpus designed to support studying composition and performance separately.
Dataset Composition and Methodology
NES-MDB is a comprehensive dataset comprising thousands of multi-instrumental songs composed for the Nintendo Entertainment System's audio synthesizer. It distinguishes itself from other music datasets by preserving exact performance attributes, such as dynamics and timbre, that are necessary to render faithful acoustic reproductions. This stands in contrast to collections built on General MIDI files, which often lack these expressive features.
A critical feature of NES-MDB is that it allows composition and performance to be examined separately. The dataset provides musical scores for four distinct instrument voices (two pulse-wave, one triangle, one noise), each annotated with detailed expressive attributes. A companion rendering tool synthesizes NES-style audio from these scores, enabling faithful playback of generated performances.
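To make the separated representation concrete, the sketch below shows one way such a score could be laid out in memory: a fixed time grid with per-voice pitch plus expressive attributes. The field names, resolution, and shapes are illustrative assumptions, not the dataset's actual serialization format.

```python
import numpy as np

# Illustrative sketch (not the dataset's actual file format): a separated score
# for the four NES voices on a fixed time grid. Each voice carries a pitch plus
# expressive attributes such as velocity (dynamics) and timbre.
VOICES = ["pulse1", "pulse2", "triangle", "noise"]
N_STEPS = 96  # hypothetical resolution, e.g. 4 seconds at 24 ticks/second

score = {
    voice: {
        "pitch": np.zeros(N_STEPS, dtype=np.int16),    # 0 = rest (assumption)
        "velocity": np.zeros(N_STEPS, dtype=np.int16),  # coarse volume level
        "timbre": np.zeros(N_STEPS, dtype=np.int16),    # e.g. pulse duty cycle
    }
    for voice in VOICES
}

# A composition model would predict `pitch` for every voice; an expressive
# performance model would then fill in `velocity` and `timbre` given the pitches.
```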
Evaluation and Baselines
The authors address two core tasks with NES-MDB: separated composition and expressive performance modeling. The first task involves learning the semantics of composition on separated scores rather than blended polyphonic representations, sidestepping the limitations of models that struggle with multiple instrument voices. The second task models the mapping from compositions to expressive attributes. The baseline models vary in performance; notably, the LSTM Quartet and DeepBach models show promise on the composition task.
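A minimal sketch of an LSTM over a separated four-voice score is given below. It is not the paper's exact architecture; the vocabulary size, embedding scheme, and use of one softmax head per voice are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Minimal sketch of separated composition modeling: given the pitches of all
# four voices at previous timesteps, predict each voice's next pitch.
N_VOICES = 4
N_PITCH_CLASSES = 121  # hypothetical vocabulary: 120 pitches + rest

class SeparatedCompositionLSTM(nn.Module):
    def __init__(self, embed_dim=64, hidden_dim=256):
        super().__init__()
        # One embedding table shared across voices, concatenated per timestep.
        self.embed = nn.Embedding(N_PITCH_CLASSES, embed_dim)
        self.lstm = nn.LSTM(N_VOICES * embed_dim, hidden_dim, batch_first=True)
        # One classification head per voice.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, N_PITCH_CLASSES) for _ in range(N_VOICES)]
        )

    def forward(self, pitches):
        # pitches: (batch, time, voices) integer pitch indices
        b, t, v = pitches.shape
        x = self.embed(pitches).reshape(b, t, -1)   # (batch, time, voices*embed)
        h, _ = self.lstm(x)                         # (batch, time, hidden)
        return [head(h) for head in self.heads]     # per-voice logits

# Usage: feed timesteps 0..T-1 and train each head against timesteps 1..T with
# cross-entropy, one loss term per voice.
model = SeparatedCompositionLSTM()
dummy = torch.randint(0, N_PITCH_CLASSES, (2, 32, N_VOICES))
logits = model(dummy)  # list of 4 tensors, each (2, 32, N_PITCH_CLASSES)
```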
For expressive performance modeling, the paper explores autoregressive formulations using LSTM architectures that condition predictions on the existing musical context. Experiments report how well these models capture expressive dynamics, measuring accuracy at points of interest (POIs) in the score.
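The single-voice sketch below illustrates the autoregressive setup: the model sees the score pitch plus its own previous expressive output and predicts the next velocity and timbre classes. The class counts and conditioning scheme are assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

# Minimal single-voice sketch of an autoregressive performance model:
# conditioned on the score pitch and the previously emitted expressive
# attributes, predict the next velocity and timbre classes.
N_PITCH, N_VEL, N_TIM = 121, 16, 4  # illustrative vocabulary sizes

class PerformanceLSTM(nn.Module):
    def __init__(self, embed_dim=32, hidden_dim=128):
        super().__init__()
        self.pitch_embed = nn.Embedding(N_PITCH, embed_dim)
        self.vel_embed = nn.Embedding(N_VEL, embed_dim)
        self.tim_embed = nn.Embedding(N_TIM, embed_dim)
        self.lstm = nn.LSTM(3 * embed_dim, hidden_dim, batch_first=True)
        self.vel_head = nn.Linear(hidden_dim, N_VEL)  # next-step velocity logits
        self.tim_head = nn.Linear(hidden_dim, N_TIM)  # next-step timbre logits

    def forward(self, pitch, prev_vel, prev_tim):
        # Each input: (batch, time) integer indices.
        x = torch.cat(
            [self.pitch_embed(pitch), self.vel_embed(prev_vel), self.tim_embed(prev_tim)],
            dim=-1,
        )
        h, _ = self.lstm(x)
        return self.vel_head(h), self.tim_head(h)
```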
Contributions and Implications
NES-MDB serves as a foundation for integrating expressive performance into AI music generation. By capturing both compositional and performative elements, it not only supports improved generative models but also ships with tools that let researchers convert between the NES's native music format and more interpretable representations such as MIDI.
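As a rough illustration of that kind of interoperability, the sketch below exports a separated score to a standard MIDI file using the pretty_midi library so it can be inspected in ordinary tooling. This is not the dataset's own conversion utility, and the note data and instrument choices are invented for illustration.

```python
import pretty_midi

# Hedged sketch: write a separated NES-style score out as a standard MIDI file,
# one MIDI instrument per voice. Not the dataset's bundled conversion tool.
VOICE_NAMES = ["pulse1", "pulse2", "triangle", "noise"]

def score_to_midi(voice_notes, path):
    """voice_notes: {voice_name: [(pitch, velocity, start_sec, end_sec), ...]}"""
    pm = pretty_midi.PrettyMIDI()
    for name in VOICE_NAMES:
        inst = pretty_midi.Instrument(program=80, name=name)  # GM square lead
        for pitch, velocity, start, end in voice_notes.get(name, []):
            inst.notes.append(
                pretty_midi.Note(velocity=velocity, pitch=pitch, start=start, end=end)
            )
        pm.instruments.append(inst)
    pm.write(path)

# Example: a two-note pulse-1 melody.
score_to_midi({"pulse1": [(72, 100, 0.0, 0.5), (76, 90, 0.5, 1.0)]}, "example.mid")
```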
Owing to its scope and representational completeness, the dataset bridges a significant gap in existing music collections, encouraging study of not just how music is composed but how it is performed.
Future Directions
NES-MDB sets the stage for new methodologies in AI-driven music generation. Future research could refine models' understanding of the interdependencies between composition and expressive performance, leading toward systems that produce more human-like, expressively dynamic renditions and furthering the field of automated music generation.
In conclusion, NES-MDB offers a unique opportunity to approach music generation holistically, incorporating the expressive elements often overlooked in traditional datasets and methodologies.