
Music transcription modelling and composition using deep learning

Published 29 Apr 2016 in cs.SD and cs.LG (arXiv:1604.08723v1)

Abstract: We apply deep learning methods, specifically long short-term memory (LSTM) networks, to music transcription modelling and composition. We build and train LSTM networks using approximately 23,000 music transcriptions expressed with a high-level vocabulary (ABC notation), and use them to generate new transcriptions. Our practical aim is to create music transcription models useful in particular contexts of music composition. We present results from three perspectives: 1) at the population level, comparing descriptive statistics of the set of training transcriptions and generated transcriptions; 2) at the individual level, examining how a generated transcription reflects the conventions of a music practice in the training transcriptions (Celtic folk); 3) at the application level, using the system for idea generation in music composition. We make our datasets, software and sound examples open and available: \url{https://github.com/IraKorshunova/folk-rnn}.

Citations (167)

Summary

  • The paper introduces two LSTM models—char-rnn and folk-rnn—for music transcription and composition using ABC notation.
  • It employs deep architectures with dropout regularization and RMSprop optimization to generate compositions that mirror training data patterns.
  • The models produce structured, genre-specific transcriptions that can serve as idea-generation tools alongside traditional music composition.

Deep Learning for Music Transcription Modelling and Composition

The paper "Music transcription modelling and composition using deep learning" by Sturm et al. investigates the application of Long Short-Term Memory (LSTM) networks in music transcription and composition tasks. The research focuses on leveraging approximately 23,000 music transcriptions in ABC notation to train LSTM models that can generate new music. The main objectives are to create music transcription models useful for specific compositional contexts and to facilitate music composition within established musical practices.

Methodology

The researchers develop two distinct LSTM-based models: a character-based model (char-rnn) and a token-based model (folk-rnn).

  • Char-rnn Model: This model operates on individual characters, learning a probabilistic language model over the raw text of the transcriptions.
  • Folk-rnn Model: This model operates on tokens representing music-specific elements like pitch, meter, key, and others, extracted from the ABC notation (a sketch contrasting the two input views follows this list).
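
To make the distinction concrete, here is a rough Python sketch of the two input views. The ABC fragment is a hypothetical example, and the regular expression is a simplified stand-in for the paper's actual token vocabulary:

```python
import re

# A hypothetical ABC fragment: meter and key fields, then one bar of melody.
abc = "M:4/4 K:Gmaj |:G2 AB c2 BA|"

# char-rnn view: the transcription is just a stream of characters.
char_sequence = list(abc)

# folk-rnn view: music-specific tokens. This regex is a simplified
# stand-in for the paper's actual vocabulary of meter, key, pitch,
# duration, and bar tokens.
token_pattern = re.compile(
    r"M:\S+"                 # meter field, e.g. M:4/4
    r"|K:\S+"                # key field, e.g. K:Gmaj
    r"|\|:|:\||\|"           # repeat signs and bar lines
    r"|[=^_]?[A-Ga-g][,']*"  # pitches with optional accidental/octave marks
    r"|\d+|/\d+"             # duration tokens
)
token_sequence = token_pattern.findall(abc)

print(char_sequence)   # ['M', ':', '4', '/', '4', ' ', 'K', ':', 'G', ...]
print(token_sequence)  # ['M:4/4', 'K:Gmaj', '|:', 'G', '2', 'A', 'B', ...]
```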

Both models are deep networks with three hidden layers, each containing 512 LSTM blocks, and are trained with dropout regularization and RMSprop optimization to improve generalization and convergence. A minimal sketch of this architecture appears below.
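
The following sketch expresses such an architecture in PyTorch. The paper's own implementation differs, and the embedding layer, dropout rate, learning rate, and placeholder vocabulary size below are assumptions made for illustration:

```python
import torch
import torch.nn as nn

class TranscriptionLSTM(nn.Module):
    """Three stacked LSTM layers of 512 blocks with dropout between
    layers, predicting a distribution over the next token/character."""

    def __init__(self, vocab_size, hidden_size=512, num_layers=3):
        super().__init__()
        # The embedding layer is an assumption of this sketch;
        # one-hot input would also work for a symbolic vocabulary.
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, num_layers,
                            batch_first=True, dropout=0.5)  # rate assumed
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens, state=None):
        h, state = self.lstm(self.embed(tokens), state)
        return self.out(h), state

vocab_size = 100  # placeholder; the real size depends on the corpus
model = TranscriptionLSTM(vocab_size)
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)  # lr assumed
criterion = nn.CrossEntropyLoss()

# One next-step-prediction training step on a dummy batch of token ids.
inputs = torch.randint(0, vocab_size, (8, 64))   # (batch, time)
targets = torch.randint(0, vocab_size, (8, 64))  # inputs shifted by one in practice
logits, _ = model(inputs)
loss = criterion(logits.reshape(-1, vocab_size), targets.reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```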

Results

The study presents results from three perspectives:

  1. Population-Level Analysis: Descriptive statistics reveal that the generated transcriptions closely follow patterns found in the training data; the distributions of transcription lengths and of the pitches on which transcriptions end closely match those of the training set (a toy version of this comparison is sketched after this list).
  2. Individual-Level Analysis: The generated pieces possess structure and stylistic elements found in Celtic folk music, reflecting the ability of the models to internalize and recreate genre-specific conventions.
  3. Application-Level Analysis: The LSTM models are used as tools for generating novel ideas in composition. The authors demonstrate how generated sequences can be curated and refined into coherent musical pieces, extending beyond the traditional styles learned during training.
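
As a rough illustration of how such population-level comparisons can be computed, the Python sketch below tallies transcription lengths and ending pitches for two corpora. The data and measures here are simplified stand-ins for the paper's actual analysis:

```python
from collections import Counter

def descriptive_stats(corpus):
    """Tally transcription lengths (in tokens) and the pitch each
    transcription ends on, as in a population-level comparison."""
    lengths = [len(t) for t in corpus]
    ending_pitches = Counter(
        next((tok for tok in reversed(t) if tok[0].upper() in "ABCDEFG"), None)
        for t in corpus
    )
    return lengths, ending_pitches

# Hypothetical token lists standing in for training and generated corpora.
training = [["K:Gmaj", "|:", "G", "2", "A", "B", "|"],
            ["K:Dmaj", "|:", "d", "e", "f", "d", "|"]]
generated = [["K:Gmaj", "|:", "B", "A", "G", "2", "|"]]

for name, corpus in (("training", training), ("generated", generated)):
    lengths, pitches = descriptive_stats(corpus)
    print(name, "mean length:", sum(lengths) / len(lengths),
          "ending pitches:", dict(pitches))
```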

Implications

The application of LSTM networks demonstrates the potential of deep learning to generate plausible music sequences that align with specific musical styles. The models produce outputs that, with curation, could be integrated into musical practice, particularly in genres with strong traditions of improvisation and variation, such as Celtic folk music.

Practically, the research opens avenues for AI-assisted music composition, enabling composers to augment their creativity through interaction with generative models. Theoretically, it underscores the potential for neural networks to capture and reproduce complex time-based structures within musical data.

Future Directions

Further research could explore extending these models to other musical genres or integrating broader context-aware features to enhance the creative output's complexity and sophistication. Additionally, engaging with practitioners to refine the applications and improve model relevance in real-world settings will likely yield productive feedback. Another area for exploration is understanding the interpretability of neural models in music, potentially providing deeper insight into the structures and patterns they learn.

In conclusion, this paper presents a comprehensive approach to employing LSTM networks for music transcription and composition, demonstrating the adaptability and potential of deep learning in expanding the boundaries of traditional music generation.
