- The paper introduces two LSTM models, *char-rnn* and *folk-rnn*, for music transcription modelling and composition using ABC notation.
- Both models are deep three-layer architectures trained with dropout regularization and RMSprop optimization, and they generate compositions that mirror patterns in the training data.
- The generated transcriptions are structured and genre-specific, and can serve as tools to augment traditional music composition.
Deep Learning for Music Transcription Modelling and Composition
The paper "Music transcription modelling and composition using deep learning" by Sturm et al. investigates the application of Long Short-Term Memory (LSTM) networks in music transcription and composition tasks. The research focuses on leveraging approximately 23,000 music transcriptions in ABC notation to train LSTM models that can generate new music. The main objectives are to create music transcription models useful for specific compositional contexts and to facilitate music composition within established musical practices.
Methodology
The researchers develop two distinct LSTM-based models: a character-based model (*char-rnn*) and a token-based model (*folk-rnn*).
- Char-rnn Model: Operates on individual characters, learning a probabilistic language model over the raw text of the transcriptions.
- Folk-rnn Model: Operates on tokens representing music-specific elements such as pitch, meter, and key, extracted from the ABC notation (see the tokenization sketch after this list).
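The difference between the two input representations can be made concrete with a small sketch. This is illustrative only; the token inventory and the regular expression below are assumptions, not the authors' preprocessing code:

```python
import re

abc = "M:4/4\nK:Gmaj\n|:GABc dedB|c2ec B2dB:|"

# char-rnn view: the transcription is a plain character stream
char_sequence = list(abc)
# ['M', ':', '4', '/', '4', '\n', 'K', ...]

# folk-rnn view: the transcription is a stream of music-level tokens
# (header fields, bar lines, pitches, duration modifiers)
token_pattern = re.compile(
    r"M:\S+|K:\S+"            # meter and key fields
    r"|\|:|:\||\|"            # bar lines and repeat signs
    r"|[=^_]?[A-Ga-g][,']*"   # pitches with optional accidental/octave marks
    r"|\d+|/\d*"              # duration modifiers
)
token_sequence = token_pattern.findall(abc)
# ['M:4/4', 'K:Gmaj', '|:', 'G', 'A', 'B', 'c', ...]
```

The token-level view gives the network musically meaningful units, while the character-level view leaves it to learn those units from scratch.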
Both models are deep networks with three hidden layers of 512 LSTM blocks each. Training uses dropout regularization to curb overfitting and RMSprop optimization for efficient convergence.
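A network of roughly this shape can be written down in a few lines. The following PyTorch sketch is a minimal illustration under stated assumptions: the vocabulary size, dropout rate, and learning rate are placeholders, and an embedding layer stands in for the input encoding (one-hot vectors are typical for char-rnn-style models), which the paper may handle differently:

```python
import torch
import torch.nn as nn

class TranscriptionRNN(nn.Module):
    """Three stacked LSTM layers of 512 units, as described in the paper;
    the vocabulary size and dropout rate below are illustrative assumptions."""
    def __init__(self, vocab_size=135, hidden_size=512, num_layers=3, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size,
                            num_layers=num_layers, dropout=dropout,
                            batch_first=True)
        self.head = nn.Linear(hidden_size, vocab_size)  # next-token logits

    def forward(self, tokens, state=None):
        x = self.embed(tokens)           # (batch, seq) -> (batch, seq, hidden)
        x, state = self.lstm(x, state)
        return self.head(x), state       # logits over the vocabulary

model = TranscriptionRNN()
# RMSprop, as in the paper; the learning rate is an assumed placeholder
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)
```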
Results
The study presents results from three perspectives:
- Population-Level Analysis: Descriptive statistics show that the generated transcriptions closely follow patterns in the training data: the distributions of transcription lengths and of ending pitches correspond closely to those of the dataset.
- Individual-Level Analysis: The generated pieces possess structure and stylistic elements found in Celtic folk music, reflecting the models' ability to internalize and recreate genre-specific conventions.
- Application-Level Analysis: The LSTM models are used as tools for generating novel compositional ideas. The authors demonstrate how generated sequences can be curated and refined into coherent musical pieces, extending beyond the traditional styles learned during training (a sampling sketch follows this list).
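At the application level, generation amounts to autoregressive sampling from the network's next-token distribution. A sketch of that loop, reusing the `TranscriptionRNN` class above (the `start_token`, `end_token`, and `temperature` parameters are assumptions; the paper's exact sampling procedure may differ):

```python
import torch
import torch.nn.functional as F

def sample_transcription(model, start_token, end_token, max_len=500, temperature=1.0):
    """Draw tokens one at a time from the model's predictive distribution
    until an end-of-transcription token appears (a common sampling scheme;
    not necessarily the authors' exact procedure)."""
    tokens = [start_token]
    state = None
    for _ in range(max_len):
        inp = torch.tensor([[tokens[-1]]])        # last token, shape (1, 1)
        logits, state = model(inp, state)
        probs = F.softmax(logits[0, -1] / temperature, dim=-1)
        nxt = torch.multinomial(probs, 1).item()  # stochastic choice
        tokens.append(nxt)
        if nxt == end_token:
            break
    return tokens
```

Sampled sequences are then decoded back to ABC text, and a human curator selects and refines the promising ones, matching the workflow the authors describe.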
Implications
The application of LSTM networks demonstrates the potential of deep learning to generate plausible music sequences that align with specific musical styles. With curation, the outputs could be integrated into musicians' practice, particularly in genres with strong traditions of improvisation and variation, such as Celtic folk music.
Practically, the research opens avenues for AI-assisted music composition, enabling composers to augment their creativity through interaction with generative models. Theoretically, it underscores the capacity of neural networks to capture and reproduce complex temporal structures in musical data.
Future Directions
Further research could extend these models to other musical genres or integrate broader context-aware features to increase the complexity and sophistication of the creative output. Engaging practitioners to refine the applications and improve the models' relevance in real-world settings would also likely yield productive feedback. A further avenue is the interpretability of neural models of music, which could provide deeper insight into the structures and patterns they learn.
In conclusion, this paper presents a comprehensive approach to employing LSTM networks for music transcription and composition, demonstrating the adaptability and potential of deep learning in expanding the boundaries of traditional music generation.