Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
144 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
46 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

End-to-end Piano Performance-MIDI to Score Conversion with Transformers (2410.00210v1)

Published 30 Sep 2024 in cs.SD, cs.LG, and eess.AS

Abstract: The automated creation of accurate musical notation from an expressive human performance is a fundamental task in computational musicology. To this end, we present an end-to-end deep learning approach that constructs detailed musical scores directly from real-world piano performance-MIDI files. We introduce a modern transformer-based architecture with a novel tokenized representation for symbolic music data. Framing the task as sequence-to-sequence translation rather than note-wise classification reduces alignment requirements and annotation costs, while allowing the prediction of more concise and accurate notation. To serialize symbolic music data, we design a custom tokenization stage based on compound tokens that carefully quantizes continuous values. This technique preserves more score information while reducing sequence lengths by $3.5\times$ compared to prior approaches. Using the transformer backbone, our method demonstrates better understanding of note values, rhythmic structure, and details such as staff assignment. When evaluated end-to-end using transcription metrics such as MUSTER, we achieve significant improvements over previous deep learning approaches and complex HMM-based state-of-the-art pipelines. Our method is also the first to directly predict notational details like trill marks or stem direction from performance data. Code and models are available at https://github.com/TimFelixBeyer/MIDI2ScoreTransformer

Summary

We haven't generated a summary for this paper yet.