Reconstructing Human Expressiveness in Piano Performances with a Transformer Network (2306.06040v2)
Abstract: Capturing the intricate and subtle variations of human expressiveness in music performance with computational approaches is challenging. In this paper, we propose a novel approach for reconstructing human expressiveness in piano performance with a multi-layer bi-directional Transformer encoder. To address the need for large amounts of accurately captured, score-aligned performance data when training neural networks, we train our model on transcribed scores obtained from an existing transcription model. We integrate pianist identities to control the sampling process and explore our system's ability to model variations in expressiveness across different pianists. The system is evaluated through statistical analysis of the generated expressive performances and a listening test. Overall, the results suggest that our method achieves state-of-the-art performance in generating human-like piano performances from transcribed scores, although fully and consistently reconstructing human expressiveness remains challenging.
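The architecture described above — a bi-directional Transformer encoder over score tokens, conditioned on a pianist identity — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: all class names, dimensions, and the choice of output parameters (e.g. per-note velocity and timing deviations) are assumptions, and positional encoding is omitted for brevity.

```python
import torch
import torch.nn as nn

class ExpressiveRenderer(nn.Module):
    """Hypothetical sketch: score tokens -> per-note expressive parameters,
    conditioned on a pianist-ID embedding."""
    def __init__(self, vocab_size=512, n_pianists=16, d_model=256,
                 n_heads=4, n_layers=6, n_expr_params=3):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pianist_emb = nn.Embedding(n_pianists, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        # No causal mask: each position attends in both directions,
        # making the encoder bi-directional.
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Assumed output head: a few expressive parameters per note,
        # e.g. velocity plus onset/duration deviations.
        self.head = nn.Linear(d_model, n_expr_params)

    def forward(self, score_tokens, pianist_id):
        x = self.token_emb(score_tokens)
        # Broadcast the pianist embedding across the note sequence
        # so every position is conditioned on the performer identity.
        x = x + self.pianist_emb(pianist_id).unsqueeze(1)
        return self.head(self.encoder(x))

model = ExpressiveRenderer()
notes = torch.randint(0, 512, (2, 64))   # batch of 2 score segments
pianist = torch.tensor([0, 3])           # one pianist ID per segment
expr = model(notes, pianist)
print(expr.shape)  # torch.Size([2, 64, 3])
```

Swapping the pianist ID at sampling time is what lets a model of this shape render the same score with different performer-specific expressive tendencies.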