MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing (2407.02277v2)
Abstract: In the domain of symbolic music research, the progress of developing scalable systems has been notably hindered by the scarcity of available training data and the demand for models tailored to specific tasks. To address these issues, we propose MelodyT5, a novel unified framework that leverages an encoder-decoder architecture tailored for symbolic music processing in ABC notation. This framework challenges the conventional task-specific approach, considering various symbolic music tasks as score-to-score transformations. Consequently, it integrates seven melody-centric tasks, from generation to harmonization and segmentation, within a single model. Pre-trained on MelodyHub, a newly curated collection featuring over 261K unique melodies encoded in ABC notation and encompassing more than one million task instances, MelodyT5 demonstrates superior performance in symbolic music processing via multi-task transfer learning. Our findings highlight the efficacy of multi-task transfer learning in symbolic music processing, particularly for data-scarce tasks, challenging the prevailing task-specific paradigms and offering a comprehensive dataset and framework for future explorations in this domain.
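To make the score-to-score framing concrete, the following minimal sketch shows how one task instance might be represented as a pair of ABC strings. The `TaskInstance` class, the `to_model_input` helper, and the `%%task` tag line are hypothetical and chosen for illustration only; the abstract does not specify how MelodyT5 or MelodyHub actually encode task identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskInstance:
    """One score-to-score example (hypothetical structure, not from the paper)."""
    task: str        # assumed task label, e.g. "harmonization"
    source_abc: str  # input score in ABC notation
    target_abc: str  # output score in ABC notation


def to_model_input(instance: TaskInstance) -> str:
    # Assumption: the task is signalled by a directive-style line placed
    # before the source score. The real conditioning format may differ.
    return f"%%{instance.task}\n{instance.source_abc}"


# A one-bar melody and the same bar with chord symbols added in quotes,
# which is how ABC notation annotates chords.
melody = "X:1\nL:1/8\nM:4/4\nK:C\nCDEF GABc|"
harmonized = 'X:1\nL:1/8\nM:4/4\nK:C\n"C"CDEF "G"GABc|'

example = TaskInstance("harmonization", melody, harmonized)
model_input = to_model_input(example)
print(model_input)
```

Under this framing, every task shares the same string-in, string-out interface, so a single encoder-decoder model can in principle be trained on all of them; only the task label and the target score change between tasks.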