MARBLE: Music Audio Representation Benchmark for Universal Evaluation (2306.10548v4)
Abstract: In the era of extensive intersection between art and AI, such as image generation and fiction co-creation, AI for music remains relatively nascent, particularly in music understanding. This is evident in the limited work on deep music representations, the scarcity of large-scale datasets, and the absence of a universal, community-driven benchmark. To address this issue, we introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE. It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchical levels: acoustic, performance, score, and high-level description. We then establish a unified protocol based on 14 tasks across 8 publicly available datasets, providing a fair and standardized assessment of the representations of all open-source pre-trained models developed on music recordings as baselines. In addition, MARBLE offers an easy-to-use, extendable, and reproducible suite for the community, with a clear statement on copyright issues for the datasets. Results suggest that recently proposed large-scale pre-trained musical language models perform best on most tasks, with room for further improvement. The leaderboard and toolkit repository are published at https://marble-bm.shef.ac.uk to promote future music AI research.
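The unified protocol the abstract refers to is a probing-style evaluation: a pre-trained music model is frozen, clip-level embeddings are extracted, and a lightweight probe is scored on each downstream MIR task. Below is a minimal, hypothetical sketch of that idea; the `extract_embeddings` stand-in, the feature dimension, and the toy 10-way task are illustrative assumptions, not the MARBLE API.

```python
# Hypothetical sketch of probing-style evaluation on frozen representations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def extract_embeddings(n_clips: int, dim: int = 768) -> np.ndarray:
    """Stand-in for clip embeddings from a frozen pre-trained music model."""
    return rng.normal(size=(n_clips, dim)).astype(np.float32)

# Toy downstream task: 10-way classification (e.g. genre labels).
X = extract_embeddings(n_clips=500)
y = rng.integers(0, 10, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Linear probe trained on top of the embeddings; the backbone is never updated.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print("probe accuracy:", accuracy_score(y_test, probe.predict(X_test)))
```

In a real run, the stand-in embeddings would come from one of the open-source pre-trained models the benchmark evaluates, and the probe would be scored with the task-specific metric defined by the protocol.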
- Ruibin Yuan
- Yinghao Ma
- Yizhi Li
- Ge Zhang
- Xingran Chen
- Hanzhi Yin
- Le Zhuo
- Yiqi Liu
- Jiawen Huang
- Zeyue Tian
- Binyue Deng
- Ningzhi Wang
- Chenghua Lin
- Emmanouil Benetos
- Anton Ragni
- Norbert Gyenge
- Roger Dannenberg
- Wenhu Chen
- Gus Xia
- Wei Xue