mir_ref: A Representation Evaluation Framework for Music Information Retrieval Tasks (2312.05994v2)

Published 10 Dec 2023 in cs.SD, cs.IR, and eess.AS

Abstract: Music Information Retrieval (MIR) research is increasingly leveraging representation learning to obtain more compact, powerful music audio representations for various downstream MIR tasks. However, current representation evaluation methods are fragmented due to discrepancies in audio and label preprocessing, downstream model and metric implementations, data availability, and computational resources, often leading to inconsistent and limited results. In this work, we introduce mir_ref, an MIR Representation Evaluation Framework focused on seamless, transparent, local-first experiment orchestration to support representation development. It features implementations of a variety of components such as MIR datasets, tasks, embedding models, and tools for result analysis and visualization, while facilitating the implementation of custom components. To demonstrate its utility, we use it to conduct an extensive evaluation of several embedding models across various tasks and datasets, including evaluating their robustness to various audio perturbations and the ease of extracting relevant information from them.
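The abstract describes the evaluation loop only at a high level: frozen embeddings feed a lightweight downstream model, a metric is computed, and robustness is probed by perturbing the test audio. As a rough illustration of that general protocol, here is a minimal, self-contained sketch. It is not mir_ref's API or its actual pipeline. `toy_embed` (a hypothetical stand-in for a pretrained embedding model), the nearest-centroid downstream model, the synthetic two-class sine-tone "dataset", and the white-noise SNR perturbation are all assumptions, chosen so the example runs with the Python standard library alone.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

SAMPLE_RATE = 1000  # hypothetical: 1 s clips at 1 kHz keep the toy example fast

Signal = List[float]
Embedding = List[float]
Example = Tuple[Signal, str]


def make_dataset(n_per_class: int, rng: random.Random) -> List[Example]:
    """Hypothetical two-class 'dataset': low (5 Hz) vs. high (100 Hz) sine tones."""
    data: List[Example] = []
    for label, freq in (("low", 5.0), ("high", 100.0)):
        for _ in range(n_per_class):
            phase = rng.uniform(0.0, 2.0 * math.pi)
            signal = [math.sin(2.0 * math.pi * freq * i / SAMPLE_RATE + phase)
                      for i in range(SAMPLE_RATE)]
            data.append((signal, label))
    return data


def toy_embed(signal: Signal) -> Embedding:
    """Stand-in for a frozen embedding model: [mean, RMS, zero-crossing rate]."""
    n = len(signal)
    mean = sum(signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0.0) != (b < 0.0))
    return [mean, rms, crossings / (n - 1)]


def add_noise_at_snr(signal: Signal, snr_db: float, rng: random.Random) -> Signal:
    """Audio perturbation: additive white Gaussian noise at the requested SNR in dB."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_std = math.sqrt(signal_power / (10.0 ** (snr_db / 10.0)))
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def fit_centroids(embeddings: Sequence[Embedding],
                  labels: Sequence[str]) -> Dict[str, Embedding]:
    """Downstream model: one mean embedding (centroid) per class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for emb, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(emb))
        for i, value in enumerate(emb):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids: Dict[str, Embedding], emb: Embedding) -> str:
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(emb, centroids[label])))


def evaluate(embed: Callable[[Signal], Embedding],
             train: List[Example], test: List[Example],
             perturb: Optional[Callable[[Signal], Signal]] = None) -> float:
    """Fit on clean training embeddings; return accuracy on (optionally perturbed) test audio."""
    centroids = fit_centroids([embed(s) for s, _ in train], [y for _, y in train])
    correct = 0
    for signal, label in test:
        audio = perturb(signal) if perturb is not None else signal
        correct += predict(centroids, embed(audio)) == label
    return correct / len(test)


rng = random.Random(0)
train_set = make_dataset(10, rng)
test_set = make_dataset(10, rng)
clean_acc = evaluate(toy_embed, train_set, test_set)
noisy_acc = evaluate(toy_embed, train_set, test_set,
                     perturb=lambda s: add_noise_at_snr(s, -10.0, rng))
print(f"clean accuracy: {clean_acc:.2f}, accuracy at -10 dB SNR: {noisy_acc:.2f}")
```

A real setup would swap in a pretrained embedding model, an annotated MIR dataset, and a trainable downstream classifier. The point of the sketch is only that the same frozen-embedding pipeline runs once on clean and once on perturbed test audio, so any drop in the metric can be attributed to the perturbation rather than to differences in preprocessing or downstream training.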

Authors (3)
  1. Christos Plachouras
  2. Pablo Alonso-Jiménez
  3. Dmitry Bogdanov