Exploring Isolated Musical Notes as Pre-training Data for Predominant Instrument Recognition in Polyphonic Music (2306.08850v1)

Published 15 Jun 2023 in cs.SD and eess.AS

Abstract: With the growing amount of musical data available, automatic instrument recognition, one of the essential problems in Music Information Retrieval (MIR), is drawing increasing attention. While automatic recognition of single instruments has been well studied, it remains challenging for polyphonic, multi-instrument recordings. This work presents our efforts toward building a robust end-to-end instrument recognition system for polyphonic multi-instrument music. We train our model with a pre-training and fine-tuning approach: we pre-train on a large amount of monophonic musical data and subsequently fine-tune the model on polyphonic ensembles. During pre-training, we apply data augmentation techniques to alleviate the domain gap between monophonic musical data and real-world music. We evaluate our method on the IRMAS test set, a polyphonic musical dataset comprising professionally produced commercial recordings. Experimental results show that our best model achieves a micro F1-score of 0.674 and an LRAP of 0.814, relative improvements of 10.9% and 8.9% over the previous state-of-the-art end-to-end approach. We also build a lightweight model that achieves competitive performance with only 519K trainable parameters.
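The abstract reports performance as a micro F1-score and label-ranking average precision (LRAP) on the multi-label IRMAS test set. As a quick illustration of how these two metrics are typically computed, here is a minimal Python sketch using scikit-learn; it is not the authors' code, and the random scores, the 0.5 decision threshold, and the use of 11 IRMAS instrument classes are illustrative assumptions only.

```python
# Minimal sketch (not the paper's code): computing the two metrics reported
# in the abstract, micro F1-score and label-ranking average precision (LRAP),
# for multi-label instrument recognition with scikit-learn.
import numpy as np
from sklearn.metrics import f1_score, label_ranking_average_precision_score

rng = np.random.default_rng(0)
n_clips, n_classes = 8, 11                       # IRMAS defines 11 instrument classes
y_true = rng.integers(0, 2, size=(n_clips, n_classes))   # multi-hot ground-truth labels
y_score = rng.random((n_clips, n_classes))                # model scores (e.g., sigmoid outputs)

# Micro F1 needs hard decisions; a global 0.5 threshold is assumed here,
# though the paper may select thresholds differently.
y_pred = (y_score >= 0.5).astype(int)
micro_f1 = f1_score(y_true, y_pred, average="micro", zero_division=0)

# LRAP is computed directly from the continuous scores.
lrap = label_ranking_average_precision_score(y_true, y_score)
print(f"micro F1 = {micro_f1:.3f}, LRAP = {lrap:.3f}")
```

Micro-averaging aggregates true/false positives across all clips and classes before computing F1, while LRAP rewards ranking each clip's true instruments above the others, which is why both are commonly reported together for this task.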

