Improving Music Genre Classification from Multi-Modal Properties of Music and Genre Correlations Perspective (2303.07667v2)
Abstract: Music genre classification has been widely studied in recent years for its various applications in music information retrieval. Previous methods tend to perform unsatisfactorily because they either rely on audio content alone or combine audio and lyrics inefficiently. In addition, since genres normally co-occur within a music track, it is desirable to capture and model genre correlations to improve multi-label music genre classification. To address these issues, we present a novel multi-modal method that leverages an audio-lyrics contrastive loss and two symmetric cross-modal attention modules to align and fuse features from audio and lyrics. Furthermore, based on the nature of multi-label classification, a genre-correlations extraction module is presented to capture and model potential genre correlations. Extensive experiments demonstrate that our proposed method significantly surpasses other multi-label music genre classification methods and achieves state-of-the-art results on the Music4All dataset.
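The abstract names two fusion-side components: an audio-lyrics contrastive loss that aligns the two modalities, and a pair of symmetric cross-modal attention blocks that fuse them. As a rough illustration of how such components typically look, below is a minimal PyTorch sketch; the function and class names, the CLIP-style InfoNCE formulation, the mean-pooling, and all hyperparameters (`temperature=0.07`, `num_heads=4`) are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of a CLIP-style audio-lyrics contrastive loss and a
# symmetric cross-modal attention fusion block. Names and hyperparameters
# are illustrative assumptions, not the paper's implementation.
import torch
import torch.nn.functional as F


def audio_lyrics_contrastive_loss(audio_emb, lyrics_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of (audio, lyrics) pairs.

    audio_emb, lyrics_emb: (batch, dim) pooled track-level embeddings,
    where row i of each tensor comes from the same track.
    """
    audio_emb = F.normalize(audio_emb, dim=-1)
    lyrics_emb = F.normalize(lyrics_emb, dim=-1)
    logits = audio_emb @ lyrics_emb.t() / temperature  # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)  # matches on diagonal
    loss_a2l = F.cross_entropy(logits, targets)      # audio queries lyrics
    loss_l2a = F.cross_entropy(logits.t(), targets)  # lyrics queries audio
    return 0.5 * (loss_a2l + loss_l2a)


class SymmetricCrossModalFusion(torch.nn.Module):
    """Two mirrored cross-attention blocks: each modality attends to the other."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.audio_to_lyrics = torch.nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.lyrics_to_audio = torch.nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, audio_seq, lyrics_seq):
        # audio_seq: (batch, T_a, dim), lyrics_seq: (batch, T_l, dim)
        a, _ = self.audio_to_lyrics(audio_seq, lyrics_seq, lyrics_seq)  # audio queries lyrics
        l, _ = self.lyrics_to_audio(lyrics_seq, audio_seq, audio_seq)   # lyrics query audio
        # Mean-pool each attended sequence and concatenate into a fused track
        # vector, which a multi-label head (e.g. a sigmoid classifier) could consume.
        return torch.cat([a.mean(dim=1), l.mean(dim=1)], dim=-1)
```

The symmetry, with both an audio-to-lyrics and a lyrics-to-audio direction in the loss and in the attention, is the design choice the abstract highlights: neither modality is privileged, and each can recover genre cues the other misses.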
- Ganghui Ru
- Xulong Zhang
- Jianzong Wang
- Ning Cheng
- Jing Xiao