Music Auto-Tagging with Robust Music Representation Learned via Domain Adversarial Training (2401.15323v1)
Abstract: Music auto-tagging is crucial for enhancing music discovery and recommendation. Existing models in Music Information Retrieval (MIR) struggle with real-world noise, such as environmental and speech sounds in multimedia content. This study proposes a method inspired by speech-related tasks to improve music auto-tagging performance in noisy settings. The approach integrates Domain Adversarial Training (DAT) into the music domain, yielding robust music representations that withstand noise. Unlike previous research, this approach adds a pretraining phase for the domain classifier to avoid performance degradation in the subsequent adversarial phase. Augmenting training with various synthesized noisy music data improves the model's generalization across different noise levels. The proposed architecture demonstrates enhanced music auto-tagging performance by effectively utilizing unlabeled noisy music data. Additional experiments with supplementary unlabeled data further improve the model's performance, underscoring its robust generalization capabilities and broad applicability.
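To make the DAT mechanism described in the abstract concrete, below is a minimal PyTorch sketch of the standard gradient reversal layer (GRL) used in domain adversarial training: the forward pass is the identity, while the backward pass flips and scales gradients so the encoder learns features the domain classifier cannot use to distinguish clean from noisy music. All module and parameter names here (`Encoder` layers, `TagHead`, `DomainHead`, `lambda_`, input dimension 96) are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Gradient reversal layer: identity on the forward pass; on the
    backward pass, gradients are negated and scaled by lambda_, pushing
    the encoder toward domain-invariant (noise-robust) features."""

    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse the gradient flowing into the encoder; lambda_ gets no grad.
        return -ctx.lambda_ * grad_output, None


class DATModel(nn.Module):
    """Shared encoder with a tag head and an adversarial domain head."""

    def __init__(self, in_dim=96, feat_dim=128, n_tags=50, n_domains=2):
        super().__init__()
        # Placeholder encoder; the paper uses a music representation model.
        self.encoder = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.tag_head = nn.Linear(feat_dim, n_tags)        # multi-label tags
        self.domain_head = nn.Linear(feat_dim, n_domains)  # clean vs. noisy

    def forward(self, x, lambda_=1.0):
        z = self.encoder(x)
        tag_logits = self.tag_head(z)
        dom_logits = self.domain_head(GradReverse.apply(z, lambda_))
        return tag_logits, dom_logits
```

In a typical DAT training loop under these assumptions, the tag head is trained with a multi-label loss (e.g., `BCEWithLogitsLoss`) on labeled clean clips, while the domain head is trained with cross-entropy on both clean and noisy clips; because the domain loss needs only domain labels, unlabeled noisy music contributes directly, matching the abstract's use of unlabeled noisy data. Note that this sketch covers only the adversarial phase; per the abstract, the paper additionally pretrains the domain classifier beforehand to avoid degrading the subsequent phase.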