Toward Deep Drum Source Separation (2312.09663v3)
Abstract: In the past, the field of drum source separation faced significant challenges due to limited data availability, hindering the adoption of cutting-edge deep learning methods that have found success in other related audio applications. In this manuscript, we introduce StemGMD, a large-scale audio dataset of isolated single-instrument drum stems. Each audio clip is synthesized from MIDI recordings of expressive drum performances using ten real-sounding acoustic drum kits. Totaling 1224 hours, StemGMD is the largest audio dataset of drums to date and the first to comprise isolated audio clips for every instrument in a canonical nine-piece drum kit. We leverage StemGMD to develop LarsNet, a novel deep drum source separation model. Through a bank of dedicated U-Nets, LarsNet can separate five stems from a stereo drum mixture faster than real-time and is shown to significantly outperform state-of-the-art nonnegative spectro-temporal factorization methods.