
Toward Deep Drum Source Separation (2312.09663v3)

Published 15 Dec 2023 in eess.AS, cs.LG, and cs.SD

Abstract: In the past, the field of drum source separation faced significant challenges due to limited data availability, hindering the adoption of cutting-edge deep learning methods that have found success in other related audio applications. In this manuscript, we introduce StemGMD, a large-scale audio dataset of isolated single-instrument drum stems. Each audio clip is synthesized from MIDI recordings of expressive drum performances using ten real-sounding acoustic drum kits. Totaling 1224 hours, StemGMD is the largest audio dataset of drums to date and the first to comprise isolated audio clips for every instrument in a canonical nine-piece drum kit. We leverage StemGMD to develop LarsNet, a novel deep drum source separation model. Through a bank of dedicated U-Nets, LarsNet can separate five stems from a stereo drum mixture faster than real-time and is shown to significantly outperform state-of-the-art nonnegative spectro-temporal factorization methods.
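
The "bank of dedicated U-Nets" idea can be illustrated with a minimal sketch. It is not LarsNet's implementation. It assumes each model predicts a soft mask in [0, 1] that is applied to the stereo mixture spectrogram, which the abstract does not state. The stem names, the spectrogram shape, and the helper `make_placeholder_mask_model` are hypothetical, and a simple sigmoid stands in for each trained U-Net.

```python
import numpy as np

# Hypothetical stem names; the abstract only says five stems are separated.
STEMS = ["kick", "snare", "toms", "hihat", "cymbals"]


def make_placeholder_mask_model(seed):
    """Stand-in for one trained U-Net (assumption: it outputs a soft mask)."""
    rng = np.random.default_rng(seed)
    slope = float(rng.uniform(0.5, 1.5))  # arbitrary, gives each stem a different mask

    def model(magnitude):
        # Sigmoid of the centred log-magnitude, so every value lies in (0, 1).
        x = np.log1p(magnitude)
        return 1.0 / (1.0 + np.exp(-slope * (x - x.mean())))

    return model


def separate(mixture_spec, bank):
    """Run every stem's dedicated model on the same mixture spectrogram."""
    magnitude = np.abs(mixture_spec)
    # Masking the complex spectrogram reuses the mixture phase (an assumption).
    return {stem: model(magnitude) * mixture_spec for stem, model in bank.items()}


# One independent model per stem: the "bank".
bank = {stem: make_placeholder_mask_model(i) for i, stem in enumerate(STEMS)}

# Random complex data shaped (channels, frequency bins, frames) as a fake stereo mixture.
rng = np.random.default_rng(0)
shape = (2, 513, 64)
mixture_spec = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

stems = separate(mixture_spec, bank)
for name, spec in stems.items():
    print(name, spec.shape)
```

Because each stem has its own model in this layout, the models are independent of one another and could be trained or run separately.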
