
Pre-trained Spatial Priors on Multichannel NMF for Music Source Separation (2310.05821v1)

Published 9 Oct 2023 in cs.SD, cs.LG, and eess.AS

Abstract: This paper presents a novel approach to sound source separation that leverages spatial information obtained during the recording setup. Our method trains a spatial mixing filter using solo passages to capture information about the room impulse response and transducer response at each sensor location. This pre-trained filter is then integrated into a multichannel non-negative matrix factorization (MNMF) scheme to better capture the variances of different sound sources. The recording setup used in our experiments is the typical setup for orchestra recordings, with a main microphone and a close "cardioid" or "supercardioid" microphone for each section of the orchestra. This makes the proposed method applicable to many existing recordings. Experiments on polyphonic ensembles demonstrate the effectiveness of the proposed framework in separating individual sound sources, improving performance compared to conventional MNMF methods.
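The core idea in the abstract (learn the spatial mixing behaviour from solo passages, then use it to separate the full mixture) can be illustrated with a deliberately simplified numpy sketch. This is not the paper's MNMF algorithm: it assumes a noise-free, frequency-independent, rank-1 mixing model and a determined case (two microphones, two sources), so the learned spatial filter can be applied by direct matrix inversion instead of the full multichannel NMF updates. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames = 500

# True (unknown) mixing matrix: columns are the steering vectors that
# encode room + transducer response from each source to each microphone.
A = np.array([[1.0, 0.4],
              [0.3, 1.0]])

# Non-negative magnitude-like source activations, as in NMF-style models.
s1 = rng.random(n_frames)
s2 = rng.random(n_frames)

# "Solo passages": each source plays alone during the recording setup.
solo1 = A[:, [0]] * s1[:100]        # shape (2, 100): mics x frames
solo2 = A[:, [1]] * s2[:100]

def steering_vector(x):
    """Pre-train a spatial filter from a solo passage: the dominant
    eigenvector of the spatial covariance matrix of the solo frames."""
    R = x @ x.T / x.shape[1]        # empirical spatial covariance
    _, vecs = np.linalg.eigh(R)
    v = vecs[:, -1]                 # principal spatial direction
    return v / v[np.abs(v).argmax()]  # fix scale/sign ambiguity

# Pre-trained spatial mixing filter, one steering vector per source.
A_hat = np.column_stack([steering_vector(solo1), steering_vector(solo2)])

# Separate the full mixture by inverting the learned spatial filter.
X = A @ np.vstack([s1, s2])         # observed two-channel mixture
S_hat = np.linalg.solve(A_hat, X)

print(np.allclose(S_hat[0], s1))    # prints: True
```

In the paper's actual setting the mixing is frequency-dependent and the problem is not cleanly invertible, so the pre-trained spatial model instead constrains the spatial covariance matrices inside an MNMF scheme, with source variances estimated by multiplicative updates; the sketch above only shows why solo passages suffice to identify each source's spatial signature.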

