
MBTFNet: Multi-Band Temporal-Frequency Neural Network For Singing Voice Enhancement (2310.04369v1)

Published 6 Oct 2023 in cs.SD, cs.LG, and eess.AS

Abstract: A typical neural speech enhancement (SE) approach mainly handles mixtures of speech and noise, which is not optimal for singing voice enhancement scenarios. Music source separation (MSS) models treat vocals and the various accompaniment components equally, which may reduce performance compared to a model that considers only vocal enhancement. In this paper, we propose a novel multi-band temporal-frequency neural network (MBTFNet) for singing voice enhancement, which removes background music, noise, and even backing vocals from singing recordings. MBTFNet combines inter- and intra-band modeling for better processing of full-band signals. Dual-path modeling is introduced to expand the receptive field of the model. We further propose an implicit personalized enhancement (IPE) stage based on signal-to-noise ratio (SNR) estimation, which further improves the performance of MBTFNet. Experiments show that our proposed model significantly outperforms several state-of-the-art SE and MSS models.
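
To make the inter/intra-band dual-path idea from the abstract concrete, below is a minimal PyTorch sketch of what one such block could look like. The class name, layer sizes, and band-split layout are illustrative assumptions, not taken from the paper: the intra-band path models temporal context within each sub-band, and the inter-band path shares information across sub-bands at every frame, which is one common way dual-path architectures widen the receptive field over full-band spectra.

import torch
import torch.nn as nn

class DualPathBandBlock(nn.Module):
    """Hypothetical sketch of dual-path inter/intra-band modeling.

    Input: features shaped (batch, channels, bands, frames), assuming the
    full-band spectrogram has already been split into sub-bands upstream.
    """

    def __init__(self, channels: int, hidden: int = 64):
        super().__init__()
        # Intra-band path: temporal recurrence within each sub-band.
        self.intra_rnn = nn.LSTM(channels, hidden, batch_first=True, bidirectional=True)
        self.intra_proj = nn.Linear(2 * hidden, channels)
        # Inter-band path: recurrence across sub-bands at each time frame.
        self.inter_rnn = nn.LSTM(channels, hidden, batch_first=True, bidirectional=True)
        self.inter_proj = nn.Linear(2 * hidden, channels)
        self.norm1 = nn.LayerNorm(channels)
        self.norm2 = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, k, t = x.shape  # batch, channels, bands, frames
        # Intra-band pass: each (batch, band) pair becomes a sequence over time.
        intra = x.permute(0, 2, 3, 1).reshape(b * k, t, c)
        intra, _ = self.intra_rnn(intra)
        intra = self.intra_proj(intra).reshape(b, k, t, c)
        x = self.norm1(x.permute(0, 2, 3, 1) + intra)  # residual, now (b, k, t, c)
        # Inter-band pass: each (batch, frame) pair becomes a sequence over bands.
        inter = x.permute(0, 2, 1, 3).reshape(b * t, k, c)
        inter, _ = self.inter_rnn(inter)
        inter = self.inter_proj(inter).reshape(b, t, k, c)
        x = self.norm2(x + inter.permute(0, 2, 1, 3))  # residual, still (b, k, t, c)
        return x.permute(0, 3, 1, 2)  # back to (b, c, k, t)

# Usage example (hypothetical shapes): 8 feature channels, 4 sub-bands, 100 frames.
# block = DualPathBandBlock(channels=8)
# y = block(torch.randn(2, 8, 4, 100))  # -> (2, 8, 4, 100)

The residual connections and per-path normalization in the sketch follow common dual-path designs (e.g., dual-path RNN style processing); the paper's actual block structure, band split, and SNR-based IPE stage are not reproduced here.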
