
DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction (2310.04567v2)

Published 6 Oct 2023 in eess.AS and cs.SD

Abstract: Common target sound extraction (TSE) approaches have primarily relied on discriminative methods to separate the target sound while minimizing interference from unwanted sources, with varying success in separating the target from the background. This study introduces DPM-TSE, a first generative method based on diffusion probabilistic modeling (DPM) for target sound extraction, which aims to achieve both cleaner target renderings and improved separability from unwanted sounds. The technique also tackles the background noise issues common to DPMs by introducing a correction method for noise schedules and sample steps. The approach is evaluated using both objective and subjective quality metrics on the FSD Kaggle 2018 dataset. The results show that DPM-TSE significantly improves perceived quality in terms of target extraction and purity.
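
The abstract does not spell out the noise-schedule correction. As a hedged illustration only, the sketch below shows one known remedy from the diffusion literature: rescaling a schedule so that its terminal signal-to-noise ratio is exactly zero, as proposed in "Common diffusion noise schedules and sample steps are flawed" (arXiv:2305.08891). Whether DPM-TSE applies exactly this procedure is an assumption. The linear schedule, its default values, and all function names are hypothetical choices made for the example.

```python
import math

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear beta schedule (values assumed for illustration)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """Return alpha_bar_t, the running product of (1 - beta_s) for s <= t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out

def rescale_zero_terminal_snr(betas):
    """Shift and scale sqrt(alpha_bar) so the last step carries no signal."""
    sqrt_ab = [math.sqrt(a) for a in cumulative_alphas(betas)]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    # Shift so the final value becomes zero, then scale so the first is unchanged.
    scale = first / (first - last)
    sqrt_ab = [(s - last) * scale for s in sqrt_ab]
    alpha_bar = [s * s for s in sqrt_ab]
    # Recover per-step alphas from consecutive ratios of alpha_bar.
    alphas = [alpha_bar[0]] + [
        alpha_bar[i] / alpha_bar[i - 1] for i in range(1, len(alpha_bar))
    ]
    return [1.0 - a for a in alphas]

original = linear_betas(1000)
corrected = rescale_zero_terminal_snr(original)
```

With the uncorrected schedule, the final step still retains a small amount of signal, so the model never trains on pure noise even though sampling starts from pure noise. After rescaling, the last cumulative alpha is zero and that mismatch is removed. The cited work pairs this with other changes, such as starting the sampler at the final timestep, which are not shown here.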
