
Diff-SV: A Unified Hierarchical Framework for Noise-Robust Speaker Verification Using Score-Based Diffusion Probabilistic Models (2309.08320v2)

Published 14 Sep 2023 in eess.AS and cs.SD

Abstract: Background noise considerably reduces the accuracy and reliability of speaker verification (SV) systems. These challenges can be addressed using a speech enhancement system as a front-end module. Recently, diffusion probabilistic models (DPMs) have exhibited remarkable noise-compensation capabilities in the speech enhancement domain. Building on this success, we propose Diff-SV, a noise-robust SV framework that leverages DPM. Diff-SV unifies a DPM-based speech enhancement system with a speaker embedding extractor, and yields a discriminative and noise-tolerable speaker representation through a hierarchical structure. The proposed model was evaluated under both in-domain and out-of-domain noisy conditions using the VoxCeleb1 test set, an external noise source, and the VOiCES corpus. The obtained experimental results demonstrate that Diff-SV achieves state-of-the-art performance, outperforming recently proposed noise-robust SV systems.
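The cascade the abstract mentions, a speech enhancement front end feeding a speaker embedding extractor, can be illustrated with a minimal sketch. This is not the Diff-SV architecture: `enhance`, `embed`, and `verify` are hypothetical names, the moving-average filter merely stands in for a learned enhancement model, the band-averaged spectrum stands in for a neural speaker embedding, and the cosine threshold of 0.5 is an arbitrary assumption.

```python
import numpy as np


def enhance(noisy: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for an enhancement front end (moving-average smoothing)."""
    kernel = np.ones(5) / 5.0
    return np.convolve(noisy, kernel, mode="same")


def embed(signal: np.ndarray, dim: int = 16) -> np.ndarray:
    """Hypothetical stand-in for a speaker embedding extractor.

    Averages the magnitude spectrum over `dim` bands, applies log compression,
    and returns a unit-length vector.
    """
    spectrum = np.abs(np.fft.rfft(signal))
    bands = np.array_split(spectrum, dim)
    feats = np.log1p(np.array([band.mean() for band in bands]))
    norm = np.linalg.norm(feats)
    return feats / norm if norm > 0 else feats


def verify(enroll: np.ndarray, test: np.ndarray, threshold: float = 0.5):
    """Enhance both utterances, embed them, and compare by cosine similarity."""
    e_enroll = embed(enhance(enroll))
    e_test = embed(enhance(test))
    score = float(np.dot(e_enroll, e_test))  # both vectors are unit length
    return score, score >= threshold
```

Calling `verify(enroll_wave, test_wave)` on two one-dimensional waveforms returns the similarity score and an accept/reject decision. In a real system both placeholder functions would be replaced by trained models.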

Authors
  1. Ju-ho Kim
  2. Jungwoo Heo
  3. Hyun-seo Shin
  4. Chan-yeong Lim
  5. Ha-Jin Yu