Generalizing Speaker Verification for Spoof Awareness in the Embedding Space (2401.11156v2)

Published 20 Jan 2024 in cs.CR, cs.AI, cs.SD, and eess.AS

Abstract: It is now well-known that automatic speaker verification (ASV) systems can be spoofed by various types of adversaries. The usual approach to protect ASV systems against such attacks is to develop a separate spoofing countermeasure (CM) module to classify a speech input as either a bonafide or a spoofed utterance. Nevertheless, such a design requires additional computation and integration effort at the authentication stage. An alternative strategy involves a single monolithic ASV system designed to handle both zero-effort impostors (non-targets) and spoofing attacks. Such spoof-aware ASV systems have the potential to provide stronger protection and more economical computation. To this end, we propose to generalize the standalone ASV (G-SASV) against spoofing attacks, where we leverage limited training data from the CM task to enhance a simple backend in the embedding space, without involving a separate CM module during the test (authentication) phase. We propose a novel yet simple backend classifier based on deep neural networks and conduct the study via domain adaptation and multi-task integration of spoof embeddings at the training stage. Experiments are conducted on the ASVspoof 2019 logical access dataset, where we improve the performance of statistical ASV backends on the joint (bonafide and spoofed) and spoofed conditions by a maximum of 36.2% and 49.8%, respectively, in terms of equal error rate.
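
As a rough illustration of the multi-task idea described in the abstract, the sketch below shows a minimal PyTorch backend that operates on speaker embeddings and is trained with a verification objective plus an auxiliary bonafide-vs-spoof head. This is an assumption-laden sketch, not the paper's G-SASV backend: the class and function names (SpoofAwareBackend, joint_loss), the 192-dimensional embeddings, the shared-trunk/two-head layout, and the loss weight alpha are all illustrative choices, not details taken from the paper.

```python
# Hypothetical sketch of a spoof-aware ASV backend (multi-task variant).
# Not the authors' architecture; dimensions and heads are assumptions.
import torch
import torch.nn as nn

class SpoofAwareBackend(nn.Module):
    def __init__(self, emb_dim: int = 192, hidden: int = 256):
        super().__init__()
        # Shared trunk over the concatenated (enrolment, test) embedding pair.
        self.trunk = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden),
            nn.BatchNorm1d(hidden),
            nn.ReLU(),
        )
        # Head 1: verification score (target vs. non-target/spoof trial).
        self.verif_head = nn.Linear(hidden, 1)
        # Head 2: auxiliary bonafide-vs-spoof classifier (multi-task branch).
        self.spoof_head = nn.Linear(hidden, 2)

    def forward(self, enrol_emb: torch.Tensor, test_emb: torch.Tensor):
        h = self.trunk(torch.cat([enrol_emb, test_emb], dim=-1))
        return self.verif_head(h).squeeze(-1), self.spoof_head(h)

def joint_loss(verif_logit, spoof_logit, target_label, spoof_label, alpha=0.5):
    # Verification BCE plus a weighted auxiliary spoof-detection CE.
    bce = nn.functional.binary_cross_entropy_with_logits(verif_logit, target_label)
    ce = nn.functional.cross_entropy(spoof_logit, spoof_label)
    return bce + alpha * ce

if __name__ == "__main__":
    model = SpoofAwareBackend()
    enrol = torch.randn(8, 192)   # batch of enrolment speaker embeddings
    test = torch.randn(8, 192)    # batch of test-utterance embeddings
    v_logit, s_logit = model(enrol, test)
    loss = joint_loss(
        v_logit, s_logit,
        target_label=torch.randint(0, 2, (8,)).float(),
        spoof_label=torch.randint(0, 2, (8,)),
    )
    loss.backward()
```

In this sketch the auxiliary head is used only at training time to shape the shared representation; at test time one would score trials with the verification head alone, mirroring the paper's goal of avoiding a separate CM module during authentication.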
