Bridging the Spoof Gap: A Unified Parallel Aggregation Network for Voice Presentation Attacks

Published 19 Sep 2023 in cs.SD and eess.AS | (2309.10560v1)

Abstract: Automatic Speaker Verification (ASV) systems are increasingly used in voice biometrics for user authentication, but they are susceptible to logical and physical spoofing attacks, which pose security risks. Existing research mainly tackles logical or physical attacks separately, leaving a gap in unified spoofing detection. Moreover, when existing systems attempt to handle both types of attack, they often exhibit significant disparities in Equal Error Rate (EER) between the two. To bridge this gap, we present a Parallel Stacked Aggregation Network that processes raw audio. Our approach employs a split-transform-aggregation technique: it divides utterances into convolved representations, applies transformations, and aggregates the results to identify logical access (LA) and physical access (PA) spoofing attacks. Evaluation on the ASVspoof-2019 and VSDC datasets shows the effectiveness of the proposed system. It outperforms state-of-the-art solutions, displaying reduced EER disparities and superior performance in detecting spoofing attacks. This highlights the proposed method's generalizability and superiority. In a world increasingly reliant on voice-based security, our unified spoofing detection system provides a robust defense against a spectrum of voice spoofing attacks, safeguarding ASV systems and user data effectively.
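
The abstract compares systems by their Equal Error Rate and by the disparity between the LA and PA rates. As a minimal sketch of how such a figure can be computed (not the authors' evaluation code), the Python below sweeps a decision threshold over countermeasure scores and reports the point where the false acceptance rate and false rejection rate are closest. The function names, the convention that a higher score means "more likely bona fide", and all scores in the usage lines are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def error_rates(bonafide: Sequence[float], spoof: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) when scores >= threshold are accepted as bona fide."""
    # False rejection: a bona fide utterance scores below the threshold.
    frr = sum(s < threshold for s in bonafide) / len(bonafide)
    # False acceptance: a spoofed utterance scores at or above the threshold.
    far = sum(s >= threshold for s in spoof) / len(spoof)
    return far, frr


def equal_error_rate(bonafide: Sequence[float],
                     spoof: Sequence[float]) -> Tuple[float, float]:
    """Return (EER, threshold) at the threshold where FAR and FRR are closest."""
    # Only thresholds equal to an observed score can change FAR or FRR.
    candidates = sorted(set(bonafide) | set(spoof))
    # One extra threshold above every score, so "reject everything" is covered.
    candidates.append(candidates[-1] + 1.0)
    best_gap, best_eer, best_threshold = None, 0.0, candidates[0]
    for t in candidates:
        far, frr = error_rates(bonafide, spoof, t)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # With finitely many scores FAR and FRR may never be exactly
            # equal, so report their mean at the closest crossing.
            best_gap, best_eer, best_threshold = gap, (far + frr) / 2.0, t
    return best_eer, best_threshold


# Made-up scores for illustration only; they are not results from the paper.
la_eer, _ = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
pa_eer, _ = equal_error_rate([0.9, 0.8], [0.1, 0.2])
disparity = abs(la_eer - pa_eer)  # gap between the two attack types
```

A unified detector in the sense of the abstract would aim to keep both rates low while also keeping `disparity` small, rather than doing well on one attack type at the expense of the other.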
