Tech. Report: Genuinization of Speech waveform PMF for speaker detection spoofing and countermeasures (2310.05534v1)
Abstract: In the context of spoofing attacks on speaker recognition systems, we observed that the waveform probability mass function (PMF) of genuine speech differs significantly from the PMF of speech produced by the attacks. This holds for synthesized and converted speech as well as for replayed speech. We also found that this difference appears to have a significant impact on spoofing detection performance. In this article, we propose an algorithm, denoted genuinization, that reduces the waveform distribution gap between genuine speech and spoofed speech. The algorithm is evaluated on the ASVspoof 2019 challenge datasets, using the baseline system provided by the challenge organizers. We first assess the influence of genuinization on spoofing performance: applying genuinization to the spoofing attacks degrades spoofing detection performance by up to a factor of 10. Next, we integrate the genuinization algorithm into the spoofing countermeasures and observe a large improvement in spoofing detection in several cases. The results of our experiments clearly show that the waveform distribution plays an important role and must be taken into account by anti-spoofing systems.
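To make the notion of a waveform PMF concrete, the following is a minimal sketch, assuming integer PCM samples, of how an empirical amplitude PMF can be estimated and how one waveform's amplitude distribution can be pulled toward a reference PMF. The remapping shown is plain quantile mapping (histogram matching), used here only as an illustrative stand-in: the abstract does not specify the genuinization algorithm, so this should not be read as the authors' method. The function names `waveform_pmf`, `total_variation`, and `match_pmf`, the 8-bit toy resolution, and the synthetic Laplace and Gaussian signals are all assumptions made for the example.

```python
import numpy as np

def waveform_pmf(samples, n_bits=16):
    """Empirical PMF over the integer amplitude levels of a signed PCM waveform.

    Assumes every sample lies in [-2**(n_bits-1), 2**(n_bits-1) - 1].
    """
    n_levels = 2 ** n_bits
    offset = n_levels // 2  # shift signed amplitudes to non-negative bin indices
    counts = np.bincount(samples.astype(np.int64) + offset, minlength=n_levels)
    return counts / counts.sum()

def total_variation(p, q):
    """Total variation distance between two PMFs defined on the same levels."""
    return 0.5 * np.abs(p - q).sum()

def match_pmf(samples, target_pmf, n_bits=16):
    """Quantile mapping (hypothetical stand-in, NOT the paper's algorithm).

    Each source amplitude level is sent to the first target level whose
    cumulative probability reaches the source level's cumulative probability.
    """
    n_levels = 2 ** n_bits
    offset = n_levels // 2
    src_cdf = np.cumsum(waveform_pmf(samples, n_bits))
    tgt_cdf = np.cumsum(target_pmf)
    mapping = np.searchsorted(tgt_cdf, src_cdf, side="left")
    mapping = np.clip(mapping, 0, n_levels - 1)  # guard against rounding past 1.0
    return (mapping[samples.astype(np.int64) + offset] - offset).astype(np.int16)

# Toy data (assumption): a peaky "genuine" amplitude distribution and a
# flatter "spoofed" one, both quantized to 8 bits to keep the example small.
rng = np.random.default_rng(0)
n_bits = 8
genuine = np.clip(np.rint(rng.laplace(0.0, 10.0, 50_000)), -128, 127).astype(np.int16)
spoofed = np.clip(np.rint(rng.normal(0.0, 30.0, 50_000)), -128, 127).astype(np.int16)

target = waveform_pmf(genuine, n_bits)
remapped = match_pmf(spoofed, target, n_bits)

tv_before = total_variation(waveform_pmf(spoofed, n_bits), target)
tv_after = total_variation(waveform_pmf(remapped, n_bits), target)
print(f"TV distance to genuine PMF before: {tv_before:.3f}, after: {tv_after:.3f}")
```

The remapping is a memoryless, monotone transformation of sample amplitudes, so it leaves the temporal order of samples untouched while changing only their distribution. Whether the published algorithm works per utterance, per frame, or with a different mapping is not stated in the abstract and is left open here.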