Audio Deepfake Detection with Self-Supervised WavLM and Multi-Fusion Attentive Classifier (2312.08089v2)
Abstract: With the rapid development of speech synthesis and voice conversion technologies, audio deepfakes have become a serious threat to Automatic Speaker Verification (ASV) systems. Numerous countermeasures have been proposed to detect this type of attack. In this paper, we report our efforts to combine the self-supervised WavLM model with a Multi-Fusion Attentive classifier for audio deepfake detection. Our method is the first to exploit the WavLM model to extract features that are more conducive to spoofing detection. We then propose a novel Multi-Fusion Attentive (MFA) classifier based on the Attentive Statistics Pooling (ASP) layer. The MFA captures complementary information in the audio features at both the time and layer levels. Experiments demonstrate that our method achieves state-of-the-art results on the ASVspoof 2021 DF set and competitive results on the ASVspoof 2019 and 2021 LA sets.
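The abstract names Attentive Statistics Pooling (ASP) as the building block of the MFA classifier. As a rough illustration only, the sketch below shows generic ASP over the time axis: each frame gets a scalar attention score, the scores are normalized with a softmax, and the attention-weighted mean and standard deviation are concatenated. This is not the authors' MFA classifier. The function name `attentive_stats_pool` and the parameters `W`, `b`, `v` are assumptions made for this example, and plain Python lists stand in for the tensors a real model would use.

```python
import math

def attentive_stats_pool(frames, W, b, v):
    """Pool T frame vectors (each of dim D) into one vector of dim 2*D.

    W (A x D), b (A) and v (A) are hypothetical attention parameters; in a
    real model they would be learned, here the caller passes them in.
    """
    # Scalar attention score per frame: e_t = v . tanh(W h_t + b)
    scores = []
    for h in frames:
        hidden = [math.tanh(sum(w_j * h_j for w_j, h_j in zip(row, h)) + b_k)
                  for row, b_k in zip(W, b)]
        scores.append(sum(v_k * z_k for v_k, z_k in zip(v, hidden)))

    # Softmax over time, shifted by the max score for numerical stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]

    dim = len(frames[0])
    # Attention-weighted mean and standard deviation per feature dimension
    mean = [sum(a * h[d] for a, h in zip(alphas, frames)) for d in range(dim)]
    var = [sum(a * h[d] ** 2 for a, h in zip(alphas, frames)) - mean[d] ** 2
           for d in range(dim)]
    std = [math.sqrt(max(x, 1e-12)) for x in var]  # clamp tiny negatives
    return mean + std  # list concatenation: [mean..., std...]


# Two frames of dimension 2; zero attention weights give uniform attention.
pooled = attentive_stats_pool(
    frames=[[1.0, 2.0], [3.0, 6.0]],
    W=[[0.0, 0.0]], b=[0.0], v=[0.0],
)
print(pooled)
```

With all attention parameters set to zero, every frame receives the same weight, so the result reduces to the ordinary per-dimension mean and standard deviation. One plausible reading of "time and layer levels" is that the same kind of pooling is applied along both the frame axis and the axis of stacked WavLM layer outputs, but the abstract does not give those details.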