Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0 (2402.17127v1)
Abstract: Conventional spoofing detection systems have relied heavily on handcrafted features derived from speech data. Recently, however, there has been a notable shift towards using raw speech waveforms directly, as demonstrated by methods such as SincNet filters. This shift underscores the demand for more sophisticated audio features. Moreover, the success of deep learning models, particularly those that use a large pretrained wav2vec 2.0 model as a featurization front-end, highlights the importance of refined feature encoders. In response, this research assessed the representational capability of wav2vec 2.0 as an audio feature extractor, varying how many of its pretrained Transformer layers are used through two key adjustments: (1) selecting a subset of layers, counted from the leftmost layer, and (2) fine-tuning a portion of the selected layers, counted from the rightmost selected layer. We complemented this analysis with five spoofing detection back-end models, with a primary focus on AASIST, which enabled us to pinpoint the optimal configuration for the selection and fine-tuning process. Using these features in place of conventional handcrafted ones, our investigation identified several spoofing detection systems that achieve state-of-the-art performance on the ASVspoof 2019 LA dataset. This exploration offers insights into feature selection strategies for spoofing detection.
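The two adjustments described in the abstract can be illustrated with a minimal sketch. It only computes which layer indices would be frozen, fine-tuned, or discarded; it is not the authors' implementation. The function name `plan_layers`, the `LayerPlan` container, and the example values (24 layers, 12 selected, 4 fine-tuned) are hypothetical and chosen for illustration, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerPlan:
    frozen: tuple      # selected layers kept at their pretrained weights
    fine_tuned: tuple  # selected layers updated during training
    dropped: tuple     # pretrained layers that are not used at all


def plan_layers(total_layers: int, num_selected: int, num_fine_tuned: int) -> LayerPlan:
    """Split Transformer layer indices (0 = leftmost) into three groups.

    (1) The first `num_selected` layers are kept, counted from the left.
    (2) The last `num_fine_tuned` of those kept layers are fine-tuned,
        counted from the right; the remaining kept layers stay frozen.
    """
    if not 0 <= num_selected <= total_layers:
        raise ValueError("num_selected must be between 0 and total_layers")
    if not 0 <= num_fine_tuned <= num_selected:
        raise ValueError("num_fine_tuned must be between 0 and num_selected")

    split = num_selected - num_fine_tuned
    return LayerPlan(
        frozen=tuple(range(split)),
        fine_tuned=tuple(range(split, num_selected)),
        dropped=tuple(range(num_selected, total_layers)),
    )


# Hypothetical configuration: a 24-layer encoder, 12 layers kept, 4 fine-tuned.
plan = plan_layers(total_layers=24, num_selected=12, num_fine_tuned=4)
print("frozen:    ", plan.frozen)
print("fine-tuned:", plan.fine_tuned)
print("dropped:   ", plan.dropped)
```

In a real training setup, the `fine_tuned` indices would be the layers whose parameters receive gradient updates, while the `frozen` layers act as a fixed feature extractor and the `dropped` layers are removed from the front-end entirely.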