
Retrieval-Augmented Audio Deepfake Detection (2404.13892v2)

Published 22 Apr 2024 in cs.SD, cs.AI, and eess.AS

Abstract: With recent advances in speech synthesis including text-to-speech (TTS) and voice conversion (VC) systems enabling the generation of ultra-realistic audio deepfakes, there is growing concern about their potential misuse. However, most deepfake (DF) detection methods rely solely on the fuzzy knowledge learned by a single model, resulting in performance bottlenecks and transparency issues. Inspired by retrieval-augmented generation (RAG), we propose a retrieval-augmented detection (RAD) framework that augments test samples with similar retrieved samples for enhanced detection. We also extend the multi-fusion attentive classifier to integrate it with our proposed RAD framework. Extensive experiments show the superior performance of the proposed RAD framework over baseline methods, achieving state-of-the-art results on the ASVspoof 2021 DF set and competitive results on the 2019 and 2021 LA sets. Further sample analysis indicates that the retriever consistently retrieves samples mostly from the same speaker with acoustic characteristics highly consistent with the query audio, thereby improving detection performance.
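
The retrieval-then-augment idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each audio clip has already been reduced to a fixed-length embedding vector, it uses cosine similarity as the retrieval score, and it fuses the query with its neighbours by concatenating the query and the mean of the retrieved vectors. The function names `retrieve_top_k` and `augment_query`, the toy vectors, and the fusion rule are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query: Vector, database: Sequence[Vector], k: int) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k database vectors most similar to the query."""
    scored = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(database)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def augment_query(query: Vector, database: Sequence[Vector], k: int) -> List[float]:
    """Concatenate the query with the mean of its k nearest neighbours.

    Assumption: mean pooling plus concatenation stands in for whatever
    fusion the downstream classifier actually performs.
    """
    neighbours = retrieve_top_k(query, database, k)
    if not neighbours:
        return list(query)
    dim = len(query)
    mean = [
        sum(database[i][d] for i, _ in neighbours) / len(neighbours)
        for d in range(dim)
    ]
    return list(query) + mean


if __name__ == "__main__":
    # Toy embeddings standing in for stored reference clips (hypothetical data).
    database = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    query = [1.0, 0.0]
    print(retrieve_top_k(query, database, k=2))
    print(augment_query(query, database, k=2))
```

In a real system the brute-force scan would likely be replaced by an approximate nearest-neighbour index, and the augmented representation would be passed to a trained classifier rather than inspected directly.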
