
A robust audio deepfake detection system via multi-view feature (2403.01960v1)

Published 4 Mar 2024 in cs.SD and eess.AS

Abstract: With the advancement of generative modeling techniques, synthetic human speech is becoming increasingly indistinguishable from real speech, which poses difficult challenges for audio deepfake detection (ADD) systems. In this paper, we exploit audio features to improve the generalizability of ADD systems. We investigate ADD task performance over a broad range of audio features, including various handcrafted features and learning-based features. Experiments show that learning-based audio features pretrained on a large amount of data generalize better than handcrafted features in out-of-domain scenarios. We then further improve the generalizability of the ADD system using the proposed multi-feature approaches, which incorporate complementary information from features of different views. The model trained on ASV2019 data achieves an equal error rate of 24.27% on the In-the-Wild dataset.
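For readers unfamiliar with the metric quoted in the abstract, the equal error rate (EER) is the operating point at which the false acceptance rate (spoofed audio accepted as real) equals the false rejection rate (real audio rejected). The following is a minimal sketch of one common way to estimate it from detector scores. It is not taken from the paper; the function name `equal_error_rate`, the score convention (higher means more likely bona fide), and the threshold sweep are all assumptions made for illustration.

```python
def equal_error_rate(scores, labels):
    """Estimate the EER by sweeping a decision threshold over the scores.

    Assumed convention (not from the paper): a higher score means
    "more likely bona fide"; label 1 = bona fide, label 0 = spoof.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]  # bona fide scores
    neg = [s for s, y in zip(scores, labels) if y == 0]  # spoof scores
    if not pos or not neg:
        raise ValueError("need at least one example of each class")

    # Candidate thresholds: every observed score, plus one above the
    # maximum so that "reject everything" is also considered.
    thresholds = sorted(set(scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap, eer = None, None
    for t in thresholds:
        # A sample is accepted as bona fide when its score is >= t.
        far = sum(s >= t for s in neg) / len(neg)  # spoof wrongly accepted
        frr = sum(s < t for s in pos) / len(pos)   # bona fide wrongly rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Report the midpoint of FAR and FRR where they are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy scores, purely illustrative; not data from the paper.
    print(equal_error_rate([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

On finite data the two error rates rarely cross exactly, so this sketch reports their midpoint at the closest threshold; evaluation toolkits may interpolate instead, which can give slightly different values.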

Authors (7)
  1. Yujie Yang (29 papers)
  2. Haochen Qin (3 papers)
  3. Hang Zhou (166 papers)
  4. Chengcheng Wang (14 papers)
  5. Tianyu Guo (33 papers)
  6. Kai Han (184 papers)
  7. Yunhe Wang (145 papers)
Citations (16)
