USTC-KXDIGIT System Description for ASVspoof5 Challenge (2409.01695v1)

Published 3 Sep 2024 in cs.SD, cs.AI, and eess.AS

Abstract: This paper describes the USTC-KXDIGIT system submitted to the ASVspoof5 Challenge for Track 1 (speech deepfake detection) and Track 2 (spoofing-robust automatic speaker verification, SASV). Track 1 features audio of diverse technical quality, produced by a range of potential processing algorithms, and includes both open and closed conditions. For these conditions, our system consists of a cascade of a front-end feature extractor and a back-end classifier. We focus on extensive embedding engineering and on improving the generalization of the back-end classifier. Specifically, the embedding engineering uses hand-crafted features for the closed condition and speech representations from a self-supervised model for the open condition. To detect spoofing attacks under various adversarial conditions, we trained multiple systems on an augmented training set. Additionally, we used voice conversion to synthesize fake audio from genuine audio in the training set, increasing the variety of synthesis algorithms represented in the training data. To leverage the complementary information learned by different model architectures, we employed an activation ensemble and fused the scores of different systems to obtain the final spoof-detection score. In the evaluation phase, the proposed methods achieved 0.3948 minDCF and 14.33% EER in the closed condition and 0.0750 minDCF and 2.59% EER in the open condition, demonstrating the robustness of our submitted systems under adversarial conditions. In Track 2, we reused the countermeasure (CM) system from Track 1 and fused it with a CNN-based ASV system. This approach achieved 0.2814 min-aDCF in the closed condition and 0.0756 min-aDCF in the open condition, demonstrating strong SASV performance.
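The abstract reports spoof-detection results as EER (alongside minDCF) and obtains the final CM score by fusing the scores of several systems. The following is a minimal sketch, not the authors' implementation, of weighted linear score fusion followed by a threshold-sweep EER computation. The function names fuse_scores and compute_eer, the equal fusion weights, the toy scores, and the convention that a higher score means "more likely bona fide" are all illustrative assumptions; the paper's actual fusion weights and procedure are not given here, and minDCF is not computed.

```python
from typing import List, Sequence


def fuse_scores(system_scores: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Weighted linear fusion: fused[i] = sum_k weights[k] * system_scores[k][i].

    Hypothetical helper: assumes every system scored the same trials in the same order.
    """
    if len(system_scores) != len(weights):
        raise ValueError("need exactly one weight per system")
    n = len(system_scores[0])
    if any(len(s) != n for s in system_scores):
        raise ValueError("all systems must score the same number of trials")
    return [sum(w * s[i] for w, s in zip(weights, system_scores)) for i in range(n)]


def compute_eer(bonafide: Sequence[float], spoof: Sequence[float]) -> float:
    """Equal error rate by sweeping every observed score as a threshold.

    Assumed convention: a trial is accepted as bona fide when score >= threshold.
    FRR = fraction of bona fide trials rejected; FAR = fraction of spoof trials accepted.
    Returns the mean of FRR and FAR at the threshold where they are closest.
    """
    thresholds = sorted(set(bonafide) | set(spoof))
    thresholds.append(float("inf"))  # operating point that rejects everything
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        frr = sum(b < t for b in bonafide) / len(bonafide)
        far = sum(s >= t for s in spoof) / len(spoof)
        gap = abs(frr - far)
        if gap < best_gap:
            best_gap, eer = gap, (frr + far) / 2
    return eer


if __name__ == "__main__":
    # Toy scores from two hypothetical CM systems on the same 4 trials:
    # the first two trials are bona fide, the last two are spoofed.
    sys_a = [0.9, 0.4, 0.6, 0.1]
    sys_b = [0.5, 0.8, 0.2, 0.6]
    fused = fuse_scores([sys_a, sys_b], [0.5, 0.5])
    print(compute_eer(sys_a[:2], sys_a[2:]))  # 0.5
    print(compute_eer(sys_b[:2], sys_b[2:]))  # 0.5
    print(compute_eer(fused[:2], fused[2:]))  # 0.0
```

In this toy case each system alone confuses one bona fide trial with one spoofed trial, but their errors fall on different trials, so the averaged score separates the classes completely. This illustrates, under the stated assumptions, why fusing systems with complementary errors can lower EER; in practice the fusion weights would be tuned on a development set rather than fixed at 0.5.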
