
Can large-scale vocoded spoofed data improve speech spoofing countermeasure with a self-supervised front end? (2309.06014v2)

Published 12 Sep 2023 in eess.AS and cs.SD

Abstract: A speech spoofing countermeasure (CM) that discriminates between unseen spoofed and bona fide data requires diverse training data. While many datasets use spoofed data generated by speech synthesis systems, it was recently found that data vocoded by neural vocoders are also effective as spoofed training data. Since many neural vocoders are fast to build and fast at generation, this study used multiple neural vocoders to create more than 9,000 hours of vocoded data based on the VoxCeleb2 corpus. This study investigates how this large-scale vocoded data can improve spoofing countermeasures that use data-hungry self-supervised learning (SSL) models. Experiments demonstrated that overall CM performance on multiple test sets improved when using features extracted by an SSL model continually trained on the vocoded data. Further improvement was observed when using a new SSL model distilled from the two SSL models before and after the continual training. The CM with the distilled SSL model outperformed the previous best model on challenging unseen test sets, including the ASVspoof 2019 logical access, WaveFake, and In-the-Wild.
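
The abstract does not spell out how the new SSL model is distilled from the two SSL models, so the following is only a minimal sketch of one common way such a two-teacher distillation objective could look. It assumes the student is trained to match frame-level features from both teachers with a weighted mean squared error. The function names, the `weight_after` parameter, and the choice of MSE are all hypothetical illustrations, not the authors' method.

```python
from typing import Sequence

Frame = Sequence[float]  # one feature vector per audio frame


def mse(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Mean squared error between two equally shaped feature sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same number of frames")
    total, count = 0.0, 0
    for frame_a, frame_b in zip(a, b):
        if len(frame_a) != len(frame_b):
            raise ValueError("frames must have the same dimension")
        for x, y in zip(frame_a, frame_b):
            total += (x - y) ** 2
            count += 1
    if count == 0:
        raise ValueError("feature sequences must not be empty")
    return total / count


def two_teacher_distillation_loss(
    student: Sequence[Frame],
    teacher_before: Sequence[Frame],
    teacher_after: Sequence[Frame],
    weight_after: float = 0.5,  # hypothetical mixing weight, not from the paper
) -> float:
    """Weighted sum of the student's MSE to each teacher's features.

    teacher_before: features from the SSL model before continual training.
    teacher_after:  features from the SSL model after continual training.
    """
    if not 0.0 <= weight_after <= 1.0:
        raise ValueError("weight_after must lie in [0, 1]")
    loss_before = mse(student, teacher_before)
    loss_after = mse(student, teacher_after)
    return (1.0 - weight_after) * loss_before + weight_after * loss_after
```

In this sketch, setting `weight_after` to 0 or 1 reduces the objective to ordinary single-teacher distillation from the model before or after continual training, respectively. Intermediate values ask the student to stay close to both.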
