What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection (2312.09651v1)
Abstract: The rapid evolution of speech synthesis and voice conversion has raised substantial concerns about the potential misuse of these technologies, creating a pressing need for effective audio deepfake detection. Existing detection models discriminate known types of deepfake audio remarkably well but struggle when they encounter new attack types. One emerging and effective approach to this challenge is continual learning. In this paper, we propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection. The core idea of RWM is to divide all classes into two groups: those whose feature distributions remain compact across tasks, such as genuine audio, and those whose distributions are more spread out, such as the various types of fake audio. This distinction is quantified by the in-class cosine distance, which RWM then uses as the basis for a trainable gradient modification direction for each data type. Experimental evaluations against mainstream continual learning methods show that RWM outperforms them both in acquiring new knowledge and in mitigating forgetting for audio deepfake detection. Furthermore, RWM's applicability extends beyond audio deepfake detection, indicating its potential significance in other machine learning domains such as image recognition.
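The abstract states that RWM separates compact classes from spread-out ones using the in-class cosine distance. Its exact formula is not given here, so the sketch below assumes it is the mean pairwise cosine distance (1 minus cosine similarity) among one class's feature vectors. The helper names, the threshold-based "compact"/"spread" split, and the toy 2-D features are hypothetical illustrations, not RWM's actual procedure.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cos(u, v); assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def in_class_cosine_distance(features: List[Sequence[float]]) -> float:
    """Mean pairwise cosine distance within one class (assumed definition)."""
    pairs = list(combinations(features, 2))
    if not pairs:
        return 0.0  # a single sample is treated as perfectly compact
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


def split_by_compactness(
    features_by_class: Dict[str, List[Sequence[float]]], threshold: float
) -> Dict[str, str]:
    """Label each class 'compact' or 'spread' against a hypothetical threshold."""
    return {
        label: "compact" if in_class_cosine_distance(feats) <= threshold else "spread"
        for label, feats in features_by_class.items()
    }


# Toy features: genuine vectors point in nearly the same direction,
# fake vectors point in very different directions.
features = {
    "genuine": [[1.0, 0.1], [0.9, 0.12], [1.1, 0.08]],
    "fake": [[1.0, 0.0], [0.0, 1.0], [-0.5, 0.5]],
}
print(split_by_compactness(features, threshold=0.1))
```

With these toy vectors the genuine class has an in-class distance close to 0 and the fake class a distance of about 1.0, so the call prints `{'genuine': 'compact', 'fake': 'spread'}`. Cosine distance ignores vector magnitude, which suits a grouping that depends only on how consistently a class's features point in the same direction.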