Integrating Audio-Visual Features for Multimodal Deepfake Detection (2310.03827v1)
Published 5 Oct 2023 in cs.CV
Abstract: Deepfakes are AI-generated media in which an image or video has been digitally modified. Advances in deepfake technology have raised privacy and security concerns. Most deepfake detection techniques analyze only a single modality, and existing audio-visual detection methods do not always outperform single-modality analysis. This paper therefore proposes an audio-visual method for deepfake detection that integrates fine-grained deepfake identification with binary classification. We categorize the samples into four types by combining the labels specific to each single modality. This method improves detection in both intra-domain and cross-domain testing.
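The four-type categorization described in the abstract can be illustrated with a minimal sketch. It assumes each sample carries one real/fake label per modality. The class names, their integer ordering, and the helper functions `fine_label` and `binary_label` are hypothetical and are not taken from the paper.

```python
from enum import IntEnum


class FineLabel(IntEnum):
    """Four fine-grained classes; names and ordering are illustrative assumptions."""
    REAL_VIDEO_REAL_AUDIO = 0
    REAL_VIDEO_FAKE_AUDIO = 1
    FAKE_VIDEO_REAL_AUDIO = 2
    FAKE_VIDEO_FAKE_AUDIO = 3


def fine_label(video_is_fake: bool, audio_is_fake: bool) -> FineLabel:
    """Combine the two per-modality labels into one of four classes."""
    return FineLabel(2 * int(video_is_fake) + int(audio_is_fake))


def binary_label(label: FineLabel) -> int:
    """Collapse to binary: 0 only if both modalities are real, otherwise 1."""
    return 0 if label is FineLabel.REAL_VIDEO_REAL_AUDIO else 1


if __name__ == "__main__":
    for video_fake in (False, True):
        for audio_fake in (False, True):
            label = fine_label(video_fake, audio_fake)
            print(label.name, "-> binary", binary_label(label))
```

Under this encoding, a sample counts as a deepfake in the binary sense whenever at least one modality is manipulated. How the paper actually couples the fine-grained and binary objectives during training is not specified in the abstract.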
Authors: Sneha Muppalla, Shan Jia, Siwei Lyu