Unsupervised Multimodal Deepfake Detection Using Intra- and Cross-Modal Inconsistencies (2311.17088v2)
Abstract: Deepfake videos present an increasing threat to society, with potentially negative impact on criminal justice, democracy, and personal safety and privacy. Meanwhile, detecting deepfakes at scale remains a very challenging task that often requires labeled training data from existing deepfake generation methods. Further, even the most accurate supervised deepfake detection methods do not generalize to deepfakes produced by new generation methods. In this paper, we propose a novel unsupervised method for detecting deepfake videos by directly identifying intra-modal and cross-modal inconsistencies between video segments. The fundamental hypothesis behind the proposed method is that motion or identity inconsistencies are inevitable in deepfake videos. We support this hypothesis mathematically and empirically, and then construct our method on the basis of this theoretical analysis. Our method outperforms prior state-of-the-art unsupervised deepfake detection methods on the challenging FakeAVCeleb dataset and has several additional advantages. It is scalable, because it does not require pristine (real) samples for each identity during inference and can therefore be applied to arbitrarily many identities; generalizable, because it is trained only on real videos and therefore does not rely on a particular deepfake method; reliable, because it does not rely on any likelihood estimation in high dimensions; and explainable, because it can pinpoint the exact location of modality inconsistencies, which a human expert can then verify.
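The idea of scoring a video by inconsistencies between its segments can be illustrated with a minimal sketch. This is not the paper's actual scoring rule, which the abstract does not specify. It assumes each segment has already been mapped to one visual embedding and one audio embedding in a shared space, and it uses cosine distance as a stand-in measure of inconsistency. The names `video_emb`, `audio_emb`, `cosine_distance_matrix`, and `inconsistency_report` are hypothetical.

```python
import numpy as np


def cosine_distance_matrix(a, b):
    """Pairwise cosine distance between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T


def inconsistency_report(video_emb, audio_emb):
    """Score one video from per-segment embeddings (assumed inputs).

    video_emb, audio_emb: arrays of shape (num_segments, dim), assumed to
    live in a shared embedding space learned from real videos only.
    """
    # Intra-modal: how far apart are any two visual segments of the same video?
    intra = cosine_distance_matrix(video_emb, video_emb)
    i, j = np.unravel_index(np.argmax(intra), intra.shape)

    # Cross-modal: how far is each segment's audio from its own visuals?
    cross = np.diag(cosine_distance_matrix(video_emb, audio_emb))

    return {
        "intra_score": float(intra[i, j]),
        "intra_pair": (int(i), int(j)),          # most inconsistent segment pair
        "cross_score": float(cross.max()),
        "cross_segment": int(np.argmax(cross)),  # most inconsistent segment
    }
```

Because the report returns segment indices as well as scores, a reviewer could inspect the flagged segments directly, which loosely mirrors the explainability property claimed in the abstract. Any decision threshold on these scores would have to be chosen separately and is not shown here.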
Authors: Mulin Tian, Mahyar Khayatkhoei, Joe Mathai, Wael AbdAlmageed