
FakeOut: Leveraging Out-of-domain Self-supervision for Multi-modal Video Deepfake Detection (2212.00773v2)

Published 1 Dec 2022 in cs.CV

Abstract: Video synthesis methods have improved rapidly in recent years, allowing easy creation of synthetic humans. This poses a problem, especially in the era of social media, as synthetic videos of speaking humans can be used to spread misinformation in a convincing manner. Thus, there is a pressing need for accurate and robust deepfake detection methods that can detect forgery techniques not seen during training. In this work, we explore whether this can be done by leveraging a multi-modal, out-of-domain backbone trained in a self-supervised manner and adapted to the video deepfake domain. We propose FakeOut, a novel approach that relies on multi-modal data throughout both the pre-training phase and the adaptation phase. We demonstrate the efficacy and robustness of FakeOut in detecting various types of deepfakes, especially manipulations that were not seen during training. Our method achieves state-of-the-art results in cross-dataset generalization on audio-visual datasets. This study shows that, perhaps surprisingly, training on out-of-domain videos (i.e., videos that do not specifically feature speaking humans) can lead to better deepfake detection systems. Code is available on GitHub.
