
Learning Spatiotemporal Inconsistency via Thumbnail Layout for Face Deepfake Detection (2403.10261v2)

Published 15 Mar 2024 in cs.CV

Abstract: The threat that deepfakes pose to society and cybersecurity has provoked significant public concern and intensified efforts in deepfake video detection. Current video-level methods are mostly based on 3D CNNs, which achieve good performance but incur high computational cost. This paper introduces a simple yet effective strategy named Thumbnail Layout (TALL), which transforms a video clip into a pre-defined layout to preserve spatial and temporal dependencies. The transformation sequentially masks frames at the same position within each frame. The frames are then resized into sub-frames and reorganized into the predetermined layout, forming a thumbnail. TALL is model-agnostic and requires only minimal code modifications. Furthermore, we introduce a graph reasoning block (GRB) and a semantic consistency (SC) loss to strengthen TALL, resulting in TALL++. GRB enhances interactions between different semantic regions to capture semantic-level inconsistency clues. The SC loss imposes consistency constraints on semantic features to improve the model's generalization ability. Extensive experiments on intra-dataset evaluation, cross-dataset evaluation, diffusion-generated image detection, and deepfake generation method recognition show that TALL++ achieves results that surpass or are comparable to state-of-the-art methods, demonstrating the effectiveness of our approaches for various deepfake detection problems. The code is available at https://github.com/rainy-xu/TALL4Deepfake.
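
The thumbnail transformation described in the abstract (mask, resize, rearrange) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, which lives in the repository linked above. The function name `make_thumbnail`, its parameters, the 2x2 default grid, the zero-filled square mask, and the nearest-neighbour resizing are all assumptions chosen for illustration.

```python
import numpy as np

def make_thumbnail(clip, rows=2, cols=2, sub_size=112, mask_size=0, rng=None):
    """Arrange a clip of shape (T, H, W, C) into one (rows*sub_size, cols*sub_size, C) image.

    Hypothetical helper: every name and default here is an assumption,
    not taken from the paper or its code.
    """
    clip = np.asarray(clip)
    t, h, w, c = clip.shape
    if t != rows * cols:
        raise ValueError("clip length must equal rows * cols")
    if mask_size > min(h, w):
        raise ValueError("mask_size must fit inside a frame")

    frames = clip.copy()
    if mask_size > 0:
        # Assumed masking scheme: zero out one square at the SAME position in every frame.
        rng = np.random.default_rng() if rng is None else rng
        top = int(rng.integers(0, h - mask_size + 1))
        left = int(rng.integers(0, w - mask_size + 1))
        frames[:, top:top + mask_size, left:left + mask_size, :] = 0

    # Nearest-neighbour resize of each frame to sub_size x sub_size (assumed for simplicity).
    ys = np.arange(sub_size) * h // sub_size
    xs = np.arange(sub_size) * w // sub_size
    small = frames[:, ys[:, None], xs[None, :], :]  # (T, sub_size, sub_size, C)

    # Place the sub-frames row by row in temporal order to form the thumbnail.
    grid = small.reshape(rows, cols, sub_size, sub_size, c)
    return grid.transpose(0, 2, 1, 3, 4).reshape(rows * sub_size, cols * sub_size, c)
```

Because the output is an ordinary image, it can be passed to any 2D image backbone, which is consistent with the abstract's description of TALL as model-agnostic.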
