
Unmasking Deepfakes: Masked Autoencoding Spatiotemporal Transformers for Enhanced Video Forgery Detection (2306.06881v2)

Published 12 Jun 2023 in cs.CV

Abstract: We present a novel approach for detecting deepfake videos using a pair of vision transformers pre-trained in a self-supervised masked autoencoding setup. Our method consists of two distinct components: one learns spatial information from individual RGB frames of the video, while the other learns temporal consistency information from optical flow fields generated from consecutive frames. Unlike most approaches, where pre-training is performed on a large generic corpus of images, we show that strong results can be obtained by pre-training on smaller face-related datasets, namely Celeb-A (for the spatial learning component) and YouTube Faces (for the temporal learning component). We perform various experiments to evaluate our method on commonly used datasets, namely FaceForensics++ (Low Quality and High Quality, along with a new highly compressed version named Very Low Quality) and Celeb-DFv2. Our experiments show that our method sets a new state of the art on FaceForensics++ (LQ, HQ, and VLQ) and obtains competitive results on Celeb-DFv2. Moreover, our method outperforms other methods in the area in a cross-dataset setup where we fine-tune our model on FaceForensics++ and test on Celeb-DFv2, pointing to its strong cross-dataset generalization ability.
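
To make the masked autoencoding idea in the abstract concrete, here is a minimal sketch of the random patch masking step that such pre-training typically relies on. It is an illustration, not the authors' code: the function name `random_patch_mask`, the 75% mask ratio, and the 224x224 input with 16x16 patches are assumptions that do not come from the abstract. The same masking could in principle be applied to RGB frame patches or to optical flow patches.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked) lists, masked-autoencoder style.

    Hypothetical helper for illustration; mask_ratio=0.75 is an assumed default,
    not a value stated in the abstract.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)  # random permutation of patch positions
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])   # patches hidden from the encoder
    visible = sorted(indices[num_masked:])  # patches the encoder actually sees
    return visible, masked


# Assumed setup: a 224x224 input cut into 16x16 patches gives 14 * 14 = 196 patches.
visible, masked = random_patch_mask(196, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 49 147
```

In a full pipeline, the encoder would process only the visible patches and a lightweight decoder would be trained to reconstruct the masked ones; those parts are omitted here.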
