AI-Generated Video Detection via Spatio-Temporal Anomaly Learning (2403.16638v1)

Published 25 Mar 2024 in cs.CV and cs.CR

Abstract: The advancement of generation models has led to the emergence of highly realistic AI-generated videos. Malicious users can easily create non-existent videos to spread false information. This letter proposes an effective AI-generated video detection (AIGVDet) scheme by capturing the forensic traces with a two-branch spatio-temporal convolutional neural network (CNN). Specifically, two ResNet sub-detectors are learned separately for identifying anomalies in the spatial and optical flow domains, respectively. Results of such sub-detectors are fused to further enhance the discrimination ability. A large-scale generated video dataset (GVD) is constructed as a benchmark for model training and evaluation. Extensive experimental results verify the high generalization and robustness of our AIGVDet scheme. Code and dataset will be available at https://github.com/multimediaFor/AIGVDet.

The paper "AI-Generated Video Detection via Spatio-Temporal Anomaly Learning" addresses the challenge of detecting AI-generated videos, which have become increasingly realistic due to advancements in generative models. These videos present a potential risk as they can be used to spread misinformation. The authors propose an AI-generated video detection scheme named AIGVDet that identifies forensic traces using a two-branch spatio-temporal convolutional neural network (CNN).

Key Components:

  • Two-Branch CNN Architecture: The detection system employs two separate ResNet-based sub-detectors. One focuses on spatial anomalies, while the other examines optical flow anomalies. Spatial anomalies are inconsistencies within individual video frames, whereas optical flow anomalies are inconsistencies in motion across consecutive frames (a minimal sketch of this two-branch design follows the list).
  • Fusion of Detection Results: The results from the spatial and optical flow sub-detectors are fused to enhance the system's discrimination capability. This integration leverages the strengths of both branches to improve accuracy and robustness in detecting AI-generated content.
  • Dataset: To train and evaluate their model, the authors constructed a large-scale generated video dataset (GVD). This dataset serves as a benchmark for assessing the model's performance.
  • Experimental Results: Extensive experiments demonstrate that the AIGVDet scheme has high generalization and robustness. This indicates that the system performs well across a variety of scenarios and is not limited to specific types of video manipulations.
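
To make the pipeline above concrete, here is a minimal, illustrative sketch in PyTorch. It is not the authors' released implementation: it assumes ResNet-18 backbones (the summary does not specify the depth), uses OpenCV's Farneback optical flow as a lightweight stand-in for a learned estimator such as RAFT, and averages the two branch scores as one simple fusion rule. All function and variable names are hypothetical.

```python
# Illustrative two-branch spatio-temporal detector (not the authors' code).
# Assumes torch, torchvision, opencv-python, and numpy are installed.
import cv2
import numpy as np
import torch
import torch.nn as nn
from torchvision import models


def make_resnet_detector(in_channels: int) -> nn.Module:
    """ResNet-18 sub-detector with a binary real/generated head."""
    net = models.resnet18(weights=None)
    if in_channels != 3:
        # Adapt the stem for non-RGB input (e.g. 2-channel flow maps).
        net.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7,
                              stride=2, padding=3, bias=False)
    net.fc = nn.Linear(net.fc.in_features, 2)
    return net


spatial_net = make_resnet_detector(in_channels=3)  # spatial branch: RGB frames
flow_net = make_resnet_detector(in_channels=2)     # temporal branch: (dx, dy) flow


def farneback_flow(prev_bgr: np.ndarray, next_bgr: np.ndarray) -> np.ndarray:
    """Dense optical flow between two frames (stand-in for a learned estimator)."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)


@torch.no_grad()
def score_video(frames_bgr: list) -> float:
    """Fused probability that a list of BGR frames is AI-generated."""
    spatial_net.eval()
    flow_net.eval()
    spatial_scores, flow_scores = [], []
    for i in range(len(frames_bgr) - 1):
        # Spatial branch: single RGB frame, scaled to [0, 1].
        rgb = cv2.cvtColor(frames_bgr[i], cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
        x_s = torch.from_numpy(rgb).permute(2, 0, 1).unsqueeze(0)
        spatial_scores.append(torch.softmax(spatial_net(x_s), dim=1)[0, 1].item())
        # Temporal branch: optical flow between frame i and frame i + 1.
        flow = farneback_flow(frames_bgr[i], frames_bgr[i + 1]).astype(np.float32)
        x_f = torch.from_numpy(flow).permute(2, 0, 1).unsqueeze(0)
        flow_scores.append(torch.softmax(flow_net(x_f), dim=1)[0, 1].item())
    # Decision-level fusion: equal-weight average of the two branch scores.
    return 0.5 * (float(np.mean(spatial_scores)) + float(np.mean(flow_scores)))
```

The even 0.5/0.5 average over per-frame softmax scores is just one plausible fusion choice; AIGVDet's actual fusion of the sub-detector outputs may weight or combine them differently.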

Practical Implications:

The development of such a detection system is significant for mitigating the risks associated with the misuse of AI-generated videos. By efficiently identifying these videos, platforms and regulatory bodies can better manage and control the spread of false information. The authors mention that they plan to release both the code and the dataset, which could further encourage research and development in this domain.

Authors
  1. Jianfa Bai
  2. Man Lin
  3. Gang Cao