
Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics (2403.14077v4)

Published 21 Mar 2024 in cs.AI and cs.CR

Abstract: DeepFakes, which refer to AI-generated media content, have become an increasing concern due to their use as a means of disinformation. DeepFake detection is currently addressed with purpose-built machine learning algorithms. In this work, we investigate the capabilities of multimodal LLMs in DeepFake detection. We conducted qualitative and quantitative experiments and show that multimodal LLMs can expose AI-generated images through careful experimental design and prompt engineering. This is interesting, considering that LLMs are not inherently tailored for media forensic tasks and the process requires no programming. We discuss the limitations of multimodal LLMs for these tasks and suggest possible improvements.

Evaluating ChatGPT for DeepFake Detection in Media Forensics

The paper, "Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics," explores the potential of employing multimodal LLMs for the detection of DeepFakes, a prevalent concern in the digital era. This work critically examines whether the capabilities of LLMs, specifically those that can process both textual and visual inputs, can be harnessed for media forensics without relying on complex programming.

Methodology and Experimental Design

The researchers focus on ChatGPT, particularly its iterations that can handle multimodal inputs such as images and text. The study centers on identifying AI-generated face images, among the earliest and most notorious forms of DeepFakes, produced by models such as GANs and diffusion models. The authors systematically design experiments around the GPT-4V model, assessing its ability to determine whether a given image is AI-generated under various types of prompts.
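
To make the setup concrete, the following sketch shows how such a query could be issued with the OpenAI Python SDK. It is an illustration under stated assumptions, not the authors' exact harness: the model name, prompt wording, and file path are all placeholders.

```python
# Minimal sketch of querying a vision-capable GPT model with one image
# and a binary real-vs-synthetic prompt. Model name, prompt wording,
# and file path are illustrative assumptions, not the paper's setup.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_is_ai_generated(image_path: str, prompt: str) -> str:
    """Send one image plus a text prompt and return the model's reply."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


reply = ask_is_ai_generated(
    "face_001.jpg",  # hypothetical test image
    "Is this image of a real person or AI-generated? Answer yes or no.",
)
print(reply)
```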

A significant aspect of the paper is its exploration of different prompts, ranging from simple binary queries to more context-rich ones. The authors find that simpler prompts often lead to higher rejection rates from the LLMs, so effective DeepFake detection demands more deliberate prompt engineering. This suggests that the capabilities of LLMs, though not originally tailored for media forensics, can still be deployed effectively through carefully designed text prompts.
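
As a concrete, hypothetical illustration of this spectrum, the prompt ladder below runs from a bare binary query to the kind of context-rich variant the paper found less likely to be refused; the exact wording is our assumption:

```python
# Hypothetical prompt ladder, from simple to context-rich. The paper
# reports that bare binary queries are refused more often, so richer
# prompts like the last one tend to lower the rejection rate.
PROMPTS = {
    "binary": "Is this image AI-generated? Answer yes or no.",
    "hedged": (
        "I understand you cannot be certain. Based only on what is "
        "visible, is this face more likely real or AI-generated?"
    ),
    "context_rich": (
        "You are assisting a media-forensics analysis. Examine the face "
        "for semantic inconsistencies such as asymmetric eyes or "
        "earrings, irregular pupils, implausible backgrounds, or "
        "distorted accessories. Then state whether the image is more "
        "likely real or AI-generated, and briefly justify your answer."
    ),
}
```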

Key Findings and Performance Metrics

The paper reports an AUC of approximately 75% when using multimodal LLMs to detect AI-generated face images. This indicates genuine but limited discriminative ability compared with purpose-built detection methods. While the model performs satisfactorily on AI-generated images, its accuracy drops significantly on real images: because real photos lack the explicit semantic inconsistencies that betray AI-generated content, the model has little positive evidence on which to declare them authentic.
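
For readers who want to compute a comparable metric, the sketch below shows one plausible way to map free-text verdicts to scores and measure AUC over answered queries. The crude keyword heuristic and toy data are assumptions, not the paper's evaluation protocol:

```python
# Sketch: computing AUC from per-image model verdicts. Labels are 1 for
# AI-generated, 0 for real; refusals (score=None) are dropped, mirroring
# the idea of evaluating only non-rejected responses.
from sklearn.metrics import roc_auc_score


def verdict_to_score(reply: str) -> float | None:
    """Crude keyword mapping of a free-text reply to a likelihood score."""
    text = reply.lower()
    if "cannot" in text or "unable" in text:
        return None  # treat refusals as rejections
    if "yes" in text or "synthetic" in text:
        return 1.0
    if "no" in text or "real" in text:
        return 0.0
    return None


# Toy (label, reply) pairs standing in for real model outputs.
dataset = [
    (1, "Yes, the irregular pupils suggest this face is AI-generated."),
    (1, "I cannot make a determination about this image."),
    (0, "This appears to be a photograph of a real person."),
    (0, "This looks like a real photo taken with a camera."),
]

labels, scores = [], []
for y, reply in dataset:
    s = verdict_to_score(reply)
    if s is not None:  # skip rejected responses
        labels.append(y)
        scores.append(s)

print(f"AUC over answered queries: {roc_auc_score(labels, scores):.2f}")
```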

The rejection rate of the LLM's responses plays a critical role in evaluating its practical utility. The paper emphasizes that effective prompt engineering can markedly reduce rejection rates and improve the accuracy of the model's predictions. Through iterative querying and context-providing prompts, LLMs can be made to distinguish authentic from AI-generated content more reliably.
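
One simple way to operationalize this iterative querying, reusing `ask_is_ai_generated` and `PROMPTS` from the sketches above, is to escalate to a richer prompt whenever the model refuses; the refusal markers and escalation order are our assumptions:

```python
# Sketch of iterative querying: if the model refuses to answer, retry
# with a progressively more context-rich prompt. Reuses the earlier
# ask_is_ai_generated() and PROMPTS sketches; the escalation order and
# refusal markers are illustrative assumptions.
REFUSAL_MARKERS = ("cannot", "unable", "sorry", "can't")


def query_with_escalation(image_path: str) -> str | None:
    for key in ("binary", "hedged", "context_rich"):
        reply = ask_is_ai_generated(image_path, PROMPTS[key])
        if not any(m in reply.lower() for m in REFUSAL_MARKERS):
            return reply  # model committed to an answer
    return None  # rejected at every prompt level
```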

Implications and Speculative Future Directions

This research demonstrates the prospect of integrating LLMs into media forensics, offering a user-friendly, intuitive interface that requires no deep programming expertise. The paper underscores that while LLMs can become part of the media-forensics toolkit, they currently lag behind traditional methods in accuracy. The shortfall stems largely from LLMs' reliance on semantic reasoning rather than the signal-level artifacts that current DeepFake detectors target directly.

The paper speculates on future developments in this domain, suggesting enhancements such as improved prompting strategies, the incorporation of feedback mechanisms, and hybrid integration with established signal-processing techniques. Such developments could extend the LLMs' utility to broader media-forensic tasks, including video analysis and the identification of text-image mis-contextualization.
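
As a minimal sketch of what such a hybrid might look like, assuming a separate signal-level detector that emits a [0, 1] score, the two verdicts could be combined by late fusion with a tunable weight; both component detectors and the weight are hypothetical:

```python
# Hypothetical late fusion of a signal-level detector score with the
# LLM's semantic score. Both component detectors and the weight are
# illustrative assumptions, not systems described in the paper.
def fused_score(signal_score: float, llm_score: float,
                w_signal: float = 0.7) -> float:
    """Weighted average of two [0, 1] synthetic-likelihood scores."""
    return w_signal * signal_score + (1.0 - w_signal) * llm_score


# e.g. a frequency-artifact CNN says 0.9 and the LLM verdict maps to 1.0:
print(fused_score(0.9, 1.0))  # -> 0.93
```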

Conclusion

In conclusion, the paper provides a thorough examination of ChatGPT's ability to detect DeepFakes, contributing valuable insights to the field of media forensics. While its current performance cannot compete with specialized detection systems, the promise of LLMs lies in their accessibility and their considerable room to evolve. The paper encourages further exploration of multimodal LLMs, improving their precision and integrating them with existing forensic methodologies toward more robust media-analysis solutions.

Authors (10)
  1. Shan Jia
  2. Reilin Lyu
  3. Kangran Zhao
  4. Yize Chen
  5. Zhiyuan Yan
  6. Yan Ju
  7. Chuanbo Hu
  8. Xin Li
  9. Baoyuan Wu
  10. Siwei Lyu