Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
41 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Can GPT tell us why these images are synthesized? Empowering Multimodal Large Language Models for Forensics (2504.11686v1)

Published 16 Apr 2025 in cs.CV and cs.AI

Abstract: The rapid development of generative AI facilitates content creation and makes image manipulation easier and more difficult to detect. While multimodal LLMs have encoded rich world knowledge, they are not inherently tailored for combating AI-generated Content (AIGC) and struggle to comprehend local forgery details. In this work, we investigate the application of multimodal LLMs in forgery detection. We propose a framework capable of evaluating image authenticity, localizing tampered regions, providing evidence, and tracing generation methods based on semantic tampering clues. Our method demonstrates that the potential of LLMs in forgery analysis can be effectively unlocked through meticulous prompt engineering and the application of few-shot learning techniques. We conduct qualitative and quantitative experiments and show that GPT4V can achieve an accuracy of 92.1% in Autosplice and 86.3% in LaMa, which is competitive with state-of-the-art AIGC detection methods. We further discuss the limitations of multimodal LLMs in such tasks and propose potential improvements.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (4)
  1. Yiran He (9 papers)
  2. Yun Cao (21 papers)
  3. Bowen Yang (55 papers)
  4. Zeyu Zhang (143 papers)