
Zero-Shot Detection of AI-Generated Images (2409.15875v1)

Published 24 Sep 2024 in cs.CV

Abstract: Detecting AI-generated images has become an extraordinarily difficult challenge as new generative architectures emerge on a daily basis with more and more capabilities and unprecedented realism. New versions of many commercial tools, such as DALLE, Midjourney, and Stable Diffusion, have been released recently, and it is impractical to continually update and retrain supervised forensic detectors to handle such a large variety of models. To address this challenge, we propose a zero-shot entropy-based detector (ZED) that neither needs AI-generated training data nor relies on knowledge of generative architectures to artificially synthesize their artifacts. Inspired by recent works on machine-generated text detection, our idea is to measure how surprising the image under analysis is compared to a model of real images. To this end, we rely on a lossless image encoder that estimates the probability distribution of each pixel given its context. To ensure computational efficiency, the encoder has a multi-resolution architecture and contexts comprise mostly pixels of the lower-resolution version of the image. Since only real images are needed to learn the model, the detector is independent of generator architectures and synthetic training data. Using a single discriminative feature, the proposed detector achieves state-of-the-art performance. On a wide variety of generative models it achieves an average improvement of more than 3% over the SoTA in terms of accuracy. Code is available at https://grip-unina.github.io/ZED/.


Summary

  • The paper presents a zero-shot method named ZED that uses real image models and entropy measures to detect synthetic images.
  • It utilizes a state-of-the-art lossless image encoder to assess pixel likelihoods at multiple resolutions, achieving AUCs above 95%.
  • The approach eliminates the need for retraining on synthetic data, offering robust applications in digital forensics and media authentication.

Zero-Shot Detection of AI-Generated Images

The paper "Zero-Shot Detection of AI-Generated Images" by Cozzolino et al. addresses the challenge of distinguishing AI-generated images from real ones without constantly retraining supervised models. The proposed method, the Zero-Shot Entropy-based Detector (ZED), relies on a model of real images, learned through a lossless image encoder, to achieve state-of-the-art detection performance.

Motivation and Context

The rapid advancement in generative models, such as GANs and diffusion models, poses a significant challenge for existing AI-detection methods that are predominantly supervised. Generative models like DALL·E, Midjourney, and Stable Diffusion are frequently updated, pushing the boundaries of realism, thus making supervised detectors increasingly impractical due to the need for continuous retraining on new synthetic data. This paper proposes a zero-shot approach based on the concept of entropy to circumvent this issue.

Methodology

The core idea behind ZED is to measure how "surprising" an image is under a model learned solely from real images. This surprise is measured with a state-of-the-art lossless image encoder, which estimates the probability distribution of each pixel in an image given its context. The paper primarily uses the Super-Resolution based lossless Compressor (SReC) by Cao et al. as the encoder.
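To make the link between per-pixel probabilities and "surprise" concrete, the relation can be written out as below. The notation is an assumption introduced here for illustration and is not taken from the paper: x_i is the i-th pixel value, c_i its context, N the number of pixels, and P_θ the distribution predicted by the encoder. The first line assumes the contexts are ordered so that the chain rule applies.

```latex
% Illustrative notation only; symbols are assumptions, not the paper's.
P_\theta(x) = \prod_{i=1}^{N} P_\theta(x_i \mid c_i)
\qquad\Longrightarrow\qquad
-\log_2 P_\theta(x) = \sum_{i=1}^{N} -\log_2 P_\theta(x_i \mid c_i)
```

The right-hand side is the ideal lossless coding cost of the image in bits. An image that the model of real images finds unlikely costs more bits to encode.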

Here's a breakdown of the methodology:

  1. Model of Real Images: The encoder is trained exclusively on real images, thus capturing intrinsic statistics of real images.
  2. Multi-Resolution Architecture: The encoder evaluates the likelihood of pixel values at multiple resolutions. This multi-scale approach ensures computational efficiency.
  3. Surprise Measure: The method compares the actual coding cost of an image (its negative log-likelihood, NLL) against the expected coding cost (the entropy of the predicted distributions). Larger discrepancies between the two indicate synthetic images.
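The surprise measure in step 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `coding_cost_gap` and the input layout (one predicted distribution per pixel, as a NumPy array) are assumptions, and a real encoder such as SReC would supply the distributions at each resolution.

```python
import numpy as np

def coding_cost_gap(probs, pixels):
    """Return (nll, entropy, gap) in bits per pixel.

    probs  : (N, K) array; row i is the predicted distribution over the
             K possible values of pixel i (hypothetical input layout).
    pixels : (N,) integer array with the observed value of each pixel.
    """
    probs = np.asarray(probs, dtype=np.float64)
    pixels = np.asarray(pixels, dtype=np.int64)
    eps = 1e-12  # guards against log2(0)

    # Actual coding cost: average -log2 of the probability given to the
    # value that was actually observed.
    p_observed = probs[np.arange(len(pixels)), pixels]
    nll = -np.mean(np.log2(p_observed + eps))

    # Expected coding cost: average entropy of the predicted distributions.
    entropy = -np.mean(np.sum(probs * np.log2(probs + eps), axis=1))

    return nll, entropy, nll - entropy

if __name__ == "__main__":
    # Toy example: 3 pixels, 4 possible values, uniform predictions.
    uniform = np.full((3, 4), 0.25)
    print(coding_cost_gap(uniform, [0, 1, 3]))
```

With uniform predictions the actual and expected costs coincide, so the gap is close to zero. The gap moves away from zero when the observed values are systematically more or less probable than the model expects.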

Numerical Results

The proposed detector achieves state-of-the-art performance with an average improvement of over 3% in terms of accuracy compared to existing methods. The method's robustness is further confirmed through testing on a variety of generative models, demonstrating consistency across different types of synthetic imagery.

Key quantitative results include:

  • AUC (Area Under the ROC Curve): The paper reported significant performance improvements, reaching AUC values consistently above 95% for several popular generative models like DALL·E, Midjourney, and SDXL.
  • Decision Statistics: The coding cost gap (D^{(0)}) and statistics derived from it prove effective as decision statistics, providing reliable indications of image authenticity.
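As a rough illustration of how an AUC is obtained from a scalar decision statistic, the sketch below computes it as the fraction of (synthetic, real) pairs in which the synthetic image receives the higher score, counting ties as half. The function name `auc_from_scores` and the example scores are hypothetical. The sketch also assumes the statistic has been oriented so that larger values mean "more likely synthetic"; that orientation is an assumption here, not a claim about the paper.

```python
import numpy as np

def auc_from_scores(real_scores, fake_scores):
    """Area under the ROC curve for a scalar score.

    Equals the probability that a randomly chosen synthetic image
    scores higher than a randomly chosen real image (ties count 0.5).
    """
    real = np.asarray(real_scores, dtype=np.float64)
    fake = np.asarray(fake_scores, dtype=np.float64)

    # Pairwise differences: entry (i, j) is fake[i] - real[j].
    diff = fake[:, None] - real[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return wins / (len(real) * len(fake))

if __name__ == "__main__":
    # Hypothetical decision-statistic values, for illustration only.
    real = [0.0, 0.1, 0.2]
    fake = [0.15, 0.5, 0.9]
    print(auc_from_scores(real, fake))
```

An AUC of 1.0 means the statistic separates the two classes perfectly, and 0.5 means it does no better than chance. No decision threshold needs to be chosen to compute it.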

Implications

The implications of this research are multifaceted:

  1. Practical Impact: ZED offers a practical option for AI-generated content detection in applications such as digital forensics, media authentication, and social media monitoring. Because it needs no synthetic training data, it does not have to be retrained each time a new generative model appears.
  2. Theoretical Advancement: The approach underscores the utility of entropy and information-theoretic measures in the domain of image forensics. By leveraging the inherent properties of real images through a lossless encoder, the paper introduces a novel perspective on zero-shot learning.
  3. Future Developments: This research paves the way for further exploration into zero-shot learning techniques within the broader AI detection landscape. The reliance on entropy-based measures could be extended to other forms of media, including video and audio, broadening the scope of forensic tools available for digital content verification.

Conclusion

The paper "Zero-Shot Detection of AI-Generated Images" makes a significant contribution to the field of AI-image forensics by proposing a robust and scalable zero-shot detector. The method's reliance on an intrinsic model of real images, encoded through a lossless image compression model, ensures adaptability and high performance in the face of rapidly evolving generative models. This work sets a benchmark for future research, emphasizing the importance of entropy measures and zero-shot learning in maintaining the integrity of visual media.

Future Work

While ZED demonstrates impressive results, future work can focus on enhancing the robustness of the method to various forms of image degradation and manipulations often encountered in real-world scenarios. Extended experimentation with other types of discrete data encoders and exploring joint decision statistics might further optimize performance and generalizability.

Overall, ZED represents a critical advancement in automated detection systems, providing a foundation that both academic researchers and industry practitioners can build upon to develop more resilient AI-generated content detection frameworks.
