SARE: Semantic-Aware Reconstruction Error for Generalizable Diffusion-Generated Image Detection (2508.09487v1)
Abstract: Recently, diffusion-generated image detection has gained increasing attention, as the rapid advancement of diffusion models has raised serious concerns about their potential misuse. While existing detection methods have achieved promising results, their performance often degrades significantly when facing fake images from unseen, out-of-distribution (OOD) generative models, since they primarily rely on model-specific artifacts. To address this limitation, we explore a fundamental property commonly observed in fake images. Motivated by the observation that fake images tend to exhibit higher similarity to their captions than real images, we propose a novel representation, namely Semantic-Aware Reconstruction Error (SARE), that measures the semantic difference between an image and its caption-guided reconstruction. The hypothesis behind SARE is that real images, whose captions often fail to fully capture their complex visual content, may undergo noticeable semantic shifts during the caption-guided reconstruction process. In contrast, fake images, which closely align with their captions, show minimal semantic changes. By quantifying these semantic shifts, SARE can be utilized as a discriminative feature for robust detection across diverse generative models. We empirically demonstrate that the proposed method exhibits strong generalization, outperforming existing baselines on benchmarks including GenImage and CommunityForensics.
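The abstract describes the SARE pipeline only at a high level: caption the image, reconstruct it under caption guidance with a diffusion model, and measure the semantic shift between the original and the reconstruction. The sketch below illustrates one plausible instantiation of that idea; the specific model choices (BLIP for captioning, Stable Diffusion v1.5 img2img for reconstruction, CLIP as the semantic encoder) and the `strength` parameter are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch of a SARE-style feature: caption an image, reconstruct it
# under caption guidance, then measure the semantic shift between the original
# and the reconstruction. All model choices here are assumptions for
# illustration; the paper's actual reconstruction procedure may differ.
import torch
from PIL import Image
from transformers import (BlipProcessor, BlipForConditionalGeneration,
                          CLIPModel, CLIPProcessor)
from diffusers import StableDiffusionImg2ImgPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Captioning model (assumed: BLIP base).
blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base").to(device)

# 2. Caption-guided reconstruction (assumed: SD v1.5 img2img).
sd = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32).to(device)

# 3. Semantic encoder for measuring the shift (assumed: CLIP ViT-B/32).
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)

@torch.no_grad()
def semantic_shift(image: Image.Image, strength: float = 0.6) -> float:
    """Scalar semantic reconstruction error (hypothesis: higher for real images)."""
    # Generate a caption for the input image.
    inputs = blip_proc(image, return_tensors="pt").to(device)
    caption = blip_proc.decode(
        blip.generate(**inputs, max_new_tokens=30)[0],
        skip_special_tokens=True)
    # Reconstruct the image under caption guidance; `strength` controls how
    # much of the original content is perturbed before denoising.
    recon = sd(prompt=caption, image=image.resize((512, 512)),
               strength=strength).images[0]
    # Embed original and reconstruction, then compute their cosine distance.
    emb = clip.get_image_features(
        **clip_proc(images=[image, recon], return_tensors="pt").to(device))
    emb = torch.nn.functional.normalize(emb, dim=-1)
    return 1.0 - float(emb[0] @ emb[1])
```

Under the paper's hypothesis, fake images (which their captions describe well) would yield a small `semantic_shift`, while real images (whose captions miss visual detail) would shift more; the resulting score, or the full embedding difference, could then feed a downstream classifier. A simple threshold on this scalar is only a stand-in for whatever classifier the paper actually trains.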