
Noise-Informed Diffusion-Generated Image Detection with Anomaly Attention (2506.16743v1)

Published 20 Jun 2025 in cs.CV

Abstract: With the rapid development of image generation technologies, especially the advancement of Diffusion Models, the quality of synthesized images has significantly improved, raising concerns among researchers about information security. To mitigate the malicious abuse of diffusion models, diffusion-generated image detection has proven to be an effective countermeasure. However, a key challenge for forgery detection is generalising to diffusion models not seen during training. In this paper, we address this problem by focusing on image noise. We observe that images from different diffusion models share similar noise patterns, distinct from genuine images. Building upon this insight, we introduce a novel Noise-Aware Self-Attention (NASA) module that focuses on noise regions to capture anomalous patterns. To implement a SOTA detection model, we incorporate NASA into Swin Transformer, forming a novel detection architecture NASA-Swin. Additionally, we employ a cross-modality fusion embedding to combine RGB and noise images, along with a channel mask strategy to enhance feature learning from both modalities. Extensive experiments demonstrate the effectiveness of our approach in enhancing detection capabilities for diffusion-generated images. When encountering unseen generation methods, our approach achieves state-of-the-art performance. Our code is available at https://github.com/WeinanGuan/NASA-Swin.

Summary

  • The paper introduces a noise-informed detection paradigm using a novel NASA-Swin architecture to analyze intrinsic noise patterns in diffusion-generated images.
  • Key technical innovations include the Noise-Aware Self-Attention (NASA) module, Cross-Modality Fusion Embedding (CMFE), and Channel Mask Strategy (CMS) for robust feature learning.
  • Experimental results show that the proposed NASA-Swin model achieves state-of-the-art accuracy and superior generalization in detecting images from various diffusion and GAN models, including previously unseen ones.

Noise-Informed Diffusion-Generated Image Detection with Anomaly Attention

In image synthesis, diffusion models such as Denoising Diffusion Probabilistic Models (DDPMs) and Denoising Diffusion Implicit Models (DDIMs) have significantly advanced the quality and realism of synthetic imagery. This progress has heightened concerns about malicious use of the technology and underscores the need for robust detection of diffusion-generated images. The paper proposes a detection approach built on image noise: the authors observe that images from different diffusion models share similar noise patterns, and that these patterns are distinct from those found in genuine images.

Key Contributions

  1. Noise-Aware Self-Attention (NASA): The authors introduce the NASA module, a variant of self-attention that allocates more focus to image regions with anomalous noise features. This lets the detector identify the distinctive noise characteristics of diffusion-generated imagery.
  2. NASA-Swin Architecture: The research integrates the NASA module with Swin Transformer blocks to construct NASA-Swin—a novel detection architecture. Swin Transformer is leveraged due to its hierarchical design and efficiency in computing attention weights within localized windows, facilitating the capture of noise-related features.
  3. Cross-Modality Fusion Embedding (CMFE): To effectively harness residual noise data alongside RGB inputs, a cross-modality fusion technique is employed. By interleaving data from RGB and noise channels, the detector benefits from enhanced modality-specific feature learning.
  4. Channel Mask Strategy (CMS): The CMS is introduced as a data augmentation strategy that conceals or alters channel data, compelling the model to adaptively learn complementary features across channels.
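
The cross-modality fusion and channel mask items above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the noise extractor (image minus a 3x3 box blur), the exact interleaving order, the masking probability `p`, and all function names are assumptions made for illustration only. The official code is in the repository linked in the abstract.

```python
import numpy as np


def noise_residual(rgb: np.ndarray) -> np.ndarray:
    """Hypothetical noise extractor: image minus a 3x3 box-blurred copy.

    This is a stand-in high-pass filter; the summary does not say which
    noise extractor the paper uses.
    """
    h, w, _ = rgb.shape
    padded = np.pad(rgb, ((1, 1), (1, 1), (0, 0)), mode="edge")
    blurred = np.zeros_like(rgb, dtype=np.float64)
    # Sum the nine shifted copies of the image, then average.
    for dy in range(3):
        for dx in range(3):
            blurred += padded[dy:dy + h, dx:dx + w, :]
    blurred /= 9.0
    return rgb - blurred


def interleave_channels(rgb: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """Interleave RGB and noise channels as R, Nr, G, Ng, B, Nb (assumed order)."""
    h, w, c = rgb.shape
    fused = np.empty((h, w, 2 * c), dtype=np.float64)
    fused[..., 0::2] = rgb    # even channel slots hold RGB
    fused[..., 1::2] = noise  # odd channel slots hold the noise residual
    return fused


def channel_mask(fused: np.ndarray, rng: np.random.Generator, p: float = 0.5) -> np.ndarray:
    """With probability p, zero out one randomly chosen channel (assumed policy)."""
    out = fused.copy()
    if rng.random() < p:
        ch = rng.integers(out.shape[-1])
        out[..., ch] = 0.0
    return out


rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))           # toy image with values in [0, 1)
noise = noise_residual(image)
fused = interleave_channels(image, noise)
augmented = channel_mask(fused, rng, p=0.5)
print(fused.shape, augmented.shape)
```

Masking a whole channel at training time removes one source of evidence per sample, which is one simple way to push a model toward using both the RGB and the noise channels rather than relying on a single one.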

Experimental Results

The proposed NASA-Swin model generalizes well across multiple datasets, in particular when evaluated on images synthesized by generative models not seen during training, such as ADM, Glide, Midjourney, VQDM, and Wukong. On these models NASA-Swin achieves state-of-the-art detection accuracy, surpassing previous detectors. The approach also extends to GAN-generated images, particularly those synthesized by BigGAN, which points to its versatility and robustness.
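
Generalization claims of this kind are usually reported as accuracy per generator plus the mean across generators. The sketch below shows that bookkeeping only. The records are made-up toy predictions, not results from the paper, and the function names are hypothetical.

```python
from collections import defaultdict


def per_generator_accuracy(records):
    """Compute accuracy per generator.

    records: iterable of (generator_name, true_label, predicted_label),
    where labels are 1 for "generated" and 0 for "real".
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for generator, y_true, y_pred in records:
        total[generator] += 1
        correct[generator] += int(y_true == y_pred)
    return {g: correct[g] / total[g] for g in total}


def mean_accuracy(accuracy_by_generator):
    """Unweighted mean over generators, so each generator counts equally."""
    return sum(accuracy_by_generator.values()) / len(accuracy_by_generator)


# Toy predictions for illustration only (not taken from the paper).
toy_records = [
    ("ADM", 1, 1),
    ("ADM", 0, 0),
    ("Glide", 1, 1),
    ("Glide", 0, 1),
]
accuracy = per_generator_accuracy(toy_records)
print(accuracy, mean_accuracy(accuracy))
```

Averaging per generator rather than over all images keeps a large test set for one generator from hiding poor performance on another.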

Theoretical and Practical Implications

This paper provides a valuable perspective on image forgery detection, emphasizing the analysis of noise residuals as an effective methodological approach. It delineates a pathway for developing detectors that maintain high accuracy even as generative models evolve and diversify. The theoretical foundation laid by NASA and NASA-Swin may inspire enhanced detection techniques in the future, potentially integrating more complex multi-modal analysis or synergistic use of supplementary data streams such as temporal or semantic cues present in video synthesis tasks.

As image and video generation technology continues to advance, ensuring data integrity and authenticity will likely require increasingly sophisticated methods. The noise-based analysis presented in this paper offers one basis for future strategies against high-fidelity generative models and for safeguarding the credibility of digital media.