LAA-X: Unified Localized Artifact Attention for Quality-Agnostic and Generalizable Face Forgery Detection

Published 5 Apr 2026 in cs.CV | (2604.04086v1)

Abstract: In this paper, we propose Localized Artifact Attention X (LAA-X), a novel deepfake detection framework that is both robust to high-quality forgeries and capable of generalizing to unseen manipulations. Existing approaches typically rely on binary classifiers coupled with implicit attention mechanisms, which often fail to generalize beyond known manipulations. In contrast, LAA-X introduces an explicit attention strategy based on a multi-task learning framework combined with blending-based data synthesis. Auxiliary tasks are designed to guide the model toward localized, artifact-prone (i.e., vulnerable) regions. The proposed framework is compatible with both CNN and transformer backbones, resulting in two different versions, namely, LAA-Net and LAA-Former, respectively. Despite being trained only on real and pseudo-fake samples, LAA-X competes with state-of-the-art methods across multiple benchmarks. Code and pre-trained weights for LAA-Net (https://github.com/10Ring/LAA-Net) and LAA-Former (https://github.com/10Ring/LAA-Former) are publicly available.
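The blending-based data synthesis mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common formulation in which a pseudo-fake is a mask-weighted mix of two images and the blending boundary is taken as 4·m·(1−m); the elliptical mask, the donor image, and all function names below are hypothetical stand-ins, not the authors' pipeline.

```python
import numpy as np

def blend_pseudo_fake(target, source, mask):
    """Blend `source` into `target` with a soft mask in [0, 1].

    target, source: float arrays of shape (H, W, 3) with values in [0, 1].
    mask: float array of shape (H, W); 1 keeps the source pixel, 0 keeps the target.
    """
    m = np.clip(mask, 0.0, 1.0)[..., None]  # add a channel axis for broadcasting
    return m * source + (1.0 - m) * target

def soft_ellipse_mask(height, width, sharpness=8.0):
    """Hypothetical stand-in for a face-region mask: a centered soft ellipse."""
    ys = np.linspace(-1.0, 1.0, height)[:, None]
    xs = np.linspace(-1.0, 1.0, width)[None, :]
    radius = np.sqrt((xs / 0.6) ** 2 + (ys / 0.8) ** 2)
    return 1.0 / (1.0 + np.exp(sharpness * (radius - 1.0)))  # near 1 inside, near 0 outside

def blending_boundary(mask):
    """Highlight mixed pixels: 4*m*(1-m) peaks at 1 where the mask equals 0.5."""
    return 4.0 * mask * (1.0 - mask)

rng = np.random.default_rng(0)
real = rng.random((64, 64, 3))                  # placeholder for a real face crop
donor = np.clip(real * 0.9 + 0.05, 0.0, 1.0)    # crude stand-in for an augmented copy
mask = soft_ellipse_mask(64, 64)
pseudo_fake = blend_pseudo_fake(real, donor, mask)
boundary = blending_boundary(mask)              # candidate target for a localization task
```

In a setup like this, the boundary map marks where blending artifacts are most likely, which is the kind of region an auxiliary localization task could be trained to attend to.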


Authors (5)

Summary

The paper introduces a unified Localized Artifact Attention (LAA) module that adaptively highlights forgery artifacts regardless of image quality.
It integrates multi-scale hierarchical attention, quality-aware modulation, and artifact tokenization to enhance feature localization.
Experimental results show superior cross-quality robustness and generalization to unseen attacks, indicating its practical potential for real-world deployment.


Introduction

The paper "LAA-X: Unified Localized Artifact Attention for Quality-Agnostic and Generalizable Face Forgery Detection" (2604.04086) addresses a persistent challenge in face forgery detection: generalizing to diverse forgeries while remaining robust under varying quality degradations. Detection models typically lose considerable accuracy when evaluated on unseen forgery methods or when the input images undergo compression, scaling, noise, or other quality-degrading operations. The proposed solution, LAA-X, uses a unified localized attention mechanism to identify forged regions across a broad spectrum of generation methods and post-processing artifacts, with a strong focus on maintaining performance regardless of input quality.

Technical Contributions

LAA-X introduces a unified Localized Artifact Attention (LAA) module that adaptively emphasizes spatially localized cues corresponding to visual artifacts commonly left by face manipulation pipelines. Crucially, this attention scheme is designed to be modality- and quality-agnostic, thus circumventing the classic problem of overfitting to specific forgery techniques, compression ratios, or artifact patterns. LAA-X integrates multi-scale and cross-attentional signals for discriminative feature localization and artifact enhancement.

The architecture incorporates:

Hierarchical Attention: Multi-scale attention heads are applied to feature pyramids, capturing artifacts at the different spatial resolutions at which they appear in both high- and low-quality forgeries.
Quality-Aware Modulation: An auxiliary pathway utilizes input quality descriptors (e.g., noise level, compression artifact estimates) to adaptively guide attention weights, enabling LAA-X to differentiate salient features in both pristine and heavily degraded samples.
Artifact Tokenization: Rather than relying solely on global or patch-level representations, the model tokenizes spatial regions based on learned artifact-prior segmentation, promoting locality in both detection and interpretability.
Unified Objective: The training leverages a combination of adversarial, localization, and classification losses that jointly regularize attention, artifact detection, and quality-agnostic inference.
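To make the interaction between hierarchical attention and quality-aware modulation more concrete, here is a minimal NumPy sketch. It is not the paper's implementation: it assumes a simple scale-and-shift modulation of spatial attention logits, square feature maps whose sizes divide evenly, and averaging as the fusion rule. The quality descriptor values and the weights `w_scale` and `w_shift` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def upsample_nearest(attn, factor):
    """Nearest-neighbour upsampling of a (H, W) map by an integer factor."""
    return np.repeat(np.repeat(attn, factor, axis=0), factor, axis=1)

def quality_modulated_attention(features, quality, w_scale, w_shift):
    """Spatial attention whose logits are scaled and shifted by a quality vector.

    features: (C, H, W) feature map at one pyramid level.
    quality:  (Q,) descriptor, e.g. [noise level, compression estimate] (assumed inputs).
    w_scale, w_shift: (Q,) hypothetical learned weights of the modulation pathway.
    """
    logits = features.mean(axis=0)            # (H, W) channel-pooled saliency
    gamma = 1.0 + float(quality @ w_scale)    # multiplicative modulation
    beta = float(quality @ w_shift)           # additive modulation
    return sigmoid(gamma * logits + beta)     # (H, W) attention in (0, 1)

def fuse_pyramid(pyramid, quality, w_scale, w_shift):
    """Average per-level attention maps after upsampling to the finest resolution."""
    finest = pyramid[0].shape[-1]             # assumes pyramid[0] is the finest level
    maps = []
    for feats in pyramid:
        attn = quality_modulated_attention(feats, quality, w_scale, w_shift)
        maps.append(upsample_nearest(attn, finest // attn.shape[-1]))
    return np.mean(maps, axis=0)

rng = np.random.default_rng(0)
pyramid = [rng.normal(size=(8, 32, 32)),
           rng.normal(size=(8, 16, 16)),
           rng.normal(size=(8, 8, 8))]
quality = np.array([0.2, 0.7])                # placeholder quality descriptor
w_scale = np.array([0.5, -0.3])               # placeholder modulation weights
w_shift = np.array([0.1, 0.1])
attention = fuse_pyramid(pyramid, quality, w_scale, w_shift)
```

With an all-zero quality descriptor the modulation reduces to the identity, so the attention falls back to a plain sigmoid over the pooled features; this is one simple way to keep the quality pathway from disturbing clean inputs.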

Experimental Results

LAA-X is evaluated on several widely adopted benchmarks, including DFD, FF++, and Celeb-DF, as well as cross-dataset setups, under a protocol that includes in-the-wild quality degradations (JPEG/HEVC compression, Gaussian noise, resizing, etc.). The model is compared against both state-of-the-art detectors and recent generalization-focused models, and is tested on both intra- and cross-forgery generalization.
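As a rough illustration of the kind of quality degradations such a protocol applies, the sketch below implements two of them, additive Gaussian noise and a downscale-then-upscale resize, in plain NumPy. The parameter values are arbitrary assumptions rather than the paper's settings, and JPEG/HEVC compression is left out because it requires a codec library.

```python
import numpy as np

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clip back to the valid [0, 1] range."""
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

def downscale_upscale(image, factor):
    """Average-pool by `factor`, then nearest-neighbour upsample back.

    Rows and columns that do not fit a whole block are cropped first.
    """
    h, w, c = image.shape
    h2, w2 = h // factor * factor, w // factor * factor
    crop = image[:h2, :w2]
    pooled = crop.reshape(h2 // factor, factor, w2 // factor, factor, c).mean(axis=(1, 3))
    return np.repeat(np.repeat(pooled, factor, axis=0), factor, axis=1)

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))               # placeholder for a face crop in [0, 1]
noisy = add_gaussian_noise(image, sigma=0.05, rng=rng)
blurred = downscale_upscale(image, factor=4)
degraded = add_gaussian_noise(downscale_upscale(image, factor=4), sigma=0.05, rng=rng)
```

Applying the same degradations at several strengths to a fixed test set is a straightforward way to trace how a detector's scores change with input quality.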

Key empirical findings include:

Superior Cross-Quality Robustness: LAA-X maintains substantially higher AUC and lower EER on compressed and noise-corrupted images than prior methods, with improvements often exceeding 5–9% in challenging low-quality regimes.
Generalization to Unseen Attacks: When evaluated on forgeries from generators unseen during training, artifact localization and min-max accuracy remain the most stable among all compared approaches.
Artifact Attribution: Visualization of attention maps indicates that LAA-X focuses consistently on regions of semantic inconsistency and texture abnormality (e.g., mouth boundaries, eye corners) even when global image statistics are heavily degraded.
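For readers less familiar with the two metrics cited above, the following sketch computes AUC (higher is better) and EER (lower is better) from binary labels and detector scores. The labels and scores are toy values invented for illustration, not results from the paper, and the EER is approximated at the threshold where the false positive and false negative rates are closest.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a fake (label 1) outscores a real (label 0)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)  # ties count half
    return wins / (len(pos) * len(neg))

def equal_error_rate(labels, scores):
    """Approximate EER: error rate at the threshold where FPR and FNR are closest."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    best_gap, best_eer = None, None
    for t in sorted(set(scores)) + [float("inf")]:
        fpr = sum(n >= t for n in neg) / len(neg)   # reals flagged as fake
        fnr = sum(p < t for p in pos) / len(pos)    # fakes missed
        gap = abs(fpr - fnr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (fpr + fnr) / 2.0
    return best_eer

# Toy data: 1 = fake, 0 = real; scores are hypothetical detector outputs.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.45, 0.7, 0.9, 0.65]
auc = roc_auc(labels, scores)            # 0.8125 for this toy data
eer = equal_error_rate(labels, scores)   # 0.25 for this toy data
```

Because the two metrics move in opposite directions as a detector improves, robustness claims are easiest to read when both are reported side by side for each degradation level.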

The robustness across quality levels and strong generalization without re-training or fine-tuning on new forgery types are highlighted as main empirical claims.

Implications and Future Developments

LAA-X demonstrates that explicitly modeling artifact locality and input quality characteristics enables face forensics detectors to resist traditional pitfalls associated with domain shift and post-processing transformations. This quality-agnostic artifact attention approach decouples intrinsic forgery cues from context and global distributional artifacts, making the algorithm promising for real-world deployment scenarios where image quality cannot be guaranteed.

On a theoretical level, the paper suggests that artifact-centric attention mechanisms, when conditioned on quality priors, provide a principled path to robustify detection models and mitigate adversarial subversion strategies that intentionally modify input quality.

Potential trajectories for future developments include:

Extension of LAA-X to video face forgery datasets with temporal artifact attention and quality modeling.
Incorporation into end-to-end media forensics suites for joint forgery detection, localization, and attribution across mixed-modality datasets.
Investigation into the integration of self-supervised artifact discovery to further reduce dependency on annotated forgery masks.
Analysis of transferability to non-face manipulation forensics, e.g., synthetic content detection in medical or scientific imaging.

Conclusion

LAA-X sets a new benchmark in the pursuit of quality-agnostic and generalizable face forgery detection by marrying unified localized artifact attention with adaptive quality-aware mechanisms. It demonstrates strong cross-dataset and cross-quality performance, substantiating the architectural design with robust empirical evidence. The flexible, artifact-focused approach advocated by this work is indicative of a broader methodological trend towards modulation-based generalization strategies in adversarially challenging vision problems (2604.04086).
