Real-Centered Detection Network (RCDN)
- The paper introduces a dual-branch CNN that fuses spatial and frequency features to robustly distinguish authentic faces from forgeries.
- It leverages a real-centered loss formulation alongside classification and separation losses to enforce a compact real-face embedding and push forgeries away.
- Empirical evaluations on the DiFF dataset show that RCDN achieves state-of-the-art in-domain accuracy and superior cross-domain stability compared to traditional detectors.
The Real-Centered Detection Network (RCDN) is a dual-branch convolutional neural network designed for robust face forgery identification in scenarios characterized by rapidly evolving generative methods. RCDN anchors the feature representation around authentic facial images (the "real-center"), pushing all forgeries away from this center in the embedding space, thereby achieving strong generalization to unseen forged image distributions. The framework combines frequency and spatial feature extraction, a cross-domain-optimized network geometry, and a real-centered loss formulation to address the generalization gap inherent in traditional CNN-based forgery detectors (McCurdy et al., 17 Jan 2026).
1. Motivation and Core Principles
The emergence of advanced facial synthesis pipelines—ranging from diffusion-based to GAN-based forgery tools—has led to a proliferation of highly realistic fake images, presenting major challenges for automated forgery detectors. Conventional detectors (e.g., Xception, EfficientNet, ResNet+CBAM) achieve over 98% in-domain accuracy when trained and tested on the same forgery category but exhibit an 8–10 point degradation on cross-domain forgeries. This stems from overfitting to the idiosyncrasies of specific fake data distributions and an inability to accommodate distribution shifts as new forgery pipelines appear.
RCDN is predicated on the observation that while distributions of synthesized fakes are diverse and dynamic, the distribution of real face images is comparatively stable. Instead of modeling all conceivable forgery patterns, RCDN selectively anchors its representation around real facial images and enforces that forgeries are distinct, regardless of their generation method. This paradigm shift targets the core requirement for practical, future-proof defenses against forgery: robust, cross-domain identification.
2. Architecture and Feature Extraction
RCDN utilizes a dual-branch architecture consisting of spatial and frequency branches, followed by feature fusion, projection, and dual-head supervision:
- Spatial Branch: Processes the RGB face input via a pre-pooling Xception backbone, outputting a $2048$-dimensional vector $f_s$. This branch prioritizes semantic and structural cues, such as consistency of facial parts and identity.
- Frequency Branch: Applies a 2D FFT to the input, re-centers the frequency spectrum, extracts the magnitude, and applies log-compression with channel-wise standardization to isolate subtle spectral artifacts salient in fakes (e.g., high-frequency noise, diffusion inconsistencies). A lightweight ConvNet processes these features, yielding a frequency vector $f_f$.
- Feature Fusion and Projection: Concatenates the spatial and frequency vectors ($[f_s; f_f]$), projects to a $128$-dimensional embedding via a multilayer perceptron, and $\ell_2$-normalizes the output $z$.
- Supervision Heads:
  - Classification Head: A linear layer maps $z$ to class logits, optimized with cross-entropy.
  - Real-Centered Head: A learnable vector $c$ defines the "real-center." Real samples are pulled toward $c$, while fakes are penalized if they lie within a margin $m$ of $c$.
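The branch preprocessing and fusion steps above can be sketched in a few lines. This is a minimal NumPy illustration under assumed shapes, not the paper's implementation; the projection MLP is stood in for by a single matrix `W`:

```python
import numpy as np

def frequency_features(img):
    """Log-magnitude spectrum with channel-wise standardization.

    img: float array of shape (H, W, C). Returns an array of the
    same shape, ready for a lightweight ConvNet.
    """
    spec = np.fft.fft2(img, axes=(0, 1))       # 2D FFT per channel
    spec = np.fft.fftshift(spec, axes=(0, 1))  # re-center low frequencies
    mag = np.log1p(np.abs(spec))               # log-compressed magnitude
    mu = mag.mean(axis=(0, 1), keepdims=True)  # channel-wise standardization
    sigma = mag.std(axis=(0, 1), keepdims=True) + 1e-8
    return (mag - mu) / sigma

def fuse_and_normalize(f_s, f_f, W):
    """Concatenate branch vectors, project, and L2-normalize to get z."""
    z = W @ np.concatenate([f_s, f_f])         # stand-in for the MLP projection
    return z / (np.linalg.norm(z) + 1e-8)
```

The unit-norm constraint on `z` is what makes the distance-based losses in the next section well-behaved, since all embeddings live on a common hypersphere.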
3. Loss Functions and Embedding Geometry
RCDN introduces a hybrid loss combining classification, center, and separation objectives:
- Classification Loss: Standard cross-entropy $\mathcal{L}_{\mathrm{cls}}$ on the predicted logits for the real/fake binary label.
- Center Loss: Enforces compactness of real embeddings near the real-center $c$ and margins fake embeddings away:

$$\mathcal{L}_{\mathrm{center}} = \frac{1}{|\mathcal{R}|}\sum_{i \in \mathcal{R}} \lVert z_i - c \rVert_2^2 \;+\; \frac{1}{|\mathcal{F}|}\sum_{i \in \mathcal{F}} \max\big(0,\; m - \lVert z_i - c \rVert_2\big)^2$$

- Separation Loss: Ensures the mean distance of fakes to $c$ exceeds that of reals by a margin $m_{\mathrm{sep}}$ on a batch level:

$$\mathcal{L}_{\mathrm{sep}} = \max\Big(0,\; m_{\mathrm{sep}} + \frac{1}{|\mathcal{R}|}\sum_{i \in \mathcal{R}} \lVert z_i - c \rVert_2 \;-\; \frac{1}{|\mathcal{F}|}\sum_{i \in \mathcal{F}} \lVert z_i - c \rVert_2\Big)$$

The final objective is

$$\mathcal{L} = \mathcal{L}_{\mathrm{cls}} + \lambda_{\mathrm{center}}\,\mathcal{L}_{\mathrm{center}} + \lambda_{\mathrm{sep}}\,\mathcal{L}_{\mathrm{sep}},$$

where $\lambda_{\mathrm{center}}$ and $\lambda_{\mathrm{sep}}$ are hyperparameters tuned via validation, and $\mathcal{R}$, $\mathcal{F}$ denote the real and fake samples in a batch.
Embedding normalization is critical for stable distance-based optimization and geometry enforcement.
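A minimal sketch of the center and separation terms on $\ell_2$-normalized embeddings follows; the margin values are illustrative assumptions, and the exact functional forms in the paper may differ:

```python
import numpy as np

def real_centered_losses(z, y, c, m_center=0.5, m_sep=0.2):
    """Center and separation losses on L2-normalized embeddings.

    z: (N, D) embeddings, y: (N,) labels with 1 = real, 0 = fake,
    c: (D,) learnable real-center. Margins are illustrative values.
    """
    d = np.linalg.norm(z - c, axis=1)          # distance of each sample to c
    d_real, d_fake = d[y == 1], d[y == 0]
    # Pull reals toward c; penalize fakes that fall inside the margin.
    l_center = (d_real ** 2).mean() + (np.maximum(0.0, m_center - d_fake) ** 2).mean()
    # Mean fake distance must exceed mean real distance by m_sep.
    l_sep = max(0.0, m_sep + d_real.mean() - d_fake.mean())
    return l_center, l_sep
```

When reals sit on the center and fakes lie beyond both margins, both terms vanish, leaving only the classification loss to drive training.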
4. Empirical Evaluation
Comprehensive experimental validation is conducted on the DiFF dataset, comprising over 500,000 forgeries from 13 diffusion models, with diverse text and visual prompts, and three core forgery categories: Face Editing (FE), Image-to-Image (I2I), and Text-to-Image (T2I). Training and evaluation protocols include both in-domain and cross-domain settings.
Performance Metrics
- In-domain Accuracy: All leading baselines exceed 98%. RCDN achieves state-of-the-art:
- FE: 0.9995
- I2I: 0.9975
- T2I: 0.9990
- Average: 0.9987
- Cross-domain Accuracy (average off-diagonal, i.e., train on one category, test on the others):
| Method | Cross Average |
|---|---|
| Xception | 0.8970 |
| EfficientNet | 0.9075 |
| ResNet+CBAM | 0.9015 |
| DIRE | 0.9048 |
| RCDN | 0.9369 |
- Cross-/In-domain Stability Ratio and Gap:
| Method | In-domain | Cross Avg | Gap | Ratio |
|---|---|---|---|---|
| Xception | 0.9887 | 0.8970 | 0.0917 | 0.907 |
| EfficientNet | 0.9930 | 0.9075 | 0.0855 | 0.914 |
| ResNet+CBAM | 0.9837 | 0.9015 | 0.0822 | 0.916 |
| DIRE | 0.9775 | 0.9048 | 0.0727 | 0.926 |
| RCDN | 0.9987 | 0.9369 | 0.0618 | 0.938 |
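The gap and ratio columns follow directly from the two accuracy columns (gap = in-domain − cross average; ratio = cross average / in-domain) and can be checked mechanically:

```python
# Accuracy figures copied from the tables above.
results = {
    "Xception":     (0.9887, 0.8970),
    "EfficientNet": (0.9930, 0.9075),
    "ResNet+CBAM":  (0.9837, 0.9015),
    "DIRE":         (0.9775, 0.9048),
    "RCDN":         (0.9987, 0.9369),
}
for name, (in_dom, cross) in results.items():
    gap = in_dom - cross      # absolute cross-domain degradation
    ratio = cross / in_dom    # stability ratio (closer to 1 is better)
    print(f"{name}: gap={gap:.4f} ratio={ratio:.3f}")
```

Note that RCDN leads on both measures simultaneously: the smallest gap and the highest ratio.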
Ablation studies reveal that removing either the frequency branch or the real-centered objectives consistently degrades cross-domain robustness by 3–4 points and reduces the stability ratio from 0.938 to approximately 0.92, demonstrating that both components are necessary.
5. Model Training and Implementation Details
For each forgery category, 10,000 training and 2,000 testing images are sampled from DiFF. Preprocessing includes face cropping and resizing for the RGB input, and the frequency transformation for the auxiliary branch. Both branches are optimized jointly in an end-to-end fashion, using Adam with weight decay and learning-rate scheduling. The margins and loss-balancing weights are set through validation-specific tuning.
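The per-category sampling protocol above can be sketched as follows; the function name and ID handling are assumptions for illustration, not from the paper:

```python
import random

def sample_split(image_ids, n_train=10_000, n_test=2_000, seed=0):
    """Draw disjoint train/test samples from one forgery category's image IDs."""
    rng = random.Random(seed)                  # fixed seed for reproducibility
    picked = rng.sample(image_ids, n_train + n_test)  # without replacement
    return picked[:n_train], picked[n_train:]
```

Sampling without replacement keeps the 10,000-image training set and 2,000-image test set disjoint within each category, so in-domain accuracy is measured on unseen images.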
This suggests that hyperparameter selection is moderately sensitive and should be dataset-specific. A plausible implication is that further work is required to automate or stabilize this process, particularly as the real data distribution evolves.
6. Analysis, Limitations, and Future Directions
RCDN shifts the detection paradigm from enumerating fake patterns to defining a geometrically stable "real island" in feature space. Empirical results confirm that authentic faces consistently form a compact, high-density cluster, while all known tested forgeries, irrespective of synthesis method, are mapped outside this cluster. This yields higher resilience to unseen generation pipelines.
Limitations include the need for manual, per-dataset tuning of the margins and loss weights, and potential drift of the real-center if the authentic image domain changes significantly (e.g., with variations in pose, lighting, or source distributions). Future extensions may involve dynamic margin scheduling, self-supervised pretraining regimes, and the integration of video deepfake detection via adapter modules.
7. Significance and Practical Implications
RCDN introduces a real-centered, dual-branch CNN that achieves state-of-the-art in-domain performance (≥99.8%), while reducing the generalization gap on cross-domain face forgery detection to 6.2 points (versus 7.3–10.0 for baselines) and producing the highest cross/in-domain stability ratio recorded (0.938) (McCurdy et al., 17 Jan 2026). Its design enables practical deployment as a front-line defense system, since only authentic examples require comprehensive representation.
Key practical implications include:
- Enhanced robustness to future and unseen facial forgery pipelines, including next-generation GANs and diffusion-based models.
- Only real images need to be comprehensively represented; there is no need to enumerate all possible forgery methods.
- Open-source availability facilitating reproducibility and further research.
By anchoring detection around the statistical invariance of real faces, RCDN provides a scalable, domain-robust solution to cross-domain face forgery identification.