Identity-Preserving AIGC: Detection & Fairness
- Identity-preserving AIGC is an AI technique that edits images by altering context (e.g., attire or background) while preserving core facial identity such as bone structure and skin tone.
- The approach uses identity-preserving prompts to ensure that secondary features can be modified without compromising the inherent identity, setting it apart from traditional deepfakes.
- Comparative studies of detectors like AIDE and Effort show that while fine-tuning boosts in-domain accuracy, it also leads to brittle performance on held-out domains, especially for under-represented groups.
Identity-preserving AIGC (IP-AIGC) denotes AI-generated or AI-edited visual content in which the depicted individual retains their unique identity (facial bone structure, skin tone, and overall facial identity are preserved) while secondary aspects such as attire, background, lighting, or facial expression are transformed. IP-AIGC is therefore distinct from classical deepfakes, which alter identity, and it poses unusual detection challenges because identity-invariant edits introduce only subtle, low-signal generative artifacts. The problem is particularly acute for under-represented demographic groups, as documented in recent cross-generator studies centered on Indian and South-Asian faces (Dubey et al., 2 Dec 2025).
1. Definition and Distinction of Identity-Preserving AIGC
Identity-preserving AIGC is formally instantiated as follows. Given a real image $x$ of person $p$, an IP-AIGC generator $G$ produces

$$\tilde{x} = G(x, \pi_{\mathrm{id}}),$$

where $\pi_{\mathrm{id}}$ is an identity-preserving prompt enforcing constraints such as preservation of bone structure, skin tone, and general facial identity. Edits may affect clothing, background, lighting, or facial expression, but do not alter subject identity. This stands in contrast to face swaps and generic deepfakes, which typically introduce identity-inconsistency cues (e.g., facial mismatches) leveraged by conventional detectors. In IP-AIGC, those cues are absent, necessitating the identification of more subtle, generator-specific artifacts for detection (Dubey et al., 2 Dec 2025).
2. Dataset Curation: Training and Evaluation Protocols
Investigations into IP-AIGC require datasets that provide both standard and identity-preserving domains. One recent systematic study targeted Indian and South-Asian populations, assembling the following:
- Training splits:
- FairFD (Indian subset): 10,308 real, 61,678 fake images
- HAV-DF-train: 2,444 real, 3,759 fake images
- Standard in-domain test split:
- HAV-DF-test: 535 real, 931 fake frames
- Held-out non-IP sets (HIDF):
- HIDF-img: 747 real, 747 fake images
- HIDF-vid: 142 real, 223 fake video frames
- Held-out IP-AIGC sets (HIDF-*-ip-genai): Generated via commercial web-UI image editors (Gemini 2.5-Flash, ChatGPT image-edit API), with identity-preserving prompts:
- HIDF-img-ip-genai: 747 real, 535 AI-edits
- HIDF-vid-ip-genai: 142 real, 108 AI-edits
Prompts emphasized explicit constraints (“Preserve facial identity, bone structure, and skin tone...”), ensuring that core facial identity remained unchanged while targeting context or appearance edits.
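The splits above can be summarized programmatically. A minimal sketch, with split names and counts taken from the text; the dict layout itself is purely illustrative:

```python
# Dataset splits as described above; counts are from the text, the
# structure is an illustrative convention, not the authors' code.
SPLITS = {
    "train": {
        "FairFD-Indian": {"real": 10_308, "fake": 61_678},
        "HAV-DF-train":  {"real": 2_444,  "fake": 3_759},
    },
    "test": {
        "HAV-DF-test":       {"real": 535, "fake": 931},
        "HIDF-img":          {"real": 747, "fake": 747},
        "HIDF-vid":          {"real": 142, "fake": 223},
        "HIDF-img-ip-genai": {"real": 747, "fake": 535},
        "HIDF-vid-ip-genai": {"real": 142, "fake": 108},
    },
}

def total_images(group):
    """Sum real + fake counts across all splits in a group."""
    return sum(v["real"] + v["fake"] for v in SPLITS[group].values())

print(total_images("train"))  # 10308 + 61678 + 2444 + 3759 = 78189
```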
3. State-of-the-Art Detectors: AIDE and Effort
Two leading deep neural architectures have been benchmarked for IP-AIGC detection:
- AIDE:
- Architecture: Convolutional backbone (e.g., ResNet) with a binary classification head.
- Pretraining (PT): GenImage weights, leveraging large AIGC-vs.-real corpora.
- Fine-tuning (FT): Full network fine-tuned on FairFD + HAV-DF (batch size 32, 10–20 epochs).
- Effort:
- Architecture: Dual-branch CNN with orthogonal subspace decomposition into “content” and “forgery” components.
- Pretraining: Public Effort checkpoint, broadly trained.
- Fine-tuning: Complete fine-tuning on Indian-centric data (batch size 40).
Both are evaluated under PT and FT regimes, explicitly to probe generalization and overfitting dynamics (Dubey et al., 2 Dec 2025).
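The PT/FT comparison amounts to a small evaluation grid over detectors, regimes, and splits. A hypothetical harness sketch, where `evaluate` is a placeholder rather than the authors' actual pipeline:

```python
# Hypothetical 2x2xN evaluation grid; detector and split names mirror
# the text, evaluate() is a stand-in for running the real checkpoints.
DETECTORS = ["AIDE", "Effort"]
REGIMES = ["PT", "FT"]
EVAL_SPLITS = ["HAV-DF-test", "HIDF-img", "HIDF-img-ip-genai"]

def evaluate(detector, regime, split):
    """Placeholder: would load the given checkpoint and return AUC/EER."""
    return {"auc": None, "eer": None}

results = {
    (d, r, s): evaluate(d, r, s)
    for d in DETECTORS for r in REGIMES for s in EVAL_SPLITS
}
print(len(results))  # 12 detector/regime/split combinations
```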
4. Metrics: Formal Definitions
Detection efficacy is quantitatively assessed using the following metrics, where $TP$, $TN$, $FP$, and $FN$ denote true positives, true negatives, false positives, and false negatives, respectively:
- Accuracy:
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$
- Area Under the ROC Curve (AUC):
If $s^{+}$ is the score of a randomly drawn fake (positive) sample and $s^{-}$ the score of a randomly drawn real (negative) sample,
$$\mathrm{AUC} = \Pr\left(s^{+} > s^{-}\right)$$
- Average Precision (AP):
Given the precision–recall curve $(P_n, R_n)$ over score thresholds indexed by $n$,
$$\mathrm{AP} = \sum_n \left(R_n - R_{n-1}\right) P_n,$$
with $P_n = \frac{TP_n}{TP_n + FP_n}$, $R_n = \frac{TP_n}{TP_n + FN_n}$.
- Equal Error Rate (EER):
The operating threshold $\tau^{*}$ satisfies $\mathrm{FPR}(\tau^{*}) = \mathrm{FNR}(\tau^{*})$. Numerically,
$$\mathrm{EER} = \tfrac{1}{2}\left[\mathrm{FPR}(\tau^{*}) + \mathrm{FNR}(\tau^{*})\right]$$
evaluated at the threshold minimizing $\left|\mathrm{FPR}(\tau) - \mathrm{FNR}(\tau)\right|$.
These metrics collectively characterize both discrimination and calibration, with AUC and EER prioritized for cross-domain generalization evaluation.
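These definitions can be computed directly from raw detector scores. A minimal NumPy sketch of the four metrics (a production evaluation would typically use `sklearn.metrics` instead):

```python
import numpy as np

def detection_metrics(scores, labels):
    """Accuracy (at 0.5), AUC, AP, and EER from raw scores (fake = 1)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Accuracy at a fixed 0.5 threshold.
    acc = np.mean((scores >= 0.5).astype(int) == labels)

    # AUC via the Mann-Whitney formulation: fraction of
    # (positive, negative) pairs ranked correctly, ties count half.
    pos, neg = scores[labels == 1], scores[labels == 0]
    auc = (np.mean(pos[:, None] > neg[None, :])
           + 0.5 * np.mean(pos[:, None] == neg[None, :]))

    # AP as the step-wise area under the precision-recall curve.
    order = np.argsort(-scores)
    tp = np.cumsum(labels[order])
    precision = tp / np.arange(1, len(scores) + 1)
    recall = tp / max(labels.sum(), 1)
    ap = np.sum(np.diff(np.concatenate(([0.0], recall))) * precision)

    # EER: scan thresholds for the point where FPR and FNR cross.
    thresholds = np.unique(scores)
    fpr = np.array([np.mean(neg >= t) for t in thresholds])
    fnr = np.array([np.mean(pos < t) for t in thresholds])
    i = np.argmin(np.abs(fpr - fnr))
    eer = (fpr[i] + fnr[i]) / 2

    return {"acc": acc, "auc": auc, "ap": ap, "eer": eer}
```

For a perfectly separating detector, all four metrics hit their ideal values (accuracy, AUC, and AP equal to 1, EER equal to 0).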
5. Experimental Findings: Generalization and Overfitting
A comparative summary of PT/FT performance for each major domain follows:
| Detector | Split | PT AUC / EER | FT AUC / EER | Δ AUC / Δ EER |
|---|---|---|---|---|
| AIDE | HAV-DF-test (in-dom) | 0.535 / 0.484 | 0.809 / 0.259 | +0.274 / –0.225 |
| AIDE | HIDF-img-ip-genai | 0.923 / 0.123 | 0.563 / 0.461 | –0.360 / +0.338 |
| Effort | HAV-DF-test (in-dom) | 0.739 / 0.353 | 0.944 / 0.125 | +0.205 / –0.228 |
| Effort | HIDF-img-ip-genai | 0.740 / 0.321 | 0.533 / 0.447 | –0.207 / +0.126 |
Fine-tuning yields substantial in-domain improvement (e.g., Effort AUC +0.205, AIDE +0.274), but causes a dramatic drop in AUC and rise in EER on the held-out IP-AIGC splits (e.g., AIDE –0.360, Effort –0.207 in AUC). Performance on the non-IP held-out split (HIDF-img) is stable or slightly improved, indicating that the generalization gap is specific to identity-preserving edits rather than a generic out-of-distribution failure (Dubey et al., 2 Dec 2025).
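The delta columns follow directly from the PT and FT numbers. A small sketch reproducing them, with values copied from the table above:

```python
# (PT_AUC, PT_EER, FT_AUC, FT_EER) per detector/split, from the table.
RESULTS = {
    ("AIDE",   "HAV-DF-test"):       (0.535, 0.484, 0.809, 0.259),
    ("AIDE",   "HIDF-img-ip-genai"): (0.923, 0.123, 0.563, 0.461),
    ("Effort", "HAV-DF-test"):       (0.739, 0.353, 0.944, 0.125),
    ("Effort", "HIDF-img-ip-genai"): (0.740, 0.321, 0.533, 0.447),
}

def deltas(detector, split):
    """Change in AUC and EER when moving from the PT to the FT model."""
    pt_auc, pt_eer, ft_auc, ft_eer = RESULTS[(detector, split)]
    return round(ft_auc - pt_auc, 3), round(ft_eer - pt_eer, 3)

print(deltas("AIDE", "HIDF-img-ip-genai"))  # (-0.36, 0.338)
```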
6. Analysis: Generator-Specific Overfitting and Brittleness
Experimental evidence demonstrates that both AIDE and Effort, when fine-tuned on a fixed suite of generators, become susceptible to overfitting on superficial generator "fingerprints." Such artifacts are highly idiosyncratic to the training corpora and generalize poorly to content produced by commercial APIs or novel editing paradigms, especially when identity is preserved. Pretraining confers some robustness for detecting overt artifacts (e.g., smoothing, aliasing), but not the subtle cues characteristic of IP-AIGC. Fine-tuning exacerbates specialization to in-domain signals—leading to brittle cross-generator failure modes, particularly for under-represented groups (Dubey et al., 2 Dec 2025).
A plausible implication is that improving sensitivity to subtler generative traces requires adaptation strategies that balance invariance (to preserve identity-relevant features) and sensitivity (to generic generative errors), rather than indiscriminate fine-tuning on limited generator outputs.
7. Pathways for Robust and Fair IP-AIGC Detection
To address generalization and fairness constraints, the following approaches are proposed:
- Representation-preserving adaptation: Rather than fully fine-tuning, use residual subspace adaptation or lightweight adapter modules to update only the forgery-relevant subspace, while preserving core identity encodings. Fairness constraints targeting demographic parity are recommended to mitigate disparate performance.
- India-aware benchmark curation: Systematic curation of large-scale, demographically rich IP-AIGC datasets for Indian/South-Asian populations, spanning a broad spectrum of identity-preserving edits, is advocated. This would facilitate detectors that can systematically disentangle generative artifacts from natural, identity-invariant variability (Dubey et al., 2 Dec 2025).
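The first pathway, residual subspace adaptation, can be illustrated with a low-rank update on a frozen weight matrix (a LoRA-style sketch; the shapes and names are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 32, 4

W = rng.normal(size=(d_out, d_in))             # frozen pretrained weights
A = np.zeros((d_out, rank))                    # trainable factor, zero init
B = rng.normal(scale=0.01, size=(rank, d_in))  # trainable factor

def forward(x):
    """Adapted layer: frozen path plus low-rank residual update
    confined to an assumed 'forgery-relevant' subspace."""
    return (W + A @ B) @ x

x = rng.normal(size=d_in)
# With A initialised to zero, the adapted layer reproduces the pretrained
# representation exactly, preserving identity-relevant features; training
# then updates only A and B, leaving W untouched.
assert np.allclose(forward(x), W @ x)
```

The zero initialisation of one factor is the key design choice: adaptation starts from the pretrained representation and can only move within a rank-limited residual subspace, rather than overwriting the full network as indiscriminate fine-tuning does.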
In aggregate, current detection systems achieve high in-domain accuracy, but are fundamentally brittle under the regime of identity-preserving generative edits, particularly for historically under-studied cohorts. The field thus requires new algorithmic and empirical standards emphasizing subspace adaptation and demographic breadth to ensure robust, fair, and generalizable IP-AIGC detection.