Identity Guidance in Diffusion Models
- Identity guidance is a conditioning mechanism in diffusion models that directs generation toward a chosen target identity while retaining the visual structure of the source.
- It leverages gradient-based updates in latent space to modify the reverse diffusion process, balancing target impersonation with semantic divergence from the source.
- By integrating identity convergence, semantic divergence, and structure-preserving losses, this approach improves transferability against black-box face recognition systems.
Identity guidance in diffusion models denotes a family of conditioning and control mechanisms that steer generation toward a specified identity-related objective while constraining other semantic factors. In "Diffusion-based Adversarial Identity Manipulation for Facial Privacy Protection," identity guidance is defined as a gradient-based control signal injected into the reverse diffusion process in latent space so that a generated face is recognized as a chosen target identity by face recognition systems while remaining visually natural and structurally close to the source face (Wang et al., 30 Apr 2025). More broadly, recent work uses the same term for substantially different operations, including adaptive guidance origins for identity-preserving editing, adaptive negative conditioning for identity-consistent synthesis, decoupled subject/context guidance for personalization, and training-centric disentanglement for anonymization (Wolf et al., 3 Feb 2026, Caldeira et al., 14 Jan 2026, Kim et al., 1 Jul 2026, Yang et al., 28 Oct 2025).
1. Formal task and threat model
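In DiffAIM, the task is adversarial identity manipulation for facial privacy protection. Given a source face $x_s$ and a target face $x_t$, the objective is to generate an adversarial image $x_{adv}$ that preserves human-perceived identity and visual naturalness with respect to $x_s$, yet impersonates the target identity for machine face recognition (FR) systems (Wang et al., 30 Apr 2025). For an FR model $f$ with embedding similarity $\mathcal{S}(\cdot,\cdot)$, the target conditions are:
- $\mathcal{S}(f(x_{adv}), f(x_t)) \ge \tau$ for verification, where $\tau$ is the decision threshold,
- $\mathcal{S}(f(x_{adv}), f(x_t)) \ge \mathcal{S}(f(x_{adv}), f(x_i))$ for every gallery image $x_i$ of another identity, for identification.
The optimization is posed as
$$\min_{x_{adv}} \mathcal{L}_{adv}(x_{adv}, x_t) \quad \text{s.t.} \quad \mathcal{H}(x_{adv}, x_s) \le \epsilon,$$
where $\mathcal{H}$ measures unnaturalness relative to the source (Wang et al., 30 Apr 2025).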
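The verification and identification conditions can be illustrated with a minimal sketch. It assumes cosine similarity as the embedding similarity; the function names, the toy 3-D embeddings, and the threshold value are hypothetical and are not taken from the paper.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two 1-D embedding vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def verification_success(adv_emb, target_emb, threshold):
    """Verification: impersonation succeeds if similarity reaches the threshold."""
    return cosine_similarity(adv_emb, target_emb) >= threshold

def identification_rank(adv_emb, gallery_embs, target_index):
    """Identification: rank (1 = best match) of the target within the gallery."""
    sims = np.array([cosine_similarity(adv_emb, g) for g in gallery_embs])
    order = np.argsort(-sims)  # gallery indices sorted by descending similarity
    return int(np.where(order == target_index)[0][0]) + 1

# Toy embeddings (hypothetical values, for illustration only).
target = np.array([1.0, 0.0, 0.0])
source = np.array([0.0, 1.0, 0.0])
adv = np.array([0.9, 0.1, 0.0])   # adversarial embedding pulled toward the target
gallery = [source, target, np.array([0.0, 0.0, 1.0])]

verified = verification_success(adv, target, threshold=0.8)
rank = identification_rank(adv, gallery, target_index=1)
```

In this toy setting the adversarial embedding passes verification against the target and the target is the Rank-1 gallery match, which is the outcome a targeted impersonation attack aims for.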
The threat model is transfer-based and targets black-box FR systems at test time. The attacked systems include IRSE50, IR152, FaceNet, MobileFace, Face++, and Aliyun, treated as black boxes during attack execution. Training uses a set of white-box surrogate FR models $\{f_k\}_{k=1}^{K}$ with available gradients, and the attack mode is impersonation rather than simple obfuscation (Wang et al., 30 Apr 2025). Within that setting, identity guidance occupies a precise role: it modifies the diffusion trajectory so that the generated sample exhibits identity convergence toward the target, semantic divergence from the source, and structure preservation.
This formulation differs from identity-consistency objectives in synthesis or restoration. For example, AdaptDiff uses positive and negative identity conditions to improve inter-class separability in identity-conditioned face synthesis (Caldeira et al., 14 Jan 2026), whereas ID$^2$Face learns an identity-decoupled latent space for anonymization and samples a new identity vector at inference time (Yang et al., 28 Oct 2025). DiffAIM instead uses identity guidance to induce targeted machine-level impersonation under a black-box transfer setting (Wang et al., 30 Apr 2025).
2. Latent diffusion framework and the definition of identity guidance
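DiffAIM uses a latent diffusion model with a VAE encoder $\mathcal{E}$ and decoder $\mathcal{D}$. A source face $x_s$ is encoded into a latent
$$z_0 = \mathcal{E}(x_s),$$
and diffusion is performed in latent space rather than pixel space in order to leverage the natural image prior and the efficiency of latent diffusion models (Wang et al., 30 Apr 2025). The forward process is
$$z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,$$
with $\epsilon \sim \mathcal{N}(0, I)$ and $\bar{\alpha}_t$ the cumulative noise schedule, and the standard reverse step is
$$z_{t-1} = \mu_\theta(z_t, t) + \sigma_t\, \epsilon_t,$$
where $\mu_\theta$ is the denoising mean computed from the U-Net noise prediction $\epsilon_\theta(z_t, t)$ and $\sigma_t \epsilon_t$ is the sampling noise.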
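Identity guidance is defined on the adversarial latent $z_t^{adv}$. At each diffusion step, DiffAIM predicts the corresponding clean latent
$$\hat{z}_0^{adv} = \frac{z_t^{adv} - \sqrt{1 - \bar{\alpha}_t}\, \epsilon_\theta(z_t^{adv}, t)}{\sqrt{\bar{\alpha}_t}},$$
decodes it into image space,
$$\hat{x}_0^{adv} = \mathcal{D}(\hat{z}_0^{adv}),$$
and forms a total loss $\mathcal{L}_{total}$ over target identity, source divergence, and structure (Wang et al., 30 Apr 2025). Here $\epsilon_\theta$ is the U-Net noise prediction, $\bar{\alpha}_t$ the cumulative noise schedule, and $\mathcal{D}$ the VAE decoder. The guidance vector is then
$$g_t = \nabla_{z_t^{adv}} \mathcal{L}_{total}.$$
This gradient is the paper’s adversarial identity guidance.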
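The clean-latent prediction and the resulting guidance gradient can be sketched numerically. The example below is an illustration under simplifying assumptions, not the paper's implementation: the noise prediction is an oracle (the true noise) held fixed during differentiation, the decoder is omitted, and the loss is a toy quadratic in the predicted clean latent. All function names are hypothetical.

```python
import numpy as np

def forward_noise(z0, alpha_bar_t, eps):
    """Standard forward noising: z_t = sqrt(a_bar) * z0 + sqrt(1 - a_bar) * eps."""
    return np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * eps

def predict_clean_latent(z_t, alpha_bar_t, eps_pred):
    """Invert the forward relation using a noise prediction eps_pred."""
    return (z_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

def toy_guidance(z_t, alpha_bar_t, eps_pred, target):
    """Gradient w.r.t. z_t of 0.5 * ||z0_hat - target||^2, with eps_pred held fixed.

    Assumption: a real pipeline would backpropagate through the U-Net, the VAE
    decoder, and the FR models; here the chain rule reduces to a 1/sqrt(a_bar) factor.
    """
    z0_hat = predict_clean_latent(z_t, alpha_bar_t, eps_pred)
    return (z0_hat - target) / np.sqrt(alpha_bar_t)

rng = np.random.default_rng(0)
z0 = rng.standard_normal(8)       # stand-in for an encoded latent
eps = rng.standard_normal(8)      # forward-process noise
alpha_bar_t = 0.6                 # hypothetical schedule value

z_t = forward_noise(z0, alpha_bar_t, eps)
z0_hat = predict_clean_latent(z_t, alpha_bar_t, eps)   # oracle noise recovers z0
g = toy_guidance(z_t, alpha_bar_t, eps, target=np.zeros(8))
```

With an exact noise prediction the clean latent is recovered exactly, which is why losses can be evaluated on a decoded estimate of the final image at every intermediate timestep.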
Facial identity is represented by an ensemble of pretrained FR models $\{f_k\}_{k=1}^{K}$, such as IRSE50, IR152, FaceNet, and MobileFace. These models output embedding vectors, and similarity is cosine similarity: $\mathcal{S}(u, v) = \frac{u^\top v}{\|u\|\,\|v\|}$. The target identity is represented directly by target embeddings $f_k(x_t)$, without a classification head or cross-entropy loss (Wang et al., 30 Apr 2025).
In this formulation, identity guidance is neither text conditioning nor class-label conditioning. That distinction is central to the paper’s conceptual framing. Whereas classifier guidance uses the gradient of a classifier's log-likelihood, $\nabla_{x_t} \log p_\phi(y \mid x_t)$, and classifier-free guidance interpolates conditional and unconditional score estimates, DiffAIM uses gradients of a customized identity-oriented objective computed from decoded images and feature spaces external or internal to the diffusion backbone (Wang et al., 30 Apr 2025). This suggests that, in DiffAIM, identity is treated as a metric relation in embedding space rather than as a discrete label.
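For contrast, the classifier-free guidance rule mentioned above can be written in a few lines. This is the commonly used formulation in terms of noise predictions; the function name and the toy values are illustrative assumptions and do not come from DiffAIM.

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: move from the unconditional estimate toward
    (and, for scale > 1, beyond) the conditional estimate."""
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    eps_cond = np.asarray(eps_cond, dtype=float)
    return eps_uncond + scale * (eps_cond - eps_uncond)

eps_u = np.array([0.0, 1.0])   # toy unconditional noise prediction
eps_c = np.array([1.0, 3.0])   # toy conditional noise prediction
guided = cfg_combine(eps_u, eps_c, scale=2.0)   # extrapolates past eps_c
```

The guidance signal here is a difference of two model outputs, whereas DiffAIM's signal is the gradient of an externally defined loss, which is the distinction the paragraph draws.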
3. Loss design: identity convergence, semantic divergence, and structure preservation
The central adversarial term in DiffAIM is not a naive difference between FR similarity to target and FR similarity to source. The paper notes that an explicit FR-based divergence term tends to create artifacts, and therefore decouples the objective into identity convergence in FR embedding space and semantic divergence in U-Net feature space (Wang et al., 30 Apr 2025).
The semantic divergence term is defined using the deepest or intermediate features from the frozen diffusion U-Net, denoted $\phi(\cdot)$. Let $z_t$ be the benign inverted latent and $z_t^{adv}$ the adversarial latent. The semantic divergence loss $\mathcal{L}_{sem}$ is computed from the pair of feature maps $\phi(z_t^{adv})$ and $\phi(z_t)$ and penalizes their agreement. This pushes the adversarial trajectory away from the benign trajectory in the learned generative semantic space while the output remains visually plausible (Wang et al., 30 Apr 2025).
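Identity convergence is defined for each FR model $f_k$ through the cosine similarity between the embedding of the decoded adversarial image and the target embedding $f_k(x_t)$, so that minimizing the term draws the adversarial sample toward the target identity. Over an ensemble, the adversarial loss aggregates the per-model terms with weights $w_k$. The ensemble weights are adaptively updated so that harder surrogate models receive larger weights. This adaptive ensemble is intended to improve transferability to unknown black-box FR systems by emphasizing models that are currently least fooled (Wang et al., 30 Apr 2025).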
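One way to realize "harder models receive larger weights" is a softmax over the current per-model losses. The sketch below is an assumption made for illustration: the softmax form, the temperature parameter, and the function names are hypothetical and are not claimed to match the update rule used in the paper.

```python
import numpy as np

def adaptive_weights(losses, temperature=1.0):
    """Softmax over per-model losses: a larger loss (a model that is less
    fooled) yields a larger weight. Assumed form, not the paper's rule."""
    losses = np.asarray(losses, dtype=float)
    logits = losses / temperature
    logits = logits - logits.max()   # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

def ensemble_loss(losses, weights):
    """Weighted sum of per-model identity losses."""
    return float(np.dot(weights, np.asarray(losses, dtype=float)))

# Hypothetical per-model identity losses, e.g. 1 - cosine similarity to the target.
losses = [0.2, 0.8, 0.5]
weights = adaptive_weights(losses)
total = ensemble_loss(losses, weights)
```

Because the weights shift toward the least-fooled surrogate, the weighted loss is at least as large as the plain average, so the optimizer spends more effort on the hardest model at each step.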
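Structure-preserving regularization is defined through U-Net self-attention maps. Let $\mathcal{A}$ denote the set of self-attention layers and $A_l(z)$ the attention map at layer $l \in \mathcal{A}$ for latent $z$. The structure loss $\mathcal{L}_{struct}$ aggregates, over the layers in $\mathcal{A}$, the difference between the attention maps of the adversarial latent and those of the benign latent. Although written with a negative sign in the paper’s notation, it is minimized in practice so as to keep attention differences small (Wang et al., 30 Apr 2025). The total loss used for identity guidance is
$$\mathcal{L}_{total} = \mathcal{L}_{adv} + \lambda\, \mathcal{L}_{struct},$$
where $\lambda$ is the structure regularization weight, which is fixed in the reported implementation.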
The resulting decomposition is technically specific. Identity is pulled toward the target in FR space, pushed away from the source in U-Net semantic space, and constrained in self-attention space to preserve structure. A plausible implication is that DiffAIM treats realism as a property of the generative manifold rather than as an external perceptual constraint.
4. Injection into reverse diffusion and identity-sensitive scheduling
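DiffAIM integrates identity guidance directly into the reverse diffusion update. Starting from the benign reverse step
$$z_{t-1} = \mu_\theta(z_t, t) + \sigma_t\, \epsilon_t,$$
where $\mu_\theta$ is the denoising mean predicted by the diffusion model and $\sigma_t \epsilon_t$ is the sampling noise, the adversarial process maintains an adversarial latent $z_t^{adv}$ and first computes the base denoising step with consistent noise, that is, with the same noise sample $\epsilon_t$ as the benign step:
$$\tilde{z}_{t-1}^{adv} = \mu_\theta(z_t^{adv}, t) + \sigma_t\, \epsilon_t.$$
It then performs an inner loop of $N$ gradient steps per timestep to accumulate a bounded guidance vector (Wang et al., 30 Apr 2025).
The guidance vector $\delta$ is accumulated iteratively: each inner iteration takes a gradient step of size $\eta$ on the total loss and then applies a projection $\Pi$ onto a norm ball of radius $\kappa$, which keeps the accumulated perturbation bounded. After the $N$ inner iterations, the accumulated guidance is combined with the base denoising step $\tilde{z}_{t-1}^{adv}$ to give the next adversarial latent $z_{t-1}^{adv}$.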
The paper explicitly relates this to classifier guidance in spirit, but the guidance objective and gradients are defined in latent space by the custom adversarial loss rather than by class probabilities (Wang et al., 30 Apr 2025).
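The inner loop of bounded gradient accumulation can be sketched as projected gradient descent. Several details are assumptions made only for this illustration: the projection is onto an L-infinity ball (implemented by clipping), the accumulator starts at zero, plain gradients are used rather than signed gradients, the guidance is added to the base step, and the loss is a toy quadratic. None of these choices is claimed to match the paper.

```python
import numpy as np

def project_linf(delta, radius):
    """Project onto an L-infinity ball of the given radius (assumed norm)."""
    return np.clip(delta, -radius, radius)

def accumulate_guidance(z_base, grad_fn, n_iters, step_size, radius):
    """Accumulate a bounded guidance vector with projected gradient steps."""
    delta = np.zeros_like(z_base)              # assumed zero initialization
    for _ in range(n_iters):
        grad = grad_fn(z_base + delta)         # loss gradient at the current point
        delta = project_linf(delta - step_size * grad, radius)
    return delta

# Toy loss 0.5 * ||z - target||^2, whose gradient is (z - target).
target = np.array([1.0, -1.0, 0.05])
z_base = np.zeros(3)                           # stand-in for the base denoising step

def grad_fn(z):
    return z - target

delta = accumulate_guidance(z_base, grad_fn, n_iters=50, step_size=0.5, radius=0.1)
z_next = z_base + delta                        # assumed additive application
```

The first two coordinates saturate at the ball boundary, while the third converges to its unconstrained optimum inside the ball, which shows how the projection caps the perturbation regardless of how many inner steps are taken.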
A further design choice is identity-sensitive timestep truncation. DiffAIM reports that applying guidance at all timesteps harms global structure, because early timesteps define low-frequency global structure while later timesteps refine high-frequency identity-sensitive regions. The method therefore runs benign reverse diffusion from the initial timestep $T$ down to a truncated timestep $t^{*}$, and applies adversarial identity guidance only from $t^{*}$ down to $0$ (Wang et al., 30 Apr 2025). The paper reports an optimal range for $t^{*}$ and fixes a single value for the number of sampling steps used in its implementation.
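The truncation rule amounts to a simple per-timestep switch. The sketch below is illustrative only: the function name, the step count, and the truncation value are hypothetical and do not reproduce the paper's settings.

```python
def guidance_schedule(num_steps, t_trunc):
    """Return (timestep, guided) pairs for a reverse pass from num_steps down to 1.

    Steps above t_trunc run the benign reverse update; steps at or below
    t_trunc additionally apply identity guidance.
    """
    if not 0 <= t_trunc <= num_steps:
        raise ValueError("t_trunc must lie between 0 and num_steps")
    return [(t, t <= t_trunc) for t in range(num_steps, 0, -1)]

# Hypothetical values for illustration.
schedule = guidance_schedule(num_steps=10, t_trunc=4)
guided_steps = [t for t, guided in schedule if guided]
benign_steps = [t for t, guided in schedule if not guided]
```

Moving `t_trunc` toward `num_steps` guides more of the trajectory, which corresponds to the structure-versus-attack-strength trade-off described in the text.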
This scheduling choice places identity guidance in later denoising phases, where FR models’ Grad-CAM attention becomes localized on identity-relevant regions such as eyes, nose, and mouth (Wang et al., 30 Apr 2025). Similar timestep asymmetry appears in other identity-guidance formulations. AdaptDiff increases negative identity guidance toward later sampling steps because early steps contain little identity information, while later steps benefit from stronger repulsive conditioning relative to negative identities (Caldeira et al., 14 Jan 2026). This suggests a broader pattern: identity-sensitive control in diffusion models is often most effective after coarse structure has been established.
5. Empirical behavior, transferability, and practical trade-offs
DiffAIM reports that, on verification tasks, its average attack success rate (ASR) over IRSE50, IR152, FaceNet, and MobileFace on CelebA-HQ and LADN exceeds that of GIFT and DiffAM (Wang et al., 30 Apr 2025). On identification tasks, the paper reports average Rank-1 and Rank-5 targeted ASR, and on the commercial APIs Face++ and Aliyun the method attains the highest and most stable confidence scores among the compared methods (Wang et al., 30 Apr 2025). The paper further reports that adding the U-Net-based semantic divergence term improves robustness and ASR.
The implementation uses Stable Diffusion with DDIM sampling and operates on the standard Stable Diffusion VAE latent. Guidance hyperparameters include the number of inner-loop iterations per guided timestep, the step size, and the structure regularization weight (Wang et al., 30 Apr 2025).
The reported trade-offs are explicit. Stronger guidance improves attack success rate but risks artifacts and structural drift; weaker or later guidance improves visual quality but reduces attack success (Wang et al., 30 Apr 2025). Timing is also dataset- and model-dependent, and the method incurs substantial computational cost because it performs multiple inner-loop gradient steps at many timesteps.
The same article reports that removing the structure-preserving regularization increases artifacts and worsens PSNR, SSIM, and FID, emphasizing that identity guidance alone is not sufficient for naturalness (Wang et al., 30 Apr 2025). A plausible implication is that the empirical strength of DiffAIM derives less from any single loss term than from the coordination of metric-space target attraction, semantic-space source repulsion, attention-map stabilization, and late-stage scheduling.
6. Relation to other identity-guidance formulations
The term identity guidance is used more broadly in recent diffusion and flow literature, but with different meanings and technical implementations. In AdaOr, identity guidance appears as an identity-conditioned adaptive origin for editing: a special identity token is trained to reconstruct the input, and the guidance origin is interpolated between the null instruction and the identity instruction as a function of edit strength (Wolf et al., 3 Feb 2026). The central update applies guidance relative to this adaptive origin, which is defined from the null and identity predictions. Here identity guidance preserves the source under continuous editing, rather than inducing impersonation.
In AdaptDiff, identity guidance is implemented through positive and negative identity conditions in identity-conditioned synthesis, with the strength of the negative condition following a linear schedule over the sampling steps. This mechanism is designed to improve inter-class separability while preserving intra-class variation in synthetic face datasets (Caldeira et al., 14 Jan 2026).
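A linear schedule that strengthens negative identity guidance toward later sampling steps can be sketched as follows. The endpoint values, the step indexing, and the function name are assumptions for illustration; the exact schedule and how the weight enters the AdaptDiff sampler are not reproduced here.

```python
def linear_negative_weight(step, num_steps, w_start=0.0, w_end=1.0):
    """Linearly interpolate the negative-guidance weight over sampling progress.

    step = 0 is the first (noisiest) sampling step and step = num_steps - 1
    is the last; w_start and w_end are hypothetical endpoint weights.
    """
    if num_steps < 2:
        return w_end
    frac = step / (num_steps - 1)
    return w_start + frac * (w_end - w_start)

# Weights over a hypothetical 5-step sampler.
weights = [linear_negative_weight(s, 5) for s in range(5)]
```

The weight is smallest while the sample is still mostly noise and largest in the final steps, matching the observation that early steps carry little identity information.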
In DeGu, identity guidance is decoupled from context guidance by routing fidelity and editability through separate conditioning streams, with an optional spatial mask that confines fidelity guidance to subject regions (Kim et al., 1 Jul 2026). In that formulation, identity guidance is a plug-and-play fidelity stream for personalization.
Training-centric variants also exist. ID$^2$Face treats identity as an explicit latent variable in an identity-decoupled anonymization framework and fuses identity-aware and non-identity-aware features via the Identity-Guided Latent Harmonizer, while Orthogonal Identity Mapping further enforces orthogonality between source and sampled identity vectors in the learned identity space (Yang et al., 28 Oct 2025). In DiffFace, identity guidance is externalized as a face-recognition energy term during diffusion sampling, combined with semantic and gaze guidance for face swapping (Kim et al., 2022).
These comparisons clarify a common misconception: identity guidance is not a single standardized operator. In DiffAIM it denotes adversarial latent-space control for targeted impersonation against FR systems (Wang et al., 30 Apr 2025); in editing it often denotes identity preservation against undesired edits (Wolf et al., 3 Feb 2026, Kim et al., 1 Jul 2026); in synthesis it may refer to positive or negative identity conditioning (Caldeira et al., 14 Jan 2026); and in anonymization it may be built into the latent factorization itself (Yang et al., 28 Oct 2025).
7. Limitations, broader significance, and open directions
DiffAIM identifies timing sensitivity, surrogate-model dependence, and computational cost as its main limitations. Guidance applied too early harms structure, guidance applied too late weakens attacks, and the optimal truncation timestep depends on the dataset and model (Wang et al., 30 Apr 2025). Because the attack is transfer-based, distribution shifts or substantially different FR architectures may reduce transferability. The need for multiple gradient refinements at many diffusion steps also makes the method expensive.
The paper suggests several future directions, including more adaptive or learned timestep schedules for guidance, better identity-space regularizers that further decouple human-perceived identity from FR identity, and extensions to more controllable attributes such as explicit pose or expression preservation and multi-target impersonation (Wang et al., 30 Apr 2025). These proposals align with broader trends in the literature. AdaOr generalizes the notion of an identity origin into a wider class of semantic origin tokens (Wolf et al., 3 Feb 2026). DeGu motivates more explicit conditional independence between identity and context features (Kim et al., 1 Jul 2026). TIGER, in face video restoration, combines an identity prior, a geometry prior, and a generative prior to control identity fidelity and temporal stability jointly (Zhou et al., 23 Jun 2026). Lookahead Anchoring reframes future keyframes as directional beacons that preserve identity in autoregressive human animation, with lookahead distance controlling the balance between expressivity and consistency (Seo et al., 27 Oct 2025).
Within this broader landscape, DiffAIM occupies a distinct position. Its identity guidance is adversarial rather than restorative or preservative; it leverages the generative prior of a latent diffusion model not to keep identity fixed, but to make targeted impersonation visually natural and transferable (Wang et al., 30 Apr 2025). This suggests a dual significance. On one hand, it advances privacy-protection research by exposing vulnerabilities of black-box FR systems, including commercial APIs. On the other hand, it sharpens the technical distinction between human-perceived identity, generative semantics, and machine-recognized identity, a distinction that recurs across recent work on editing, synthesis, personalization, anonymization, and restoration (Wang et al., 30 Apr 2025, Wolf et al., 3 Feb 2026, Yang et al., 28 Oct 2025).