Identity-Style Normalization with 3D Priors

Updated 15 April 2026
  • Geometry in Style uses per-vertex target normals and a differentiable ARAP (dARAP) layer to balance expressive style transfer with robust identity preservation; DynamicFace uses 3D morphable face model (3DMM) priors to disentangle identity from pose, expression, and lighting.
  • Geometry in Style estimates local rotations with SVD and solves a Poisson system for smooth mesh deformation, integrating 3D priors directly into the stylization and attribute-mixing process.
  • Empirical results show better trade-offs between identity fidelity and stylization quality than baseline methods, for both mesh stylization and video face swapping.

Identity-style normalization with 3D priors is an emerging paradigm in geometric deep learning and cross-modal generative modeling in which style-driven deformation or transfer is regularized by priors derived from 3D structure. The core objective is to enable expressive stylization or identity transfer (e.g., mesh stylization or face swapping) while rigorously preserving the geometric identity of the underlying shape or subject. Two state-of-the-art frameworks, Geometry in Style (mesh stylization) and DynamicFace (video face swapping), exemplify this principle: they parameterize deformation or attribute mixing through 3D priors, such as per-vertex surface normals or morphable face model coefficients, and enforce identity preservation via explicit or implicit geometric constraints (Dinh et al., 29 Mar 2025, Wang et al., 15 Jan 2025).

1. Geometric Deformation with 3D Priors

Geometry in Style formulates stylization as a deformation of a triangle mesh $M = (V, F)$, where identity preservation is governed by surface normal priors rather than unconstrained vertex displacements. Each vertex $k$ is assigned a unit-norm target normal $\hat{u}_k \in \mathbb{R}^3$, which controls the degree of stylization of its local patch. The local deformation is solved as a rotation $R_k \in \mathrm{SO}(3)$ that best aligns both the original normal $n_k$ and the edge vectors $e_{ij}$ of its 1-ring neighborhood $N_k$ with their stylized counterparts, formalized by a local Procrustes energy:

$$E_{\mathrm{loc}}(k; R_k, \hat{u}_k) = \sum_{(i,j)\in N_k} w_{ij} \| R_k e_{ij} - e_{ij} \|_2^2 + \lambda a_k \| R_k n_k - \hat{u}_k \|_2^2$$

where $w_{ij}$ are cotangent weights, $a_k$ are Voronoi masses (per-vertex areas), and $\lambda$ modulates the identity-vs-style trade-off. The best-fit rotation $R_k$ is obtained in closed form as an orthogonal Procrustes solution via SVD.
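The local rotation fit can be sketched in a few lines of NumPy. This is a minimal sketch, assuming the standard weighted orthogonal Procrustes (Kabsch) solution rather than the authors' code; the function name `fit_local_rotation` and its argument layout are hypothetical. It accepts separate rest and target edge arrays, so passing the rest edges as targets reproduces $E_{\mathrm{loc}}$ exactly as written above, while passing deformed edges gives the rotation fit used inside a local-global loop.

```python
import numpy as np


def fit_local_rotation(rest_edges, target_edges, weights, normal, target_normal,
                       lam, area):
    """Hypothetical helper: R in SO(3) minimizing
        sum_j w_j ||R e_j - e'_j||^2 + lam * a_k ||R n_k - u_k||^2.

    rest_edges, target_edges: (m, 3) arrays of 1-ring edge vectors.
    weights: (m,) edge weights (cotangent weights in the text).
    normal, target_normal: (3,) unit vectors n_k and u_k.
    """
    # Weighted cross-covariance S = sum_j w_j e_j e'_j^T + lam * a_k * n_k u_k^T.
    S = (weights[:, None] * rest_edges).T @ target_edges
    S = S + lam * area * np.outer(normal, target_normal)
    U, _, Vt = np.linalg.svd(S)
    R = Vt.T @ U.T
    # Reflection guard: flip the column tied to the smallest singular value
    # so that det(R) = +1.
    if np.linalg.det(R) < 0:
        U[:, -1] *= -1
        R = Vt.T @ U.T
    return R


# Usage: targets equal to the rest edges and u_k = n_k, so the identity is optimal.
rng = np.random.default_rng(0)
edges = rng.normal(size=(6, 3))
n = np.array([0.0, 0.0, 1.0])
R = fit_local_rotation(edges, edges, np.ones(6), n, n, lam=1.0, area=1.0)
print(np.allclose(R, np.eye(3)))  # True
```

The determinant check is what keeps the result a proper rotation; without it, nearly planar neighborhoods can produce a reflection.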

DynamicFace harnesses 3D morphable face models (3DMMs) to parameterize facial attributes explicitly as disentangled coefficients for identity (shape), expression, pose, and albedo. This enables source and target attributes to be mixed in a physically grounded manner (Wang et al., 15 Jan 2025).
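Attribute mixing in a linear 3DMM can be sketched as follows. This is an illustrative sketch under stated assumptions: a generic linear morphable model with a mean shape and separate identity and expression bases, random bases standing in for a fitted model, and the conventional face-swap split in which identity shape comes from the source while expression and pose come from the target. `FaceParams`, `reconstruct`, and `mix_for_swap` are hypothetical names, albedo and illumination are omitted for brevity, and this is not DynamicFace's code.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class FaceParams:
    """Hypothetical container for 3DMM coefficients (albedo omitted)."""
    shape: np.ndarray        # identity (shape) coefficients
    expression: np.ndarray   # expression coefficients
    rotation: np.ndarray     # 3x3 head rotation (pose)
    translation: np.ndarray  # (3,) head translation (pose)


def reconstruct(mean, id_basis, exp_basis, p):
    """Linear 3DMM: mean + B_id @ shape + B_exp @ expression, then posed."""
    flat = mean + id_basis @ p.shape + exp_basis @ p.expression
    verts = flat.reshape(-1, 3)  # (N, 3) vertex positions
    return verts @ p.rotation.T + p.translation


def mix_for_swap(source, target):
    """Assumed swap rule: source identity, target expression and pose."""
    return FaceParams(shape=source.shape, expression=target.expression,
                      rotation=target.rotation, translation=target.translation)


# Usage with random stand-in bases (N vertices, small coefficient spaces).
rng = np.random.default_rng(0)
N, K_ID, K_EXP = 50, 8, 5
mean = rng.normal(size=3 * N)
B_id = rng.normal(size=(3 * N, K_ID))
B_exp = rng.normal(size=(3 * N, K_EXP))
src = FaceParams(rng.normal(size=K_ID), rng.normal(size=K_EXP), np.eye(3), np.zeros(3))
tgt = FaceParams(rng.normal(size=K_ID), rng.normal(size=K_EXP), np.eye(3),
                 np.array([0.0, 0.0, 1.0]))
swapped = reconstruct(mean, B_id, B_exp, mix_for_swap(src, tgt))
print(swapped.shape)  # (50, 3)
```

Because each attribute lives in its own coefficient vector, swapping identity is a matter of choosing which vector to read from, which is the disentanglement the paragraph describes.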

2. Differentiable As-Rigid-As-Possible (dARAP) Layer

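Geometry in Style introduces the differentiable As-Rigid-As-Possible (dARAP) layer, which adapts the classical ARAP formulation to produce smooth, closed-form deformations compatible with gradient-based optimization. For a deformation that moves the rest vertices $V$ to new positions $V'$, with deformed edge vectors $e'_{ij}$, the global ARAP energy

$$E_{\mathrm{ARAP}}(V') = \sum_{k} \min_{R_k \in \mathrm{SO}(3)} \left[ \sum_{(i,j)\in N_k} w_{ij} \| R_k e_{ij} - e'_{ij} \|_2^2 + \lambda a_k \| R_k n_k - \hat{u}_k \|_2^2 \right]$$

is iteratively minimized by alternating between (i) local SVD-based rotation solves controlled by the target normals (when $e'_{ij} = e_{ij}$, as at initialization, the bracketed term is exactly $E_{\mathrm{loc}}$) and (ii) a global sparse Poisson solve for vertex positions via the cotangent Laplacian. The entire pipeline is differentiable: gradients propagate through both the SVD-based rotation estimation and the linear system solve, enabling end-to-end training under extrinsic supervision (e.g., rendered image losses) (Dinh et al., 29 Mar 2025).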
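The global step of a local-global ARAP iteration can be sketched as a single linear solve. This minimal sketch assumes the standard ARAP normal equations $\sum_j w_{ij}(v'_i - v'_j) = \sum_j \tfrac{w_{ij}}{2}(R_i + R_j)(v_i - v_j)$, with one pinned vertex to remove the translational null space. It uses a dense matrix and arbitrary positive edge weights in place of a sparse cotangent Laplacian, and `arap_global_step` is a hypothetical name.

```python
import numpy as np


def arap_global_step(rest, edges, weights, rotations, pinned=0):
    """Hypothetical helper: deformed positions given one rotation per vertex.

    rest: (n, 3) rest positions v_i; edges: (i, j) pairs of an undirected graph;
    weights: one positive weight per edge (cotangent weights on a real mesh);
    rotations: (n, 3, 3). Vertex `pinned` stays at its rest position.
    """
    rest = np.asarray(rest, dtype=float)
    n = len(rest)
    L = np.zeros((n, n))  # weighted graph Laplacian (dense for brevity)
    b = np.zeros((n, 3))  # right-hand side built from rotated rest edges
    for (i, j), w in zip(edges, weights):
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
        rhs = 0.5 * w * (rotations[i] + rotations[j]) @ (rest[i] - rest[j])
        b[i] += rhs
        b[j] -= rhs
    free = np.array([k for k in range(n) if k != pinned])
    x = rest.copy()  # pinned vertex keeps rest[pinned]
    rhs_free = b[free] - L[np.ix_(free, [pinned])] @ rest[[pinned]]
    x[free] = np.linalg.solve(L[np.ix_(free, free)], rhs_free)
    return x


# Usage on a tetrahedron: identity rotations reproduce the rest shape.
tet = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
tet_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
identity = np.repeat(np.eye(3)[None], 4, axis=0)
print(np.allclose(arap_global_step(tet, tet_edges, np.ones(6), identity), tet))  # True
```

Because the Laplacian depends only on the rest mesh, a practical implementation would factor it once (for example with a sparse Cholesky or LU factorization) and reuse it across iterations. In an autodiff framework the same solve is differentiable with respect to the rotations, which is how gradients can reach the target normals.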

3. Identity Preservation via Rigidity Priors

The dARAP layer itself acts as a strong geometric prior: each local patch is encouraged to deform by rotation only, which prohibits scaling, shear, and degenerate folding. This ensures that extrinsic stylization (e.g., adding ripples or blocky shapes) does not compromise global part correspondences, the silhouette, or structural integrity. Unlike purely Jacobian-regularized methods, dARAP does not require additional L₂ identity losses, because the rigidity term is built into the energy. The parameter $\lambda$ explicitly controls the balance: an excessively large $\lambda$ overfits the target normals at the risk of self-intersection, whereas a small $\lambda$ under-represents style changes (Dinh et al., 29 Mar 2025).
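The effect of $\lambda$ can be checked numerically on a single vertex. This sketch assumes a hypothetical cone-shaped 1-ring with unit weights and unit area; it fits the rotation minimizing $E_{\mathrm{loc}}$ as written (rotated edges compared with the unrotated edges) for increasing $\lambda$ and reports the remaining angle between $R_k n_k$ and $\hat{u}_k$. Because the SVD gives the exact minimizer for every $\lambda$, this angle can only shrink as $\lambda$ grows: at $\lambda = 0$ the rotation is the identity, and for very large $\lambda$ the normal snaps onto its target while the edge term absorbs the distortion. The helper names are hypothetical.

```python
import numpy as np


def fit_rotation(edges, weights, normal, target_normal, lam, area=1.0):
    """Minimizer of E_loc as written: edges vs. themselves plus the normal term."""
    S = (weights[:, None] * edges).T @ edges
    S = S + lam * area * np.outer(normal, target_normal)
    U, _, Vt = np.linalg.svd(S)
    if np.linalg.det(Vt.T @ U.T) < 0:  # keep a proper rotation
        U[:, -1] *= -1
    return Vt.T @ U.T


def normal_misalignment(lam_values, edges, weights, normal, target_normal):
    """Angle in radians between R_k n_k and u_k for each lambda."""
    angles = []
    for lam in lam_values:
        R = fit_rotation(edges, weights, normal, target_normal, lam)
        c = np.clip(np.dot(R @ normal, target_normal), -1.0, 1.0)
        angles.append(float(np.arccos(c)))
    return angles


# Hypothetical 1-ring: six edges from the apex of a shallow cone.
ring = np.linspace(0.0, 2.0 * np.pi, 6, endpoint=False)
edges = np.stack([np.cos(ring), np.sin(ring), np.full(6, -0.5)], axis=1)
n = np.array([0.0, 0.0, 1.0])
u = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)  # target tilted 45 degrees
lams = [0.0, 0.1, 1.0, 10.0, 100.0, 1e6]
angles = normal_misalignment(lams, edges, np.ones(6), n, u)
for lam, angle in zip(lams, angles):
    print(f"lambda={lam:g}: {np.degrees(angle):.2f} deg")
```

The sweep only covers the rotation fit; the self-intersection risk mentioned above arises after the global solve and is not modeled here.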

In DynamicFace, identity–style normalization is enforced by explicitly separating identity, pose, expression, and lighting through 3D priors. Four disentangled image-based conditions are generated for each frame: (1) shape-aware pose (normal map), (2) background-preservation mask, (3) expression (semantic map), and (4) illumination (blurred UV texture). Each condition guides the UNet through its own pathway, ensuring that high-level semantics and fine appearance details, injected via the FaceFormer and ReferenceNet modules, do not compromise subject identity (Wang et al., 15 Jan 2025).

4. Integration with High-level Generative Models

Geometry in Style integrates its 3D deformation pipeline with a text-to-image model: the stylized mesh is rendered from multiple views, and the images are passed to a pre-trained 2D diffusion model that supplies a semantic visual loss, Cascaded Score Distillation (CSD). The gradients of this loss flow through the rasterizer and the dARAP solver to optimize the target normals $\hat{u}_k$. The overall stylization loop alternates between updating the target normals with Adam, performing local/global dARAP solves, and receiving feedback from the visual loss (Dinh et al., 29 Mar 2025).
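The normal-optimization loop can be illustrated with a toy stand-in for the visual loss. In this sketch, which is an assumption and not the paper's pipeline, the rendering, dARAP solve, and CSD gradient are replaced by a quadratic loss that pulls each target normal toward a fixed direction; the normals are parameterized as $\hat{u}_k = p_k / \|p_k\|$ so they stay unit-norm, and Adam is written out by hand. `adam_fit_normals` and the toy loss are hypothetical.

```python
import numpy as np


def adam_fit_normals(init, goal, steps=1000, lr=0.02, b1=0.9, b2=0.999, eps=1e-8):
    """Fit unit target normals u_k = p_k / |p_k| to a toy loss with Adam.

    Toy stand-in for the rendered visual loss (an assumption):
    sum_k |u_k - goal_k|^2. init, goal: (n, 3); goal rows are unit vectors.
    """
    p = np.array(init, dtype=float)
    m = np.zeros_like(p)
    v = np.zeros_like(p)
    for t in range(1, steps + 1):
        norm = np.linalg.norm(p, axis=1, keepdims=True)
        u = p / norm
        grad_u = 2.0 * (u - goal)
        # Chain rule through normalization: du/dp = (I - u u^T) / |p|.
        grad_p = (grad_u - np.sum(grad_u * u, axis=1, keepdims=True) * u) / norm
        # Standard Adam moment estimates with bias correction.
        m = b1 * m + (1 - b1) * grad_p
        v = b2 * v + (1 - b2) * grad_p ** 2
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        p -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return p / np.linalg.norm(p, axis=1, keepdims=True)


init = np.tile([0.0, 0.0, 1.0], (3, 1))  # start from the rest normals
goal = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [-1.0, -1.0, 1.0]])
goal /= np.linalg.norm(goal, axis=1, keepdims=True)
fitted = adam_fit_normals(init, goal)
print(np.round(np.sum(fitted * goal, axis=1), 3))  # cosine to each goal
```

In the actual method each step would also run the dARAP solve and the renderer, so the gradient reaching $\hat{u}_k$ is shaped by the rigidity prior rather than pointing straight at a fixed goal.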

DynamicFace attaches a zero-initialized guider head to each 2D condition and fuses the outputs into the UNet backbone features. Because the backbone is initialized from pretrained weights and the guiders start at zero, fine-grained guidance can be added with minimal catastrophic forgetting. Identity is injected through Face Former (high-level tokens derived from ArcFace) and ReferenceNet (spatial attention), and the training losses cover reconstruction, identity, expression, pose, and semantic keypoints, plus temporal consistency for video (CLIP-based and warping error) (Wang et al., 15 Jan 2025).
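Zero-initialized guiders can be sketched in NumPy. This is a hypothetical minimal version, not DynamicFace's architecture: each guider is a per-pixel two-layer network (equivalent to 1×1 convolutions) whose output projection starts at zero, and its output is added to a backbone feature map. Names, shapes, and the additive fusion rule are assumptions. The sketch shows the two properties the design relies on: at initialization the fused features equal the backbone features exactly, yet the gradient with respect to the zero weights is nonzero, so the guider can still learn.

```python
import numpy as np


class GuiderHead:
    """Hypothetical guider: per-pixel two-layer MLP with zero-initialized output."""

    def __init__(self, cond_channels, hidden, out_channels, rng):
        self.w1 = rng.normal(scale=0.5, size=(hidden, cond_channels))
        self.w2 = np.zeros((out_channels, hidden))  # zero init: no effect at start

    def hidden(self, cond):
        # cond: (C_cond, H, W) -> ReLU features of shape (hidden, H, W)
        return np.maximum(0.0, np.einsum("hc,cxy->hxy", self.w1, cond))

    def __call__(self, cond):
        return np.einsum("oh,hxy->oxy", self.w2, self.hidden(cond))

    def grad_w2(self, cond, upstream):
        # dLoss/dw2 given dLoss/d(output) = upstream, shape (out_channels, H, W)
        return np.einsum("oxy,hxy->oh", upstream, self.hidden(cond))


def fuse(features, heads, conds):
    """Add each condition's guider output to the backbone feature map."""
    return features + sum(head(c) for head, c in zip(heads, conds))


rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16, 16))                       # backbone features (C, H, W)
conds = [rng.normal(size=(3, 16, 16)) for _ in range(4)]   # four 2D conditions
heads = [GuiderHead(3, 32, 8, rng) for _ in range(4)]
fused = fuse(feats, heads, conds)
print(np.array_equal(fused, feats))  # True
```

Only the final projection is zeroed; if the first layer were also zero, the hidden features and hence the gradient of `w2` would vanish.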

5. Empirical Results and Comparison to Baselines

Geometry in Style demonstrates a better area-preservation trade-off: its mean triangle area ratio is ≈1.08 (stdev ≈0.23 across 20 shapes), where 1.0 means areas are unchanged, compared with 0.83±0.36 for TextDeformer and 1.29±0.36 for MeshUp. CLIP similarity to the text prompt is on par or slightly better (0.655 vs. 0.653 for MeshUp and 0.650 for TextDeformer). Qualitatively, the method supports expressive styles (e.g., Lego blockiness, an armor effect) while maintaining pose and part identity. Control over $\lambda$ lets users adjust the fidelity-vs-style trade-off at inference. Bump-map approaches are limited to small surface effects, and Jacobian-deformation baselines tend to degrade limb fidelity and silhouettes (Dinh et al., 29 Mar 2025).
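The area-preservation metric can be computed as a per-triangle ratio of deformed to rest area. This sketch assumes that definition (ratio deformed/rest per face, then mean and standard deviation over faces); how the paper aggregates over its 20 shapes is not stated here, and the function names are hypothetical.

```python
import numpy as np


def triangle_areas(verts, faces):
    """Area of each triangle from the cross product of two edge vectors."""
    a, b, c = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    return 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)


def area_ratio_stats(rest, deformed, faces):
    """Mean and std of per-face area ratio deformed/rest (1.0 = preserved)."""
    ratio = triangle_areas(deformed, faces) / triangle_areas(rest, faces)
    return float(ratio.mean()), float(ratio.std())


tet = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
faces = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])
mu, sigma = area_ratio_stats(tet, 2.0 * tet, faces)  # uniform scale by 2
print(round(mu, 6), round(sigma, 6))  # 4.0 0.0
```

Under this definition a ratio above 1.0 indicates inflation and below 1.0 shrinkage, so 1.08 sits closer to perfect preservation than either 0.83 or 1.29.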

DynamicFace achieves state-of-the-art results for video face swapping: on FaceForensics++, identity retrieval is 99.20% (vs. ∼98.7% for the prior state of the art), mouth-L2 error is 1.69 px, and eye-L2 error is 0.16 px. Ablation studies confirm the necessity of each 3D condition: excluding any of them degrades pose, background, or expression fidelity. Adding plug-and-play temporal layers raises frame consistency from 95.78% to 99.02%, and warping error nearly doubles without temporal modeling. Removing either Face Former or ReferenceNet causes a 4–5% drop in identity similarity (Wang et al., 15 Jan 2025).
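Two of the reported video metrics can be sketched under common conventions, assuming precomputed embeddings: frame consistency as the mean cosine similarity between consecutive frame embeddings (for example, CLIP image embeddings), and identity retrieval as top-1 nearest-neighbor accuracy of swapped-face identity embeddings against a gallery of source identities. These definitions and function names are assumptions; the exact FaceForensics++ evaluation protocol may differ.

```python
import numpy as np


def _unit(x):
    """Normalize rows to unit length."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def frame_consistency(frame_embeddings):
    """Mean cosine similarity of consecutive frames; input shape (T, D)."""
    e = _unit(frame_embeddings)
    return float(np.mean(np.sum(e[1:] * e[:-1], axis=1)))


def identity_retrieval(swapped, gallery, true_ids):
    """Top-1 accuracy of the nearest gallery identity under cosine similarity."""
    sims = _unit(swapped) @ _unit(gallery).T
    return float(np.mean(np.argmax(sims, axis=1) == np.asarray(true_ids)))


gallery = np.eye(4)                      # four hypothetical source identities
swapped = np.eye(4)[[2, 0, 3, 1]] + 0.1  # perturbed swapped-face embeddings
print(identity_retrieval(swapped, gallery, [2, 0, 3, 1]))  # 1.0
```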

6. Limitations and Extensions

Geometry in Style is constrained to manifold meshes with moderate aspect ratios; topology modifications and collision/self-intersection handling are not supported. Overly strong stylization parameters may cause mesh collapse in thin geometries or fail to capture high-frequency details beyond surface normal bandwidth. Extensions may include coupling with segmentation for spatially localized stylization, differentiable collision/physics, support for topology change, or exploration of conformal and higher-order geometric priors (Dinh et al., 29 Mar 2025).

DynamicFace depends on high-quality 3DMM fits. Disentanglement errors in the 3D prior estimation propagate directly into condition quality. Potential extensions include learned or unsupervised segmentation priors for more granular region control, advanced attention/fusion mechanisms, and joint optimization of 3D priors with diffusion model weights (Wang et al., 15 Jan 2025).

7. Schematic Workflow Overview

| Framework | 3D Priors | Stylization/Transfer Mechanism | Identity Preservation Mechanism |
|---|---|---|---|
| Geometry in Style | Per-vertex target normals | dARAP (SVD + Poisson) + diffusion-model visual loss | Local rigidity + global smoothness via ARAP |
| DynamicFace | 3DMM: identity, pose, expression, albedo | Mixture-of-Guider, UNet fusion, FaceFormer/ReferenceNet | Disentangled 3D conditions + facial feature tokens |

Identity-style normalization with 3D priors has established itself as a principled approach for stylized deformation and attribute transfer, providing explicit disentanglement between style and object identity by leveraging differentiable geometric constraints and physically-founded parameterizations (Dinh et al., 29 Mar 2025, Wang et al., 15 Jan 2025).
