TGRN: Texture-Guided Restoration for BFR
- The paper introduces TGRN, a neural architecture that decomposes, aligns, and fuses identity-preserving and texture-rich features using a U-Net backbone with texture attention.
- TGRN employs dynamic fusion blocks with MLP-based channel weighting and multi-loss optimization to balance photorealism and identity fidelity under severe degradations.
- The integration of TGRN in CodeFormer++ achieves state-of-the-art blind face restoration, as evidenced by improved metrics such as PSNR, SSIM, and FID on diverse datasets.
Texture-Guided Restoration Network (TGRN) is a neural architecture central to CodeFormer++ for blind face restoration (BFR), which targets the decomposition, alignment, and fusion of identity-preserving and texture-rich features. TGRN dynamically transfers semantically aligned generative textures onto an identity-faithful restoration backbone through a lightweight U-Net–based encoder–decoder, augmented by a texture attention mechanism and dynamic fusion modules. It is designed to address the visual-quality/identity-fidelity trade-off seen in generative BFR methods by adaptively merging identity-consistent but texture-poor outputs with texture-strong but identity-drifting priors, enabling high-fidelity face recovery under severe and unknown image degradations (Reddem et al., 6 Oct 2025).
1. Architectural Structure and Feature Fusion
TGRN operates on two main inputs: the identity-preserving restoration (the output of CodeFormer's controllable feature transform with full identity weight) and the aligned generative prior (the generative output warped via a dedicated deformable alignment module). Its core structure is a 3-level U-Net encoder–decoder, with each encoder stage augmented by a Texture Attention Module (TAM) and a dynamic fusion block.
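At each encoder level $l \in \{1, 2, 3\}$, the U-Net processes the identity-preserving restoration $I_{id}$, producing spatial-channel features $F^{l}_{enc} \in \mathbb{R}^{C_l \times H_l \times W_l}$. In parallel, the aligned generative prior $\tilde{I}_{gen}$ passes through a CNN, residual blocks, and adaptive pooling to yield texture features $F^{l}_{tex}$ of matching size.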
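Before the texture features can be mixed with the encoder features, they need the same spatial size as each encoder level. A minimal NumPy sketch of adaptive average pooling, one common way to do this, is shown below. The floor/ceil bin rule, the three output resolutions, and the function name `adaptive_avg_pool2d` are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def adaptive_avg_pool2d(x: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Average-pool a (C, H, W) feature map to (C, out_h, out_w).

    Bin i along an axis of length n covers [floor(i*n/out), ceil((i+1)*n/out)),
    so neighbouring bins can overlap when n is not divisible by the output size.
    """
    c, h, w = x.shape
    out = np.empty((c, out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        h0, h1 = (i * h) // out_h, -(-((i + 1) * h) // out_h)  # floor, ceil
        for j in range(out_w):
            w0, w1 = (j * w) // out_w, -(-((j + 1) * w) // out_w)
            out[:, i, j] = x[:, h0:h1, w0:w1].mean(axis=(1, 2))
    return out


# Hypothetical usage: pool one texture-branch map to three encoder resolutions.
rng = np.random.default_rng(0)
tex = rng.standard_normal((16, 64, 64))
pyramid = [adaptive_avg_pool2d(tex, s, s) for s in (64, 32, 16)]
print([p.shape for p in pyramid])
```

Matching each level's $(H_l, W_l)$ is what makes the later per-channel, per-position mixing well defined.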
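The dynamic fusion block computes spatially averaged descriptors $z^{l}_{enc}$ and $z^{l}_{tex}$ of the two feature maps. These are concatenated and passed through a three-layer MLP to predict per-channel fusion weights $w^{l}_{enc}, w^{l}_{tex} \in \mathbb{R}^{C_l}$, which then mix the encoder ($F^{l}_{enc}$) and texture ($F^{l}_{tex}$) features into $F^{l}_{fuse}$. The fused features are propagated through the decoder to reconstruct the final restoration $\hat{I}$.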
2. Mathematical Formulation
TGRN’s process and adaptive fusion are encapsulated by three principal equations:
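- Global average pooling for dynamic fusion: $z^{l}_{enc} = \frac{1}{H_l W_l} \sum_{i=1}^{H_l} \sum_{j=1}^{W_l} F^{l}_{enc}(:, i, j)$, and analogously $z^{l}_{tex}$ from the texture features $F^{l}_{tex}$.
- MLP-based channel weighting: $[\, w^{l}_{enc} ;\, w^{l}_{tex} \,] = \mathrm{MLP}\big([\, z^{l}_{enc} ;\, z^{l}_{tex} \,]\big)$, where $[\cdot ; \cdot]$ denotes concatenation.
- Feature fusion by adaptive channel mixing: $F^{l}_{fuse} = w^{l}_{enc} \odot F^{l}_{enc} + w^{l}_{tex} \odot F^{l}_{tex}$.
Here, $\odot$ denotes multiplication by a per-channel weight broadcast over all spatial positions, and all computation is performed independently at each encoder depth $l$.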
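The pooling, MLP weighting, and channel mixing steps can be traced end to end in a small NumPy forward pass. This is a sketch under stated assumptions: the MLP widths, the ReLU activations, the random initialization, the helper `init_mlp`, and the per-channel softmax that makes the two branch weights sum to 1 are hypothetical choices, not details confirmed by the paper.

```python
import numpy as np


def init_mlp(c: int, hidden: int, rng: np.random.Generator):
    """Hypothetical 3-layer MLP 2C -> hidden -> hidden -> 2C as (weight, bias) pairs."""
    sizes = [2 * c, hidden, hidden, 2 * c]
    return [(rng.standard_normal((o, i)) * 0.1, np.zeros(o))
            for i, o in zip(sizes[:-1], sizes[1:])]


def dynamic_fusion(f_enc: np.ndarray, f_tex: np.ndarray, params):
    """Fuse two (C, H, W) feature maps with MLP-predicted per-channel weights."""
    c = f_enc.shape[0]
    # Global average pooling over spatial positions, then concatenation: (2C,)
    z = np.concatenate([f_enc.mean(axis=(1, 2)), f_tex.mean(axis=(1, 2))])
    h = z
    for k, (w, b) in enumerate(params):
        h = w @ h + b
        if k < len(params) - 1:  # ReLU on hidden layers only
            h = np.maximum(h, 0.0)
    logits = h.reshape(2, c)  # row 0 -> encoder branch, row 1 -> texture branch
    # Assumed normalization: softmax across the two branches, per channel.
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    w_enc, w_tex = e / e.sum(axis=0, keepdims=True)
    # Broadcast each channel's weight over all spatial positions.
    fused = w_enc[:, None, None] * f_enc + w_tex[:, None, None] * f_tex
    return fused, w_enc, w_tex


rng = np.random.default_rng(0)
f_enc = rng.standard_normal((8, 16, 16))
f_tex = rng.standard_normal((8, 16, 16))
fused, w_enc, w_tex = dynamic_fusion(f_enc, f_tex, init_mlp(8, 32, rng))
print(fused.shape, np.allclose(w_enc + w_tex, 1.0))
```

With two branches, this per-channel softmax is equivalent to a sigmoid gate on the logit difference, i.e. a convex mix with $w_{tex} = 1 - w_{enc}$; an unnormalized pair of weights would also fit the equations above.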
3. Training Losses and Optimization
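TGRN is optimized using a composite loss that promotes photorealism, identity preservation, and robust fusion of semantic texture. The overall objective is
$$\mathcal{L}_{total} = \lambda_{1}\mathcal{L}_{1} + \lambda_{adv}\mathcal{L}_{adv} + \lambda_{id}\mathcal{L}_{id} + \lambda_{tri}\mathcal{L}_{tri},$$
where the $\lambda$ coefficients are scalar loss weights. The components are:
- $\ell_1$ loss (reconstruction): $\mathcal{L}_{1} = \lVert \hat{I} - I_{gt} \rVert_1$, with $\hat{I}$ the TGRN output and $I_{gt}$ the ground-truth face.
- Adversarial loss (softplus): $\mathcal{L}_{adv} = \mathbb{E}\big[\operatorname{softplus}(-D(\hat{I}))\big]$, where $D$ is a learned discriminator.
- Identity loss (ArcFace backbone): $\mathcal{L}_{id}$, computed on face-recognition embeddings extracted by a pretrained ArcFace network.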
- Cosine-triplet loss (deep metric learning):
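  - Let $I_{p}$ be the positive sample, a mask-wise blend of the aligned generative prior $\tilde{I}_{gen}$ and the identity-preserving output $I_{id}$ (with $M$ a mask of facial regions), and
  - let $f_{a}$, $f_{p}$, $f_{n}$ be the L2-normalized VGG features of the anchor $\hat{I}$, the positive $I_{p}$, and the negative $I_{id}$, respectively.
  - Triplet loss: $\mathcal{L}_{tri} = \max\big(0,\ (1 - f_{a}^{\top} f_{p}) - (1 - f_{a}^{\top} f_{n}) + m\big)$,
    with margin $m > 0$.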
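The cosine-triplet term can be checked on toy vectors. The sketch below uses flattened arrays as stand-ins for the L2-normalized VGG features of the anchor, positive, and negative; the hinge form with cosine distance $1 - \cos$, the default margin of 0.2, and the helper names are assumptions made for illustration.

```python
import numpy as np


def l2_normalize(v, eps: float = 1e-12) -> np.ndarray:
    """Flatten and scale to unit L2 norm (eps guards against zero vectors)."""
    v = np.asarray(v, dtype=np.float64).ravel()
    return v / max(float(np.linalg.norm(v)), eps)


def cosine_triplet_loss(f_a, f_p, f_n, margin: float = 0.2) -> float:
    """Hinge triplet loss with cosine distance d(x, y) = 1 - <x, y> on unit vectors."""
    a, p, n = l2_normalize(f_a), l2_normalize(f_p), l2_normalize(f_n)
    d_ap = 1.0 - float(a @ p)  # anchor-positive distance (should be small)
    d_an = 1.0 - float(a @ n)  # anchor-negative distance (should be large)
    return max(0.0, d_ap - d_an + margin)


anchor = np.array([1.0, 0.0])
print(cosine_triplet_loss(anchor, [2.0, 0.0], [0.0, 1.0]))  # already well separated
print(cosine_triplet_loss(anchor, [0.0, 1.0], [3.0, 0.0]))  # wrong ordering
```

Because all three features are normalized, only their directions matter: the loss is zero once the anchor is closer to the positive than to the negative by at least the margin.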
The combination of photometric, adversarial, identity, and novel triplet loss terms is crucial in balancing detail realism and identity fidelity during training (Reddem et al., 6 Oct 2025).
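A minimal sketch of how the terms combine into one scalar objective. Only the $\ell_1$ term and the generator-side softplus adversarial term are computed here; the identity and triplet values are passed in as precomputed placeholders, and the weights in the usage lines are hypothetical, not the paper's $\lambda$ values. The discriminator-side softplus loss is the standard counterpart and is included as an assumption about how $D$ is trained.

```python
import numpy as np


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.logaddexp(0.0, x)


def l1_loss(pred, target) -> float:
    return float(np.mean(np.abs(np.asarray(pred, float) - np.asarray(target, float))))


def generator_adv_loss(d_fake) -> float:
    """Non-saturating generator loss: mean of softplus(-D(I_hat))."""
    return float(np.mean(softplus(-np.asarray(d_fake, float))))


def discriminator_adv_loss(d_real, d_fake) -> float:
    """Standard counterpart for D: softplus(-D(real)) + softplus(D(fake))."""
    return float(np.mean(softplus(-np.asarray(d_real, float)))
                 + np.mean(softplus(np.asarray(d_fake, float))))


def total_loss(terms: dict, weights: dict) -> float:
    """Weighted sum of named loss terms; every term needs an explicit weight."""
    missing = set(terms) - set(weights)
    if missing:
        raise KeyError(f"no weight for: {sorted(missing)}")
    return sum(weights[k] * terms[k] for k in terms)


terms = {
    "l1": l1_loss([0.2, 0.4], [0.0, 0.4]),
    "adv": generator_adv_loss([0.0, 0.0]),
    "id": 0.3,   # placeholder for an ArcFace-based identity loss value
    "tri": 0.1,  # placeholder for a cosine-triplet loss value
}
weights = {"l1": 1.0, "adv": 0.1, "id": 0.5, "tri": 0.5}  # hypothetical lambdas
print(round(total_loss(terms, weights), 4))
```

Requiring an explicit weight per term makes a forgotten $\lambda$ fail loudly instead of silently dropping a loss.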
4. Pipeline Integration and Context in CodeFormer++
Within the CodeFormer++ pipeline, TGRN occupies a final fusion role after controllable feature extraction and deformable alignment:
- The low-quality input $I_{LQ}$ is processed by CodeFormer's controllable feature transform to obtain:
  - $I_{id}$ (identity-preserving, low-texture) with feature weight $w = 1$,
  - $I_{gen}$ (texture-rich, permits identity drift) with $w = 0$.
- The Deformable Alignment Module (DAM) computes a flow that warps $I_{gen}$ toward $I_{id}$, producing $\tilde{I}_{gen}$ and semantically aligning generative texture with the structure of $I_{id}$.
- TGRN receives $(I_{id}, \tilde{I}_{gen})$ and applies the U-Net backbone, texture attention, and dynamic fusion, yielding the final restoration $\hat{I}$.
- Triplet metric learning, with anchor $\hat{I}$, hard negative $I_{id}$, and a mask-wise blend as positive, encourages $\hat{I}$ to absorb texture selectively while maintaining identity congruence.
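The DAM step warps the texture-rich CodeFormer output ($w = 0$) with a predicted flow. The sketch below covers only the warping half, assuming a dense per-pixel flow in pixel units, backward warping with bilinear sampling, and clamping at the image border; how the DAM predicts the flow is not modeled, and the function name `warp_bilinear` is hypothetical.

```python
import numpy as np


def warp_bilinear(img: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp a (C, H, W) image with a (2, H, W) flow given as (dx, dy).

    Output pixel (y, x) samples img at (y + dy, x + dx) by bilinear interpolation;
    sample positions outside the image are clamped to the border.
    """
    _, h, w = img.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sx = np.clip(xs + flow[0], 0, w - 1)
    sy = np.clip(ys + flow[1], 0, h - 1)
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = sx - x0, sy - y0  # fractional offsets in [0, 1)
    top = img[:, y0, x0] * (1 - wx) + img[:, y0, x1] * wx
    bottom = img[:, y1, x0] * (1 - wx) + img[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


# Hypothetical usage: shift a one-row ramp left by half a pixel.
ramp = np.arange(4, dtype=float).reshape(1, 1, 4)
flow = np.stack([np.full((1, 4), 0.5), np.zeros((1, 4))])
print(warp_bilinear(ramp, flow))
```

Backward warping guarantees every output pixel receives a value, which is why it is the usual choice when aligning one image onto another's grid.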
This stepwise arrangement enables CodeFormer++ to outperform approaches focusing solely on either identity or detail, by explicitly modeling and injecting generative priors into an aligned, identity-anchored backbone (Reddem et al., 6 Oct 2025).
5. Empirical Impact and Ablation
Quantitative experiments on both synthetic and real-world face datasets confirm the effectiveness of TGRN as a texture-injection and fusion module. Performance metrics include PSNR, SSIM, NIQE, LPIPS, FID, and LMD (landmark distance); ↑ marks metrics where higher is better and ↓ those where lower is better.
Table 1. CelebA-Test (synthetic)
| Methods | PSNR ↑ | SSIM ↑ | NIQE ↓ | LPIPS ↓ | FID ↓ | LMD ↓ |
|---|---|---|---|---|---|---|
| CodeFormer | 22.18 | 0.610 | 4.520 | 0.299 | 60.62 | 5.38 |
| VQFR | 24.14 | 0.636 | 3.693 | 0.351 | 41.28 | 9.13 |
| CodeFormer++ | 24.96 | 0.697 | 4.052 | 0.341 | 38.13 | 5.41 |
Table 2. Real-world datasets (FID/NIQE)
| Dataset | GPEN FID | GFPGAN FID | CodeFormer FID | CodeFormer++ FID | CodeFormer++ NIQE |
|---|---|---|---|---|---|
| LFW-Test | 57.58 | 49.96 | 52.02 | 45.63 | 3.518 |
| WebPhoto | 81.77 | 87.35 | 78.87 | 72.91 | 3.822 |
| WIDER-Test | 46.99 | 39.73 | 39.06 | 35.21 | 3.482 |
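To put the real-world FID gains in proportion, the relative reduction against CodeFormer, the backbone CodeFormer++ builds on, can be computed directly from Table 2. The values below are copied from the table's CodeFormer and CodeFormer++ ("Ours") FID columns; nothing else is assumed.

```python
# FID values from Table 2 (lower is better): (CodeFormer, CodeFormer++).
FID = {
    "LFW-Test": (52.02, 45.63),
    "WebPhoto": (78.87, 72.91),
    "WIDER-Test": (39.06, 35.21),
}


def relative_reduction(baseline: float, new: float) -> float:
    """Percentage by which `new` is lower than `baseline`."""
    return 100.0 * (baseline - new) / baseline


reductions = {name: relative_reduction(b, n) for name, (b, n) in FID.items()}
for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% lower FID than CodeFormer")
```

This gives about 12.3% on LFW-Test, 7.6% on WebPhoto, and 9.9% on WIDER-Test.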
Qualitatively, TGRN injects realistic mesoscopic skin textures (e.g., pores, fine wrinkles) into the identity-preserving CF-ID output (CodeFormer with $w = 1$), while preserving global face structure and identity and avoiding the excessive identity drift observed in purely generative priors.
Ablation studies demonstrate:
- DAM alone improves identity consistency but introduces artifacts,
- TGRN trained with only $\mathcal{L}_{1}$, $\mathcal{L}_{adv}$, and $\mathcal{L}_{id}$ reduces artifacts but lacks rich detail,
- Only the full TGRN trained with the additional cosine-triplet loss achieves the best perceptual-quality/identity-fidelity trade-off.
6. Relationship to Other Texture-Guided Restoration Paradigms
While TGRN is specialized for BFR and operates in tandem with deformable registration and metric learning, its guiding principle of explicit, adaptive texture fusion relates to the broader class of texture map–guided restoration networks. For instance, Fu et al. (2020) propose weak texture information maps derived from Sobel edge differences to aid super-resolution networks. The maps are predicted by a dedicated auxiliary network and injected via a fusion module into the main RCAN super-resolver, improving the recovery of faint textures in single-image super-resolution (SISR).
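For contrast with TGRN's learned features, the kind of hand-crafted gradient map that Fu et al. build on can be sketched with Sobel filters. Their exact map definition is not given here, so `weak_texture_map` (the positive part of the difference between Sobel magnitudes of a high-resolution image and a degraded copy) is a hypothetical reading for illustration only; the Sobel filtering itself is standard.

```python
import numpy as np

# 3x3 Sobel kernels for horizontal (x) and vertical (y) derivatives.
KX = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])
KY = KX.T


def filter3x3(img, k: np.ndarray) -> np.ndarray:
    """Correlate a 2-D image with a 3x3 kernel, replicating edge pixels."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out


def sobel_magnitude(img) -> np.ndarray:
    return np.hypot(filter3x3(img, KX), filter3x3(img, KY))


def weak_texture_map(hr, degraded) -> np.ndarray:
    """Hypothetical: gradient energy present in `hr` but missing from `degraded`."""
    return np.clip(sobel_magnitude(hr) - sobel_magnitude(degraded), 0.0, None)


step = np.tile([0.0, 0.0, 1.0, 1.0], (4, 1))  # vertical step edge
print(sobel_magnitude(step)[0])
```

A blurred or re-upsampled copy of `hr` is a natural `degraded` input: regions whose fine gradients vanish after degradation light up in the map.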
A plausible implication is that while the weak-texture methods operate at the level of gradient maps and are relatively generic, TGRN’s fusion paradigm operates semantically (via deep, aligned features) and is tightly coupled with deep metric learning for identity-sensitive fusion—characteristics demanded by the BFR setting.
7. Significance and Prospects
TGRN provides a parameter-efficient solution to the dual requirements of detailed visual realism and faithful identity restoration. Through dynamic channelwise fusion and selective metric supervision, CodeFormer++ obtains the best PSNR, SSIM, and FID among the compared methods on CelebA-Test and the lowest FID on LFW-Test, WebPhoto, and WIDER-Test, while trailing some baselines on NIQE, LPIPS, and LMD; its perceptual quality is further supported by expert assessment.
An open direction suggested by the current design is the adaptation of TGRN-style dynamic fusion to other restoration settings, possibly replacing manually engineered texture maps (e.g., Sobel gradients) with deep, feature-aligned texture modules for tasks such as general image restoration or SISR, extending the scope demonstrated by Fu et al. (2020). Further research may explore more lightweight fusion architectures, self-supervised alignment, or hybridization of learned and hand-crafted texture priors.
References:
- "CodeFormer++: Blind Face Restoration Using Deformable Registration and Deep Metric Learning" (Reddem et al., 6 Oct 2025)
- "Weak Texture Information Map Guided Image Super-resolution with Deep Residual Networks" (Fu et al., 2020)