
TGRN: Texture-Guided Restoration for BFR

Updated 24 March 2026
  • The paper introduces TGRN, a neural architecture that decomposes, aligns, and fuses identity-preserving and texture-rich features using a U-Net backbone with texture attention.
  • TGRN employs dynamic fusion blocks with MLP-based channel weighting and multi-loss optimization to balance photorealism and identity fidelity under severe degradations.
  • The integration of TGRN in CodeFormer++ achieves state-of-the-art blind face restoration, as evidenced by improved metrics such as PSNR, SSIM, and FID on diverse datasets.

Texture-Guided Restoration Network (TGRN) is a neural architecture central to CodeFormer++ for blind face restoration (BFR). It decomposes, aligns, and fuses identity-preserving and texture-rich features: semantically aligned generative textures are transferred dynamically onto an identity-faithful restoration backbone through a lightweight U-Net–based encoder–decoder, augmented by a texture attention mechanism and dynamic fusion modules. The design addresses the visual-quality/identity-fidelity trade-off seen in generative BFR methods by adaptively merging identity-consistent but texture-poor outputs with texture-rich but identity-drifting priors, enabling high-fidelity face recovery under severe and unknown image degradations (Reddem et al., 6 Oct 2025).

1. Architectural Structure and Feature Fusion

TGRN operates on two main inputs: the identity-preserving restoration $I_F$ (the output of CodeFormer's controllable feature transform with full identity weight) and the aligned generative prior $I_{\text{warp}}$ (the generative output warped via a dedicated deformable alignment module). Its core structure is a 3-level U-Net encoder–decoder, with each encoder stage augmented by a Texture Attention Module (TAM) and a dynamic fusion block.

At each encoder level $i$, the U-Net processes $I_F$, producing spatial-channel features $Z^i_e \in \mathbb{R}^{m\times n \times d}$. In parallel, $I_{\text{warp}}$ passes through a CNN, residual blocks, and adaptive pooling to yield texture features $Z^i_t \in \mathbb{R}^{m\times n \times d}$.

The dynamic fusion block computes spatially averaged descriptors:

$$v^i_e = \frac{1}{mn}\sum_{s,t} Z^i_e(s,t), \qquad v^i_t = \frac{1}{mn}\sum_{s,t} Z^i_t(s,t)$$

These are concatenated and passed through a three-layer MLP to predict per-channel fusion weights $w^i_e, w^i_t \in \mathbb{R}^d$, which then mix the encoder ($Z^i_e$) and texture ($Z^i_t$) features:

$$Z^i_m = w^i_e \odot Z^i_e + w^i_t \odot Z^i_t$$

The fused features $\{Z^i_m\}_{i=1}^3$ are propagated through the decoder to reconstruct the final restoration $I_{\mathrm{out}}$.

2. Mathematical Formulation

TGRN’s process and adaptive fusion are encapsulated by three principal equations:

  1. Global average pooling for dynamic fusion:

$$v^i_e = \frac{1}{m\,n} \sum_{s,t} Z^i_e(s,t), \quad v^i_t = \frac{1}{m\,n} \sum_{s,t} Z^i_t(s,t)$$

  2. MLP-based channel weighting:

$$[\,w^i_e,\;w^i_t\,] = \mathrm{MLP}([v^i_e, v^i_t]), \quad w^i_e, w^i_t \in \mathbb{R}^d$$

  3. Feature fusion by adaptive channel mixing:

$$Z^i_m = w^i_e \odot Z^i_e + w^i_t \odot Z^i_t$$

Here, $\odot$ denotes elementwise multiplication, with the per-channel weights broadcast over the $m \times n$ spatial positions. All three steps are computed independently at each encoder depth $i$.
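
The three equations above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the hidden width, the ReLU activations, and the sigmoid that squashes the MLP output into (0, 1) are assumptions, since the text only specifies a three-layer MLP that outputs per-channel weights. The names `dynamic_fusion` and `mlp_params` are hypothetical.

```python
import numpy as np

def dynamic_fusion(z_e, z_t, mlp_params):
    """Fuse encoder features z_e and texture features z_t, both of shape (m, n, d)."""
    d = z_e.shape[-1]
    # Step 1: global average pooling over the spatial axes -> descriptors of shape (d,)
    v_e = z_e.mean(axis=(0, 1))
    v_t = z_t.mean(axis=(0, 1))
    # Step 2: three-layer MLP on the concatenated descriptor [v_e, v_t] of shape (2d,)
    (w1, b1), (w2, b2), (w3, b3) = mlp_params
    h = np.concatenate([v_e, v_t])
    h = np.maximum(h @ w1 + b1, 0.0)   # ReLU (assumed activation)
    h = np.maximum(h @ w2 + b2, 0.0)   # ReLU (assumed activation)
    logits = h @ w3 + b3               # shape (2d,)
    w = 1.0 / (1.0 + np.exp(-logits))  # sigmoid (assumed) keeps weights in (0, 1)
    w_e, w_t = w[:d], w[d:]
    # Step 3: per-channel weights broadcast over the m x n spatial grid
    z_m = w_e * z_e + w_t * z_t
    return z_m, w_e, w_t

# Toy example with random features and randomly initialised MLP weights
rng = np.random.default_rng(0)
m, n, d, hidden = 8, 8, 4, 16
mlp_params = [
    (0.1 * rng.normal(size=(2 * d, hidden)), np.zeros(hidden)),
    (0.1 * rng.normal(size=(hidden, hidden)), np.zeros(hidden)),
    (0.1 * rng.normal(size=(hidden, 2 * d)), np.zeros(2 * d)),
]
z_e = rng.normal(size=(m, n, d))
z_t = rng.normal(size=(m, n, d))
z_m, w_e, w_t = dynamic_fusion(z_e, z_t, mlp_params)
print(z_m.shape, w_e.shape, w_t.shape)
```

Because the weights depend only on the pooled descriptors, each channel receives a single pair of mixing coefficients per image and per encoder level, which keeps the fusion cheap compared with spatially varying attention.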

3. Training Losses and Optimization

TGRN is optimized using a composite loss that targets photorealism, identity preservation, and robust fusion of semantic texture. The overall objective is:

$$L_{\text{total}} = \lambda_{1} L_1 + \lambda_{\text{adv}} L_{\text{adv}} + \lambda_{\text{id}} L_{\text{id}} + L_{\text{triplet}}$$

where the weights are $(\lambda_1, \lambda_{\text{adv}}, \lambda_{\text{id}}) = (0.1, 0.1, 10)$. The components are:

  • $L_1$ loss (reconstruction):

$$L_1 = \left\| I_{\text{HQ}} - I_{\text{out}} \right\|_1$$

where $I_{\text{HQ}}$ is the high-quality ground-truth image.

  • Adversarial loss (softplus):

$$L_{\text{adv}} = -\mathbb{E}_{I_{\text{out}}} [\mathrm{softplus}(D(I_{\text{out}}))]$$

where $D$ is a learned discriminator.

  • Identity loss (ArcFace backbone):

$$L_{\text{id}} = \| \eta(I_{\text{HQ}}) - \eta(I_{\text{out}}) \|_1$$

where $\eta$ denotes the ArcFace feature extractor.

  • Cosine-triplet loss (deep metric learning):
    • Let $I_{AP}=I_F \odot M + I_{\text{warp}} \odot (1-M)$ (with $M$ a mask of facial regions), and
    • $f_p, f_a, f_n$ the L2-normalized VGG features for $I_{AP}$, $I_{\text{out}}$, and $I_F$, respectively.
    • Triplet loss:

    $$L_{\mathrm{triplet}} = -\lambda_{\mathrm{triplet}} \log\left(\frac{e^{\cos\theta^+}}{e^{\cos\theta^+}+e^{\cos\theta^-}}\right),\quad \cos\theta^+ = f_p^\top f_a,\;\; \cos\theta^- = f_n^\top f_a$$

    with $\lambda_{\mathrm{triplet}} = 1$.
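
The cosine-triplet term can be evaluated directly on feature vectors. The sketch below uses plain Python lists in place of VGG features, which is an assumption made to keep the example self-contained; the function names are hypothetical. It also uses the identity $-\log\big(e^{a}/(e^{a}+e^{b})\big) = \log(1 + e^{b-a})$, so the loss is a softplus of the cosine gap $\cos\theta^- - \cos\theta^+$.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_triplet_loss(f_p, f_a, f_n, lam=1.0):
    """Cosine-triplet loss for positive f_p, anchor f_a, and negative f_n."""
    f_p, f_a, f_n = (l2_normalize(v) for v in (f_p, f_a, f_n))
    cos_pos = sum(p * a for p, a in zip(f_p, f_a))  # similarity to the positive (I_AP)
    cos_neg = sum(n * a for n, a in zip(f_n, f_a))  # similarity to the negative (I_F)
    # -log(e^{c+} / (e^{c+} + e^{c-})) rewritten as log(1 + e^{c- - c+})
    return lam * math.log1p(math.exp(cos_neg - cos_pos))

# Toy 2-D features: the positive and negative point in orthogonal directions
f_p, f_n = [1.0, 0.0], [0.0, 1.0]
close = cosine_triplet_loss(f_p, [1.0, 0.1], f_n)  # anchor near the positive
far = cosine_triplet_loss(f_p, [0.1, 1.0], f_n)    # anchor near the negative
tie = cosine_triplet_loss(f_p, [1.0, 1.0], f_n)    # anchor equidistant from both
print(close, far, tie)
```

An anchor that sits closer to the positive than to the negative yields the smaller loss, and an equidistant anchor gives exactly $\log 2$, the value of the formula when $\cos\theta^+ = \cos\theta^-$.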

The combination of photometric, adversarial, identity, and novel triplet loss terms is crucial in balancing detail realism and identity fidelity during training (Reddem et al., 6 Oct 2025).
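
To make the weighting concrete, the following sketch assembles the total objective for a single sample. The flattened pixel and embedding lists, the scalar discriminator score `d_score`, and the precomputed triplet value are all toy stand-ins (assumptions); only the weights 0.1, 0.1, and 10 and the form of each term come from the text.

```python
import math

def l1_distance(a, b):
    """L1 norm of the difference between two flat vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def softplus(x):
    return math.log1p(math.exp(x))

def total_loss(i_hq, i_out, emb_hq, emb_out, d_score, l_triplet,
               lam_1=0.1, lam_adv=0.1, lam_id=10.0):
    """Weighted sum of reconstruction, adversarial, identity, and triplet terms."""
    l_1 = l1_distance(i_hq, i_out)        # reconstruction term
    l_adv = -softplus(d_score)            # adversarial term for one sample
    l_id = l1_distance(emb_hq, emb_out)   # identity term on face embeddings
    return lam_1 * l_1 + lam_adv * l_adv + lam_id * l_id + l_triplet

# Toy values, chosen only to show how the weights scale each term
loss = total_loss(
    i_hq=[0.2, 0.8, 0.5], i_out=[0.25, 0.7, 0.5],
    emb_hq=[0.6, 0.8], emb_out=[0.6, 0.78],
    d_score=0.0, l_triplet=0.6,
)
print(loss)
```

With these toy numbers the identity term contributes more than the reconstruction term even though its raw value is smaller, which reflects the weight of 10 on $L_{\text{id}}$.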

4. Pipeline Integration and Context in CodeFormer++

Within the CodeFormer++ pipeline, TGRN occupies a final fusion role after controllable feature extraction and deformable alignment:

  1. The input $I_{\text{LQ}}$ is processed by CodeFormer's controllable feature transform to obtain:

    • $I_F$ (identity-preserving, low-texture) with feature weight $w=1$,
    • $I_G$ (texture-rich, permits identity drift) with $w=0$.
  2. The Deformable Alignment Module (DAM) computes a flow $\phi = R_\theta(I_F, I_G)$ that warps $I_G$ to $I_{\text{warp}}$, semantically aligning generative texture with the structure of $I_F$.
  3. TGRN receives $(I_F, I_{\text{warp}})$ and applies the U-Net backbone, texture attention, and dynamic fusion, yielding $I_{\text{out}}$.
  4. During training, triplet metric learning with $I_{\text{out}}$ as anchor, $I_F$ as hard negative, and $I_{AP}$ (the mask-wise blend) as positive enforces that $I_{\text{out}}$ absorbs texture selectively while maintaining identity congruence.

This stepwise arrangement enables CodeFormer++ to outperform approaches focusing solely on either identity or detail, by explicitly modeling and injecting generative priors into an aligned, identity-anchored backbone (Reddem et al., 6 Oct 2025).
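
The data flow of the steps above can be summarised as a short orchestration sketch. The callables `codeformer`, `dam`, and `tgrn` are hypothetical placeholders for the trained modules, and `dam` is assumed to return the warped image directly rather than the flow $\phi$; the stubs below only record call order.

```python
def restore_face(i_lq, codeformer, dam, tgrn):
    """Inference path: two CodeFormer passes, alignment, then texture-guided fusion."""
    i_f = codeformer(i_lq, w=1.0)   # identity-preserving, texture-poor output I_F
    i_g = codeformer(i_lq, w=0.0)   # texture-rich output I_G, may drift in identity
    i_warp = dam(i_f, i_g)          # I_G warped onto the structure of I_F
    return tgrn(i_f, i_warp)        # fused restoration I_out

def blend_positive(i_f, i_warp, mask):
    """Training-only positive I_AP = I_F * M + I_warp * (1 - M) on flat pixel lists."""
    return [f * m + g * (1.0 - m) for f, g, m in zip(i_f, i_warp, mask)]

# Stub modules that log the order in which they are called
trace = []

def fake_codeformer(img, w):
    trace.append(f"codeformer(w={w})")
    return f"CF[{img},w={w}]"

def fake_dam(i_f, i_g):
    trace.append("dam")
    return f"warp({i_g})"

def fake_tgrn(i_f, i_warp):
    trace.append("tgrn")
    return f"TGRN({i_f},{i_warp})"

out = restore_face("LQ", fake_codeformer, fake_dam, fake_tgrn)
print(trace)
print(blend_positive([1.0, 1.0], [0.0, 0.0], [1.0, 0.0]))
```

In the blend, mask entries of 1 keep the pixel from $I_F$ and entries of 0 take it from $I_{\text{warp}}$, matching the definition of $I_{AP}$ in the loss section.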

5. Empirical Impact and Ablation

Quantitative experiments on both synthetic and real-world face datasets confirm the effectiveness of TGRN as a texture-injection and fusion module. Performance metrics include PSNR, SSIM, NIQE, LPIPS, FID, and LMD.

Table 1. CelebA-Test (synthetic)

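| Methods    | PSNR ↑ | SSIM ↑ | NIQE ↓ | LPIPS ↓ | FID ↓ | LMD ↓ |
|------------|--------|--------|--------|---------|-------|-------|
| CodeFormer | 22.18  | 0.610  | 4.520  | 0.299   | 60.62 | 5.38  |
| VQFR       | 24.14  | 0.636  | 3.693  | 0.351   | 41.28 | 9.13  |
| Ours       | 24.96  | 0.697  | 4.052  | 0.341   | 38.13 | 5.41  |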

Table 2. Real-world datasets (FID/NIQE)

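| Dataset    | GPEN FID ↓ | GFPGAN FID ↓ | CodeFormer FID ↓ | Ours FID ↓ | Ours NIQE ↓ |
|------------|------------|--------------|------------------|------------|-------------|
| LFW-Test   | 57.58      | 49.96        | 52.02            | 45.63      | 3.518       |
| WebPhoto   | 81.77      | 87.35        | 78.87            | 72.91      | 3.822       |
| WIDER-Test | 46.99      | 39.73        | 39.06            | 35.21      | 3.482       |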
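
As a quick reading aid for Table 2, the relative FID reduction over CodeFormer can be computed from the reported values. The snippet uses only numbers from the table; the helper name `relative_reduction` is hypothetical.

```python
# (CodeFormer FID, Ours FID) per real-world dataset, copied from Table 2
fid = {
    "LFW-Test": (52.02, 45.63),
    "WebPhoto": (78.87, 72.91),
    "WIDER-Test": (39.06, 35.21),
}

def relative_reduction(baseline, ours):
    """Percentage by which `ours` lowers the baseline score."""
    return 100.0 * (baseline - ours) / baseline

reductions = {name: relative_reduction(b, o) for name, (b, o) in fid.items()}
for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% lower FID than CodeFormer")
```

The reductions come out at roughly 12%, 8%, and 10% for LFW-Test, WebPhoto, and WIDER-Test respectively.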

Qualitatively, TGRN injects realistic mesoscopic skin textures (e.g., pores, fine wrinkles) into the identity-preserving CodeFormer backbone (CF-ID, i.e. $I_F$), while preserving global face structure and identity and avoiding the excessive drift observed in purely generative priors.

Ablation studies demonstrate:

  • DAM alone improves identity consistency but introduces artifacts,
  • TGRN with only $L_1$/$L_{\text{id}}$/$L_{\text{adv}}$ reduces artifacts but lacks rich detail,
  • Only the combined TGRN with the novel triplet loss achieves the optimal perceptual quality/identity fidelity trade-off.

6. Relationship to Other Texture-Guided Restoration Paradigms

While TGRN is specialized for BFR and operates in tandem with deformable registration and metric learning, its guiding principle of explicit, adaptive texture fusion relates to the broader class of texture map–guided restoration networks. For instance, Fu et al. (2020) propose weak texture information maps derived from Sobel edge differences to aid super-resolution networks. The maps are predicted by a dedicated auxiliary network and injected through a fusion module into the main RCAN super-resolver, enhancing recovery of faint textures in single-image super-resolution (SISR).
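
For readers unfamiliar with the hand-crafted side of this comparison, the sketch below computes a plain Sobel gradient magnitude with NumPy. It illustrates only the generic Sobel operator; how Fu et al. turn Sobel responses into their weak texture information map is not described here, so this should not be read as their construction. The helper names are hypothetical.

```python
import numpy as np

SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T

def filter3x3(img, kernel):
    """Valid-mode 3x3 cross-correlation of a 2-D image with a kernel."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
    return out

def sobel_magnitude(img):
    """Gradient magnitude from horizontal and vertical Sobel responses."""
    gx = filter3x3(img, SOBEL_X)
    gy = filter3x3(img, SOBEL_Y)
    return np.hypot(gx, gy)

# A 6x6 image with a vertical step edge between columns 2 and 3
img = np.zeros((6, 6))
img[:, 3:] = 1.0
mag = sobel_magnitude(img)
print(mag)
```

The response is nonzero only in the two output columns whose 3x3 windows straddle the step, which is the kind of fixed, local edge cue that contrasts with TGRN's learned, semantically aligned texture features.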

A plausible implication is that weak-texture methods operate at the level of gradient maps and are relatively generic, whereas TGRN's fusion operates semantically (via deep, aligned features) and is tightly coupled with deep metric learning for identity-sensitive fusion, characteristics demanded by the BFR setting.

7. Significance and Prospects

TGRN provides a parameter-efficient solution to the dual requirements of detailed visual realism and faithful identity restoration. Through dynamic channelwise fusion and selective metric supervision, it reports state-of-the-art BFR results: among the compared methods it attains the best PSNR, SSIM, and FID on CelebA-Test and the lowest FID on all three real-world datasets, alongside expert-assessed perceptual quality.

An open direction suggested by the current design is the adaptation of TGRN-style dynamic fusion to other restoration settings, possibly substituting manually engineered texture maps (e.g., Sobel gradients) with deep feature–aligned texture modules for tasks such as general image restoration or SISR, extending the scope demonstrated by Fu et al. (2020). Further research may explore more lightweight fusion architectures, self-supervised alignment, or hybridization of learned and hand-crafted texture priors.

References:

  • "CodeFormer++: Blind Face Restoration Using Deformable Registration and Deep Metric Learning" (Reddem et al., 6 Oct 2025)
  • "Weak Texture Information Map Guided Image Super-resolution with Deep Residual Networks" (Fu et al., 2020)
