Augmented Shadow Face in the Wild (ASFW)
- Augmented Shadow Face in the Wild (ASFW) is a real-world benchmark with 1,081 paired images offering accurate shadow and shadow-free comparisons for facial restoration.
- It is constructed through a bidirectional, Photoshop-based four-stage workflow that both synthesizes realistic shadows on clean faces and manually removes real shadows, bridging the gap between synthetic and real-world data.
- The accompanying FSE framework, a three-stage neural network, delivers state-of-the-art results with notable improvements in PSNR and SSIM metrics.
Augmented Shadow Face in the Wild (ASFW) constitutes the first large-scale real-world benchmark for facial shadow removal, comprising 1,081 paired images of real faces—each pair consisting of a shadowed and a pixel-aligned shadow-free version. Constructed via a professional and bidirectional manual process in Adobe Photoshop, ASFW exhibits diverse and photorealistic shadow types as well as accurate ground truth for shadow removal tasks. The dataset bridges the synthetic–real domain gap found in previous benchmarks and enables rigorous evaluation of facial shadow removal algorithms. Its utility is demonstrated through the introduction of the Face Shadow Eraser (FSE), a multi-stage neural framework that attains state-of-the-art results in both quantitative and qualitative assessment under challenging real-world conditions (Luo et al., 27 Jan 2026).
1. Dataset Composition and Construction
1.1 Paired Samples and Usage
ASFW comprises 1,081 real-world facial image pairs, each containing a shadowed photo and its meticulously aligned shadow-free counterpart. In comparative experiments, ASFW is used solely as a single, held-out test split—no explicit train/val/test partitions are defined within the dataset, emphasizing its role as an evaluation benchmark.
1.2 Photoshop-Based Four-Stage Bidirectional Workflow
ASFW is generated using a manual, bidirectional, four-stage pipeline in Adobe Photoshop:
- Shadow Synthesis: Artificial shadows are added to originally shadow-free images, with controlled brush flow (10–30%) and opacity (15–45%) settings to accurately mimic the softness and transitions of real facial shadows. Shadows are mapped in accordance with three-dimensional facial landmarks (e.g., nasal bridge, orbital areas, zygomatic arches). Shadow edges are rendered using dual diameters (hard: 5–15 px, soft: 25–50 px) with pressure-sensitive opacity, and diversity is further introduced by simulating occlusions from hair, hats, hands, and micro-shadows from skin details (wrinkles, pores).
- Shadow Removal: The shadow-free counterparts are created via lasso-based segmentation with adaptive feathering, followed by local brightness and color correction with feathered masks to eliminate halos. Edge artifacts are addressed using the Spot Healing Brush with content-aware sampling, and skin texture is restored using Content-Aware Fill, Clone Stamp, and Mixer Brush.
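The brush-flow, opacity, and feathering parameters above can be mimicked numerically. The following sketch composites a soft circular shadow with a feathered falloff — a loose numerical analogue of the manual Photoshop synthesis step, not the actual ASFW tooling; `feather` stands in for the soft brush diameter and `opacity` for the 15–45% brush opacity:

```python
import numpy as np

def feathered_shadow(face, center, radius, feather, opacity):
    """Composite a soft circular shadow onto a float image in [0, 1].

    Inside `radius` the shadow is at full strength; over the next
    `feather` pixels it fades linearly to zero, approximating a
    soft-edged brush stroke.
    """
    h, w = face.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.hypot(yy - center[0], xx - center[1])
    # Hard core inside `radius`, smooth falloff over `feather` pixels.
    mask = np.clip((radius + feather - dist) / feather, 0.0, 1.0)
    # Darken: multiply by (1 - opacity * mask), broadcast over channels.
    return face * (1.0 - opacity * mask[..., None])

face = np.full((64, 64, 3), 0.8)             # uniform "skin" patch
shadowed = feathered_shadow(face, center=(32, 32), radius=10,
                            feather=20, opacity=0.3)
```

At the shadow center the pixel value drops by the full opacity factor (0.8 → 0.56 here), while pixels beyond the feathered rim are untouched.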
1.3 Shadow Diversity and Image Attributes
The dataset encompasses a wide array of shadow phenomena, including hard versus soft edges, various occlusions (hair, hats, hands), micro-shadows from skin texture, and illumination from differing angles (frontal, side, top light); explicit statistics regarding the proportions of these shadow types are not reported. Image identities span a broad diversity of age, gender, skin tone, pose, and lighting, resulting in a challenging and realistic corpus for algorithm evaluation.
2. Face Shadow Eraser (FSE) Framework
2.1 Three-Stage Architecture
The FSE is a cascaded, lightweight, three-stage deep network for shadow removal:
- MaskGuideNet: Generates a soft shadow probability map.
- CoarseGenNet: Produces a coarse, shadow-free facial image.
- RefineFaceNet: Refines structural and photometric details, correcting fine textures and illumination.
The overall mapping from an input image $I$ (optionally with an initial mask $M_0$) to the reconstructed shadow-free image $\hat{I}$ is
$$\hat{I} = \big(f_{\text{RefineFace}} \circ f_{\text{CoarseGen}} \circ f_{\text{MaskGuide}}\big)(I \oplus M_0),$$
where “$\oplus$” denotes channel-wise concatenation (each stage's output is concatenated with the relevant inputs before being passed onward) and “$\circ$” denotes functional composition.
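As a plumbing-level illustration of the three-stage cascade and the concatenation/composition notation — not the authors' implementation — the data flow can be sketched with numpy stand-ins; `mask_guide`, `coarse_gen`, and `refine_face` below are hypothetical stubs for the learned networks:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-ins for the three learned stages; the real FSE uses CNN and
# transformer blocks, these stubs only exercise the cascade's plumbing.
def mask_guide(x4):                 # (H, W, 4) -> soft mask in [0, 1]
    return sigmoid(x4.mean(axis=-1, keepdims=True))

def coarse_gen(x4):                 # (H, W, 4) -> coarse RGB
    return np.clip(x4[..., :3] + 0.1 * x4[..., 3:4], 0.0, 1.0)

def refine_face(x4):                # (H, W, 4) -> refined RGB
    return np.clip(x4[..., :3], 0.0, 1.0)

def fse_forward(img, m0):
    """img: (H, W, 3) in [0, 1]; m0: (H, W, 1) optional initial mask."""
    cat = lambda *t: np.concatenate(t, axis=-1)   # channel-wise concat
    m = mask_guide(cat(img, m0))                  # stage 1: soft mask
    ic = coarse_gen(cat(img, m))                  # stage 2: coarse output
    return refine_face(cat(ic, m))                # stage 3: refinement

img = np.random.default_rng(0).random((8, 8, 3))
out = fse_forward(img, np.zeros((8, 8, 1)))
```

The key design point the sketch captures is that the predicted soft mask is re-concatenated with the image features at every subsequent stage.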
2.2 MaskGuideNet: Soft Shadow Map Generation
- Input: 4-channel tensor $I \oplus M_0$ (the RGB image concatenated with the initial mask).
- Architecture: Encoder–decoder network with Conv+ReLU residual blocks.
- Output: Soft shadow probability map $M \in [0,1]^{H \times W}$, obtained by applying a channel-wise sigmoid $\sigma(\cdot)$ to the decoder's single-channel logits.
2.3 CoarseGenNet: Coarse Shadow Removal
- Input: Concatenated $I \oplus M$ (the input image and the predicted soft shadow map).
- Architecture: Initial 3×3 Conv+ReLU for feature extraction, four AggBlock modules executing dynamic convolutions across dilation rates (e.g., dilation = {1, 2, 3}), and a final 3×3 Conv to produce the coarse output $I_c$.
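To make the role of the dilation rates concrete, here is a minimal single-channel dilated convolution in numpy — an illustrative sketch, not the AggBlock itself. The impulse response shows how dilation $d$ spreads a 3×3 kernel's taps to a $(2d+1) \times (2d+1)$ footprint without adding parameters:

```python
import numpy as np

def dilated_conv2d(x, k, dilation=1):
    """'Same'-padded single-channel 2D convolution with a dilated kernel.

    Dilation d inserts d-1 gaps between kernel taps, so a 3x3 kernel
    covers a (2d+1) x (2d+1) receptive field - the property that lets
    AggBlock aggregate context at rates {1, 2, 3}.
    """
    kh, kw = k.shape
    eff_h, eff_w = (kh - 1) * dilation + 1, (kw - 1) * dilation + 1
    ph, pw = eff_h // 2, eff_w // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):              # accumulate shifted, scaled copies
        for j in range(kw):
            di, dj = i * dilation, j * dilation
            out += k[i, j] * xp[di:di + x.shape[0], dj:dj + x.shape[1]]
    return out

x = np.zeros((9, 9)); x[4, 4] = 1.0          # unit impulse
k = np.ones((3, 3))
r = dilated_conv2d(x, k, dilation=3)         # taps land 3 px apart
```

With dilation 3, the nine kernel taps land on a 7×7 grid of points spaced 3 pixels apart, centered on the impulse.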
2.4 RefineFaceNet: Structural and Photometric Refinement
- Purpose: Refines residual artifacts, leveraging both global-local context (Swin-Transformer inspired) and mask-conditioned feature modulation.
- Core Components:
- Adaptive Hierarchical Shift-Window Attention (AHSWA): Alternates regular/shifted windows, scaled dot-product attention, and depthwise convolutions.
- Illumination Refinement Component (IRC): Applies a mask-conditioned convolutional scale ($\gamma$) and bias ($\beta$) to enhance domain adaptation.
The final combine step is
$$F_{\text{out}} = \gamma \odot F + \beta,$$
where “$\odot$” is element-wise multiplication and $\gamma$, $\beta$ are the IRC's mask-conditioned scale and bias.
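The mask-conditioned modulation can be illustrated with a FiLM-style numpy sketch; the scalar projections producing γ and β below are hypothetical placeholders for the IRC's learned convolutional layers:

```python
import numpy as np

def irc_modulate(feat, mask, w_gamma, b_gamma, w_beta, b_beta):
    """Mask-conditioned feature modulation in the spirit of the IRC.

    gamma and beta are per-pixel scale/bias derived from the soft
    shadow mask; feat is an (H, W, C) feature map and the combine
    step is F_out = gamma * F + beta (element-wise).
    """
    gamma = mask * w_gamma + b_gamma          # (H, W, 1) scale
    beta = mask * w_beta + b_beta             # (H, W, 1) bias
    return gamma * feat + beta                # broadcast over channels

rng = np.random.default_rng(1)
feat = rng.random((4, 4, 8))
mask = np.zeros((4, 4, 1)); mask[1:3, 1:3] = 1.0   # shadowed region
out = irc_modulate(feat, mask, w_gamma=0.5, b_gamma=1.0,
                   w_beta=0.2, b_beta=0.0)
```

Outside the mask the features pass through unchanged (γ = 1, β = 0); inside, they are rescaled and shifted, so the correction is spatially gated by the predicted shadow.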
3. Training Objectives and Loss Functions
FSE is trained to minimize a weighted sum of three losses, with no adversarial objective; the perceptual term is LPIPS rather than a hand-crafted VGG-feature loss:
$$\mathcal{L} = \lambda_1 \mathcal{L}_{\text{MSE}} + \lambda_2 \mathcal{L}_{\text{SSIM}} + \lambda_3 \mathcal{L}_{\text{LPIPS}},$$
with scalar weights $\lambda_1$, $\lambda_2$, $\lambda_3$. Definitions:
- $\mathcal{L}_{\text{MSE}}$: pixelwise mean squared error,
- $\mathcal{L}_{\text{SSIM}}$: $1 - \mathrm{SSIM}(\hat{I}, I_{\text{gt}})$, the structural similarity index loss,
- $\mathcal{L}_{\text{LPIPS}}$: learned perceptual image patch similarity.
The sum targets both perceptual and pixel-level fidelity, promoting high-fidelity texture preservation and accurate shadow removal.
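A simplified numpy sketch of the loss composition follows. It uses a global-statistics SSIM instead of the standard windowed variant and omits the LPIPS term (which requires a pretrained network); the λ weights shown are illustrative, not the paper's values:

```python
import numpy as np

def mse_loss(pred, gt):
    return float(np.mean((pred - gt) ** 2))

def ssim_loss(pred, gt, c1=0.01**2, c2=0.03**2):
    """1 - global SSIM for single-channel images in [0, 1].

    Global statistics (no sliding window) keep the sketch short;
    production code would use a windowed SSIM.
    """
    mu_p, mu_g = pred.mean(), gt.mean()
    var_p, var_g = pred.var(), gt.var()
    cov = ((pred - mu_p) * (gt - mu_g)).mean()
    ssim = ((2 * mu_p * mu_g + c1) * (2 * cov + c2)) / \
           ((mu_p**2 + mu_g**2 + c1) * (var_p + var_g + c2))
    return float(1.0 - ssim)

def total_loss(pred, gt, lam_mse=1.0, lam_ssim=1.0):
    # Illustrative weights; the LPIPS term is omitted in this sketch.
    return lam_mse * mse_loss(pred, gt) + lam_ssim * ssim_loss(pred, gt)

gt = np.linspace(0, 1, 64).reshape(8, 8)
```

A perfect reconstruction drives both terms to zero, while structural corruption (e.g., inverting the image) is penalized by both.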
4. Benchmarking: Experimental Results and Ablations
4.1 Quantitative Evaluation
Evaluations were performed on the held-out ASFW set (1,081 pairs) and the smaller UCB dataset (100 pairs), using PSNR (↑), SSIM (↑), MSE (↓), and LPIPS (↓). A summary of results on ASFW:
| Method | PSNR↑ | SSIM↑ | MSE↓ | LPIPS↓ |
|---|---|---|---|---|
| BMNet [Zhu et al. 2022] | 23.65 | 0.927 | 0.009 | 0.069 |
| FSE + ASFW | 25.45 | 0.930 | 0.006 | 0.066 |
This represents a +1.8 dB PSNR improvement over BMNet, the strongest prior system, demonstrating the effectiveness of FSE; the remaining headroom also indicates the challenge ASFW poses as a real-world benchmark.
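For reference, PSNR as reported in the table is the standard log-scaled inverse of MSE. A minimal implementation for images scaled to [0, 1]:

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images in [0, max_val]."""
    mse = np.mean((pred - gt) ** 2)
    if mse == 0:
        return float("inf")                 # identical images
    return float(10.0 * np.log10(max_val**2 / mse))

gt = np.full((16, 16), 0.5)
pred = gt + 0.01                            # uniform error -> MSE = 1e-4
```

A uniform per-pixel error of 0.01 yields an MSE of 1e-4 and hence a PSNR of 40 dB, which gives a feel for the scale of the 23–25 dB values in the table.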
4.2 Ablation Analysis
| Configuration | MaskGuide | CoarseGen | RefineFace | PSNR↑ | SSIM↑ | MSE↓ | LPIPS↓ |
|---|---|---|---|---|---|---|---|
| Full | ✓ | ✓ | ✓ | 25.45 | 0.930 | 0.006 | 0.066 |
| – MaskGuide | ✗ | ✓ | ✓ | 22.47 | 0.905 | 0.009 | 0.086 |
| – CoarseGen | ✓ | ✗ | ✓ | 21.48 | 0.864 | 0.010 | 0.126 |
| – RefineFace | ✓ | ✓ | ✗ | 22.93 | 0.886 | 0.008 | 0.112 |
The substantial performance drops from omitting any single module underscore the essential contributions of all three components. Removing CoarseGenNet causes the largest degradation (−3.97 dB PSNR), confirming the importance of the coarse-to-fine progression, while removing RefineFaceNet costs −2.52 dB, showing its necessity for high-fidelity output.
4.3 Qualitative Assessment
Visual inspection on ASFW and UCB demonstrates that FSE excels in removing strong, real-world facial shadows (e.g., across cheeks and under brows) while maintaining photorealistic detail, including pores, precise skin tone transitions, and individual hair strands. FSE outperforms alternatives (e.g., Lyu et al., FSRNet, CIRNet) consistently across diverse cases.
4.4 User Studies and Downstream Vision Tasks
No user studies or downstream vision task evaluations are reported. The primary focus is on photorealistic, high-fidelity shadow removal as measured by restoration quality.
5. Significance and Impact in Shadow Removal Research
By introducing the first large-scale, photorealistic, real-world paired facial shadow benchmark (ASFW) and a modular, lightweight, and effective neural architecture (FSE), this work sets new standards for both data resources and algorithmic solutions in high-fidelity shadow removal. ASFW enables challenging, attribute-rich evaluation previously infeasible with synthetic or small-scale datasets. The bridging of domain gaps and methodological rigor addresses longstanding deficits in both data quality and algorithm performance, facilitating progress toward production-ready facial restoration systems (Luo et al., 27 Jan 2026).