VeilGen: Biometric & Glare Synthesis Frameworks
- VeilGen names two distinct frameworks: a peri-ocular biometric recognition system based on deep-feature analysis, and a physics-informed model for veiling glare synthesis and removal.
- The first framework employs a VGG-19 based feature extractor with PCA reduction and diverse classifiers, achieving near-perfect identification, gender, and age recognition under full-face occlusion, alongside moderate expression recognition.
- The second framework integrates a Stable Diffusion backbone with novel modules to simulate and invert optical degradations, outperforming existing deblurring and dehazing techniques.
VeilGen denotes two distinct, technically unrelated frameworks in the computer vision literature: (1) a deep-feature-based identification system for recognizing veiled individuals on images restricted to the peri-ocular region (Hassanat et al., 2021), and (2) a physics-informed generative model for veiling glare synthesis and removal in lens-degraded imagery (Qian et al., 21 Nov 2025). Each system is foundational in its respective subfield, and both adopt the name "VeilGen" to emphasize the inference or simulation of salient information in the presence of partial occlusion or optical degradation.
1. Peri-Ocular Recognition for Veiled Persons
1.1 Dataset and Problem Definition
VeilGen (Hassanat et al., 2021) addresses biometric recognition in scenarios where only the peri-ocular region is visible due to full-face veiling (e.g., Niqab). The task suite comprises identity (150-way), gender (2-way), age (4-way: Children <18, Youth 19–30, Adults 31–50, Elderly ≥51), and “eye-smile” expression (2-way) recognition. Experiments utilize the VPI-New dataset:
| Subjects | Images per Subject | Total Images | Sessions | Gender Distribution | Age Range |
|---|---|---|---|---|---|
| 150 | 14 | 2100 | 2 × 7 | 41M/109F | 8–78 |
Acquisition was performed with a 13 MP smartphone camera under uncontrolled indoor (office) conditions at distances of 30–50 cm, with minor pose variation and black or white veils. Labeling encodes all annotations in the filename, including session, ID, gender, age, image index, and expression.
1.2 Deep Feature Extraction Pipeline
VeilGen applies the unmodified VGG-19 network (pretrained on ImageNet) as a fixed feature extractor. Each image undergoes the following steps (a code sketch follows the list):
- Conversion to RGB (if grayscale), and resizing to 224 × 224.
- Extraction of the 4096-dimensional activations from fully connected layers FC6 and FC7.
- Optional coordinate-wise merging of FC6 and FC7 (min, max, mean).
- Application of PCA for dimensionality reduction at retained-variance levels of 99%, 97%, and 95%, yielding feature dimensionalities of approximately $208$ (97% variance) and $137$ (95% variance).
- Vector normalization centered on the training fold mean.
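As a point of reference, the following sketch shows one way to implement this extraction pipeline with torchvision and scikit-learn; the weights enum, the FC6/FC7 indexing into `vgg.classifier`, and the merging/PCA usage follow standard library conventions rather than the authors' released code.

```python
# Hedged sketch of the VeilGen-style feature-extraction pipeline
# (illustrative implementation, not the authors' published code).
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.decomposition import PCA

# VGG-19 pretrained on ImageNet, frozen and used as a fixed extractor.
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def fc_features(img_path):
    """Return the 4096-d FC6 and FC7 activations for one image."""
    img = Image.open(img_path).convert("RGB")      # grayscale inputs become RGB
    x = preprocess(img).unsqueeze(0)
    with torch.no_grad():
        feats = vgg.avgpool(vgg.features(x)).flatten(1)
        fc6 = vgg.classifier[0](feats)             # first Linear layer (FC6)
        fc7 = vgg.classifier[3](torch.relu(fc6))   # second Linear layer (FC7)
    return fc6.squeeze(0), fc7.squeeze(0)

# Optional coordinate-wise merging (min/max/mean) of FC6 and FC7, then PCA:
# fc6s = torch.stack([fc_features(p)[0] for p in train_paths])  # train_paths: hypothetical list
# pca = PCA(n_components=0.95)                                   # retain 95% variance
# reduced = pca.fit_transform((fc6s - fc6s.mean(0)).numpy())
```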
1.3 Classification Methodology
Classification leverages WEKA with 10-fold cross-validation stratified by class and no dedicated test partition. The evaluated models include k-Nearest Neighbors (k = 1, 3, 5), Random Forest (100 trees), Naïve Bayes (Gaussian), BayesNet (heuristic Bayesian structure), and a single-layer feedforward Neural Network (cross-entropy softmax).
Classification is performed on the PCA-reduced feature vectors for each task. No regularization or augmentation is introduced post-feature extraction.
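The original study runs these classifiers in WEKA; a rough scikit-learn equivalent of the stratified 10-fold protocol (the feature-file names below are placeholders) might look like:

```python
# Hedged sketch of the cross-validation protocol with a 1-NN classifier.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X = np.load("fc6_pca95.npy")         # hypothetical PCA-reduced feature matrix
y = np.load("identity_labels.npy")   # hypothetical labels (150 identities)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=1), X, y, cv=cv)
print(f"1-NN identification accuracy: {scores.mean():.4f} ± {scores.std():.4f}")
```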
1.4 Performance Results
VeilGen establishes state-of-the-art results for fully veiled identity, gender, age, and expression recognition:
| Task | Best Configuration | Accuracy | Notable Results |
|---|---|---|---|
| Identification | ANN on FC6 PCA95% | 99.95% | Std. dev. ±0.05% |
| Gender | 3NN on FC6 PCA97% | 99.91% | Sensitivity 99.94%, specificity 99.82% |
| Age | 1NN on FC6 PCA95% | 100.00% | Perfect confusion matrix |
| Eye-smile | ANN on FC6 PCA97% | 80.0% | F1 ≈ 0.76, ROC area ≈ 0.77 |
VeilGen outperforms prior hand-crafted feature approaches on the VPI dataset for identity (99.95% vs. 97.22%) and gender (99.9% vs. 99.41%).
1.5 Limitations and Extensions
Current limitations of VeilGen include constrained acquisition conditions (indoor, controlled background, near-frontal), lack of network fine-tuning (feature extractor is not updated), and relatively modest expression recognition performance. The framework does not incorporate “in-the-wild” data, and future directions include large-scale veiled-face corpus collection, more robust architectures (e.g., ResNet or ArcFace), and multi-task learning to optimize all recognition axes jointly.
2. Generative Modeling of Veiling Glare in Compact Optics
2.1 Physical Model and Task Definition
VeilGen (Qian et al., 21 Nov 2025) targets image degradation in compact lenses—particularly veiling glare arising from stray-light scattering in non-ideal optics—which leads to spatially varying, depth-independent image degradation. The challenge is twofold: classical dehazing models are ill-suited due to non-depth-dependent scattering, and high-quality paired data for supervised restoration are typically unavailable.
The underlying image formation model is $I_{\text{deg}} = t \odot (k \ast I_{\text{clean}}) + g$ for each patch and color channel, where $k$ is the local PSF, $t$ the local transmission (contrast loss), $g$ the additive glare, and $I_{\text{clean}}$ the clean reference patch.
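A minimal sketch of this per-patch degradation, assuming the local PSF, transmission, and glare values are provided (all names below are illustrative, not from the paper):

```python
import torch
import torch.nn.functional as F

def degrade_patch(clean, psf, t, g):
    """Apply I_deg = t * (psf ⊛ clean) + g per color channel.

    clean: (C, H, W) clean patch; psf: (C, 1, k, k) per-channel blur kernel;
    t, g: transmission and glare maps broadcastable to (C, H, W).
    """
    c, k = clean.shape[0], psf.shape[-1]
    blurred = F.conv2d(clean.unsqueeze(0), psf, padding=k // 2, groups=c)
    return t * blurred.squeeze(0) + g
```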
2.2 VeilGen Architecture
VeilGen leverages a Stable Diffusion (SD-v2.1) backbone with several novel modules:
- IRControlNet: Guides the main U-Net-based denoiser with aberration conditioning.
- Latent Optical Transmission and Glare Map Predictor (LOTGMP): Infers latent per-pixel transmission ($T$) and glare maps ($G$) by processing both noisy latents and target-domain encodings, using a shallow, time-embedded convolutional network.
- Veiling Glare Imposition Module (VGIM): Physically imposes $T$ (multiplicative scaling) and $G$ (additive offset) at multiple skip levels in the U-Net, forming a differentiable approximation of physical scattering.
Sampling is performed via a DDPM-style process: at each diffusion step, the denoiser blends predictions conditioned on the clean (aberration-only) and compound (aberration + glare) maps, controlled by a fixed mixture coefficient.
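A hedged sketch of this blended step under the standard noise-prediction formulation; the `denoiser` interface, the `cond` keyword, and `alpha` are illustrative names, not the paper's API:

```python
import torch

@torch.no_grad()
def blended_noise_prediction(denoiser, z_t, t_step, maps_aberration,
                             maps_compound, alpha=0.5):
    """Blend predictions conditioned on aberration-only vs. compound maps."""
    eps_clean = denoiser(z_t, t_step, cond=maps_aberration)   # aberration only
    eps_glare = denoiser(z_t, t_step, cond=maps_compound)     # aberration + glare
    # alpha plays the role of the fixed mixture coefficient (value assumed here).
    return (1.0 - alpha) * eps_clean + alpha * eps_glare
```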
2.3 Unsupervised Physics-Informed Training
VeilGen distinguishes two training domains:
- Source: Paired data with aberration only, conditioned on neutral transmission/glare maps.
- Target: Unpaired data with compound degradation; LOTGMP infers the physical maps.
The total generation loss is $\mathcal{L}_{\text{gen}} = \mathcal{L}_{\text{src}} + \lambda\,\mathcal{L}_{\text{tgt}}$, where $\mathcal{L}_{\text{src}}$ and $\mathcal{L}_{\text{tgt}}$ are the source- and target-domain terms and $\lambda$ is a fixed weighting coefficient. Each component is a norm-based loss in the denoiser's latent feature space, with Stable Diffusion's intrinsic priors regularizing the result. This hybrid approach enables unsupervised learning of physically realistic degradation statistics from real-world compact-lens data.
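An illustrative composition of the two-domain objective, assuming a standard latent denoising loss for each branch; the conditioning interface and the weight value are assumptions rather than the paper's exact formulation:

```python
import torch.nn.functional as F

def generation_loss(denoiser, z_src, noise_src, z_tgt, noise_tgt, t_step,
                    neutral_maps, predicted_maps, lam=1.0):
    """L_gen = L_src + lam * L_tgt, each computed in the latent feature space."""
    # Source domain: paired, aberration only, neutral transmission/glare maps.
    l_src = F.mse_loss(denoiser(z_src, t_step, cond=neutral_maps), noise_src)
    # Target domain: unpaired, compound degradation, LOTGMP-predicted maps.
    l_tgt = F.mse_loss(denoiser(z_tgt, t_step, cond=predicted_maps), noise_tgt)
    return l_src + lam * l_tgt
```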
2.4 Restoration via DeVeiler Network
The companion DeVeiler restoration network includes:
- Encoder–Decoder U-Net structure, with a bottleneck of SwinIR RSTB layers for long-range spatial modeling.
- Veiling Glare Compensation Module (VGCM): Performs the inverse transform of VGIM during feature decoding, leveraging predicted maps to demodulate and restore features.
- Distilled Degradation Network (DDN): A shallow CNN approximating the forward VeilGen scattering model, included for the reversibility (physics) constraint loss.
The loss for restoration combines an image term, a perceptual LPIPS term, and a reversibility term penalizing mismatch between DDN-applied degradation on the restored image and the original degraded observation.
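A hedged sketch of how such a composite objective can be assembled; the term weights, the `lpips` package, and the `ddn` callable are assumptions, not the paper's exact specification:

```python
import torch
import torch.nn.functional as F
import lpips  # learned perceptual similarity (pip install lpips)

lpips_vgg = lpips.LPIPS(net="vgg")  # expects inputs scaled to [-1, 1]

def restoration_loss(restored, clean, degraded, ddn,
                     w_img=1.0, w_perc=0.1, w_rev=0.5):
    """Image fidelity + perceptual LPIPS + reversibility (physics) terms."""
    l_img = F.l1_loss(restored, clean)             # pixel-level image term
    l_perc = lpips_vgg(restored, clean).mean()     # perceptual term
    l_rev = F.l1_loss(ddn(restored), degraded)     # re-degrade and compare
    return w_img * l_img + w_perc * l_perc + w_rev * l_rev
```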
2.5 Experimental Evaluation
VeilGen and DeVeiler are evaluated on data from large-aperture Single Lens (SL) and Metasurface-Refractive Lens (MRL) systems:
| Method | PSNR (dB, Screen-SL) | SSIM | LPIPS |
|---|---|---|---|
| SwinIR (aberration) | 18.18 | 0.686 | 0.298 |
| SwinIR + DiffDehaze | 19.31 | 0.642 | 0.347 |
| QDMR (domain adaptation) | 18.45 | 0.681 | 0.291 |
| DeVeiler (VeilGen) | 22.38 | 0.729 | 0.261 |
No-reference metrics (Realworld-SL): CLIPIQA=0.607, Q-Align=3.987, NIQE=4.448 (all best among tested baselines). Qualitative analysis confirms contrast recovery, color saturation restoration, and fine texture preservation.
Ablation studies show the necessity of both LOTGMP and the SD prior, with a 0.74 dB PSNR drop when they are removed. Traditional CycleGAN and haze-based generation produce markedly worse synthetic pairs, with VeilGen-generated data yielding a 0.74 dB PSNR advantage in downstream restoration.
2.6 Network Implementation and Mathematical Details
Key architectural properties:
- LOTGMP: Two 3×3 convolutional layers (ReLU), time embedding MLP, two output heads (transmission, glare).
- VGIM/VGCM: At each skip level, features are modulated as $F' = T \odot F + G$ (VGIM) and demodulated as $F = (F' - G) \oslash T$ (VGCM); see the sketch after this list.
- DDN: Five-layer CNN for direct pixelwise forward degradation.
- Optimization: AdamW or Adam optimizers, progressive learning-rate decay, batch sizes 8–16; fixed weighting of the terms in the restoration loss.
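A minimal sketch of the skip-level modulation and its inverse, assuming the predicted maps are resized to each feature resolution; the shapes and epsilon guard are illustrative:

```python
import torch
import torch.nn.functional as F

def vgim(feat, t_map, g_map):
    """Impose veiling glare on a skip feature: F' = T ⊙ F + G."""
    t = F.interpolate(t_map, size=feat.shape[-2:], mode="bilinear", align_corners=False)
    g = F.interpolate(g_map, size=feat.shape[-2:], mode="bilinear", align_corners=False)
    return t * feat + g

def vgcm(feat_deg, t_map, g_map, eps=1e-6):
    """Invert VGIM during decoding: F = (F' - G) ⊘ T."""
    t = F.interpolate(t_map, size=feat_deg.shape[-2:], mode="bilinear", align_corners=False)
    g = F.interpolate(g_map, size=feat_deg.shape[-2:], mode="bilinear", align_corners=False)
    return (feat_deg - g) / (t + eps)
```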
3. Applications and Results
- Peri-Ocular Biometrics: VeilGen (Hassanat et al., 2021) advances peri-ocular recognition under extreme occlusion, demonstrating near-perfect performance in identity, gender, and age classification within the studied protocol.
- Optical Deblurring and Dehazing: VeilGen (Qian et al., 21 Nov 2025) provides the first generative framework capable of simulating and inverting realistic compound degradations (optical aberration plus veiling glare) in compact lens systems, supporting both dataset generation and interpretability.
4. Comparative Analysis and Advantages
Compared to prior work:
- Biometrics: VeilGen substantially exceeds hand-crafted feature systems for person identification and matches or surpasses gender and age baselines, while being uniquely evaluated on simultaneous expression recognition.
- Veiling Glare Synthesis: CycleGAN and conventional haze-based data generation yield inferior synthetic pairs and degraded restoration when compared via full-reference and no-reference image quality metrics.
- Restoration Quality: The bidirectional guidance of intermediate feature modulation by transmission/glare maps uniquely enables DeVeiler to outperform both blind and naive cascaded baselines.
5. Limitations and Future Directions
5.1 VeilGen Biometrics
- Current data are “in the laboratory”—collected indoors, under controlled settings and near-frontal poses. This suggests diminished generalization to unconstrained “in-the-wild” cases with harsher lighting, more significant occlusion, or resolution variation.
- The feature extractor is off-the-shelf and unfinetuned; adaptation or end-to-end training (e.g., via siamese or metric learning) could enhance robustness.
- Eye-smile recognition is well below the other tasks (≈80% accuracy), potentially remedied by domain-specific peri-ocular expression networks or augmentation.
5.2 VeilGen Glare Synthesis
- The current framework is validated primarily on large-aperture and metasurface-refractive lenses in controlled and real-world screens.
- Data synthesis via VeilGen is dependent on the distribution of target degraded images for LOTGMP learning; collection of broader, more diverse compound degraded samples may further improve generalization.
- Extension to additional physical degradations (beyond transmission and additive glare) and to end-to-end task-specific inference (e.g., jointly with downstream recognition) is proposed.
6. Broader Context and Significance
- VeilGen (Hassanat et al., 2021) establishes a reproducible, feature-centric pipeline for challenging veiled biometrics, providing a baseline for future “privacy-by-occlusion” and partial-face recognition research.
- VeilGen (Qian et al., 21 Nov 2025) introduces a physically grounded, diffusion-based paradigm for simulating and inverting optical degradations, bridging generative modeling with physical interpretability. The explicit modeling of latent transmission and glare maps, regularized by strong diffusion priors, establishes a blueprint for future vision systems targeting physically realistic compound image degradation and restoration.
Both frameworks are open-sourced with code and datasets to support further research in their respective domains.