Soft Shadow Diffusion (SSD)
- Soft Shadow Diffusion (SSD) is a hybrid framework that combines explicit radiometric light transport models with neural diffusion methods to simulate soft shadow phenomena.
- The approach formulates a separable nonlinear least squares problem and leverages gradient-based optimization to efficiently recover occlusion parameters from indirect cues.
- SSD is applied in shadow removal and portrait relighting, achieving performance improvements such as RMSE reductions of ~14% and PSNR gains of 5 dB over earlier methods.
Soft Shadow Diffusion (SSD) refers to a set of physics-inspired and learning-based methodologies that address the problem of generating, reconstructing, or removing soft shadows in images under challenging scene, lighting, or viewpoint constraints. Distinct approaches have emerged in computational periscopy, image compositing, portrait enhancement, and shadow removal, unified by the adoption of differentiable light transport models and diffusion architectures. SSD combines rigorous physical modeling of occlusion and radiometry with learned neural priors, enabling precise reconstruction or synthesis of soft, ambiguous shadow phenomena from minimal or highly indirect cues.
1. Physical Modeling of Soft Shadows and Forward Light Transport
Fundamental to SSD is an explicit radiometric light transport model for non-line-of-sight (NLOS) scenarios, where a visible relay surface encodes indirect information about a hidden scene. Wall irradiance at pixel $p$ on the relay surface is formulated as

$$ y(p) = \int_{\Omega} f(x)\, g(x, p)\, v(x, p; \theta)\, dx + b(p), $$

where $f(x)$ quantifies hidden-scene radiosity, $g(x, p)$ encodes Lambertian foreshortening based on surface normals, $v(x, p; \theta)$ is the hard (binary) or soft occlusion function parameterized by hidden-shape variables $\theta$, and $b(p)$ models ambient background (Raji et al., 18 Jan 2026). Discretization yields an observed measurement vector

$$ \mathbf{y} = A(\theta)\,\mathbf{f} + \mathbf{b}, $$

linked linearly to the radiosity $\mathbf{f}$ and nonlinearly to the occluder parameters $\theta$.
This formulation generalizes beyond binary shadow casting to accommodate penumbrae, volumetric occlusions, and partial visibility in complex geometries.
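The discretized forward model can be sketched as follows. This is a minimal illustration under assumed stand-in geometry: the transport factors, sigmoid-relaxed visibility, and all variable names are hypothetical, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_pix, n_vox = 64, 128                 # relay-surface pixels, hidden-scene voxels
f = rng.random(n_vox)                  # hidden-scene radiosity (unknown in practice)
b = 0.05 * np.ones(n_pix)              # ambient background term

def transport_matrix(theta, sharpness=10.0):
    """Build A(theta): Lambertian transport gated by soft per-voxel visibility."""
    G = rng.random((n_pix, n_vox))     # foreshortening/geometry factors (stand-in)
    V = 1.0 / (1.0 + np.exp(sharpness * (theta - 0.5)))  # sigmoid-relaxed occlusion
    return G * V                       # occupied voxels (theta near 1) block their paths

theta = rng.random(n_vox)              # occluder occupancy parameters
y = transport_matrix(theta) @ f + b    # observed wall irradiance vector
```

The soft visibility term is what lets penumbrae and partial occlusion enter the model continuously, which in turn makes gradient-based inversion possible.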
2. Separable Inverse Problem and Gradient-Based Optimization
The conversion of the forward model to a separable nonlinear least squares (SNLLS) inverse problem enables efficient computational periscopy. The problem decomposes as

$$ \min_{\theta}\; \min_{\mathbf{f}}\; \big\| \mathbf{y} - A(\theta)\,\mathbf{f} - \mathbf{b} \big\|_2^2 + \lambda \| \mathbf{f} \|_2^2 . $$

The separation permits a closed-form linear Tikhonov solution for $\mathbf{f}$ given $\theta$, with nonlinear optimization reserved for the occluder parameters $\theta$ (Raji et al., 18 Jan 2026). Sigmoid-relaxed proxies and alternating minimization algorithms further facilitate gradient-based recovery of voxel occupancies or point-cloud structures. This approach allows incorporation of continuous relaxations and visibility masks for computational tractability, though memory scaling can become prohibitive for extremely dense or high-resolution reconstructions.
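The inner linear step of the separable decomposition can be sketched as below. This is a hedged illustration: `A` stands in for $A(\theta)$ at the current occluder estimate, the data are synthetic, and `lam` is an assumed regularization weight.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 40, 20
A = rng.standard_normal((m, n))        # transport matrix for the current theta
f_true = rng.random(n)
y = A @ f_true                         # noiseless measurements, for illustration

def tikhonov_solve(A, y, lam=1e-6):
    """f(theta) = argmin_f ||y - A f||^2 + lam ||f||^2, solved in closed form."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

f_hat = tikhonov_solve(A, y)
# The outer loop would update theta by gradient descent on the residual
# ||y - A(theta) f(theta)||^2, re-solving this inner problem at each iterate.
```

Because the inner solve is exact and cheap, all iterative effort concentrates on the nonlinear occluder parameters, which is what makes the separable formulation efficient.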
3. Neural Diffusion Approaches and Conditional Generative Priors
SSD introduces conditional diffusion models as generative priors for 3D occluder or shadow synthesis. In the computational periscopy setting, SSD learns a denoising network $\epsilon_\phi(x_t, t, c)$ over point clouds, trained to remove Gaussian noise at diffusion step $t$, conditioned on an embedding $c$ extracted from the observed shadow image via a transformer encoder. The forward and reverse processes are modeled as

$$ q(x_t \mid x_{t-1}) = \mathcal{N}\big(\sqrt{1-\beta_t}\, x_{t-1},\; \beta_t I\big), \qquad p_\phi(x_{t-1} \mid x_t, c) = \mathcal{N}\big(\mu_\phi(x_t, t, c),\; \sigma_t^2 I\big), $$

where the objective minimizes the noise-prediction discrepancy $\mathbb{E}\,\| \epsilon - \epsilon_\phi(x_t, t, c) \|^2$ (Raji et al., 18 Jan 2026). Conditioning on physical latent representations ensures that generated shapes are consistent with the scene's measured shadow, integrating physics and learned geometry.
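The forward noising process that supervises the denoiser can be sketched as follows. The variance schedule here is a common DDPM default, assumed for illustration rather than taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1000
betas = np.linspace(1e-4, 0.02, T)          # variance schedule (assumed defaults)
alphas_bar = np.cumprod(1.0 - betas)        # cumulative signal retention

def q_sample(x0, t, eps):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x0, (1 - abar_t) I)."""
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

x0 = rng.standard_normal((256, 3))          # point cloud standing in for an occluder
eps = rng.standard_normal(x0.shape)
x_t = q_sample(x0, 500, eps)

# Training minimizes E || eps - eps_phi(x_t, t, c) ||^2, where c is the
# shadow-image embedding produced by the transformer encoder.
```

By the final step the signal term is nearly extinguished, so sampling can start from pure Gaussian noise and denoise under the shadow-image conditioning.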
In the context of 2D shadow generation for compositing, SSD leverages Rectified Flow: a one-step ODE-based transport from a Gaussian reference $x_0$ toward the data distribution in latent VAE space, trained on the straight-path objective

$$ x_t = (1-t)\,x_0 + t\,x_1, \qquad \min_{\phi}\; \mathbb{E}\,\big\| v_\phi(x_t, t) - (x_1 - x_0) \big\|^2 . $$
Single-step denoising and light-parameter embeddings allow real-time, controllable synthesis of shadow maps respecting object boundaries and lighting directionality (Tasar et al., 2024). The approach generalizes well from synthetic Blender-rendered data to photographs without explicit background or geometry modeling.
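The Rectified Flow target and its one-step sampler can be sketched as below. With perfectly straight paths the ideal velocity is simply $x_1 - x_0$, so a single Euler step carries the reference sample to the data side; shapes and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
x0 = rng.standard_normal((8, 4))       # Gaussian reference samples
x1 = rng.standard_normal((8, 4))       # latent "data" samples (stand-in VAE latents)
t = rng.random((8, 1))

x_t = (1.0 - t) * x0 + t * x1          # linear interpolation path
v_target = x1 - x0                     # regression target for v_phi(x_t, t, light)

x1_hat = x0 + 1.0 * v_target           # one Euler step with dt = 1
```

In practice a trained `v_phi` (conditioned on light parameters) replaces `v_target`, and the single Euler step is what yields the method's real-time inference.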
4. Applications in Soft Shadow Removal and Relighting
SSD methods have seen direct application in shadow removal, portrait relighting, and scene compositing. DeS3 leverages an adaptive attention-driven U-Net diffusion backbone and a ViT similarity loss to remove hard, soft, and self-shadows from noisy input images (Jin et al., 2022). The model dispenses with binary shadow masks, dynamically injecting attention maps and penalizing deviations in transformer-based self-similarity spectra to preserve global image structure during local corrections. Quantitative improvements are reported over prior paired-region and matting methods, with RMSE reductions of ~14% and PSNR gains of 5 dB (e.g., RMSE=3.01, PSNR=33.95 on the LRSS dataset).
For portrait enhancement, Soft-Shadow Diffusion models the softening of shadows and specular highlights in real or synthetic light-stage datasets. The procedure learns to virtually "scrim" a portrait by convolving lighting environments with normalized Phong kernels, parameterized by a perceptual diffuseness control variable derived from the Gini coefficient of the incident environment map (Futschik et al., 2023). Augmentation steps simulate external occlusions and subsurface scattering effects, enabling robust inference even in the presence of realistic occlusions and facial diversity. Downstream vision tasks such as geometry estimation and semantic segmentation benefit from the softening of shadow artifacts.
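The Gini-based diffuseness control can be illustrated with a small sketch: the coefficient is 0 for perfectly uniform (fully diffuse) lighting and approaches 1 for a single concentrated (hard) light. The Phong-kernel convolution itself is omitted; the environment maps are toy stand-ins.

```python
import numpy as np

def gini(values):
    """Gini coefficient of nonnegative intensities, via the sorted cumulative sum."""
    v = np.sort(np.asarray(values, dtype=float).ravel())  # ascending intensities
    n = v.size
    cum = np.cumsum(v)
    return (n + 1 - 2.0 * cum.sum() / cum[-1]) / n

env_hard = np.zeros(1000); env_hard[0] = 1.0   # single point light
env_soft = np.ones(1000)                       # fully diffuse environment

g_hard, g_soft = gini(env_hard), gini(env_soft)
```

A scalar of this kind gives the model a perceptually meaningful knob: interpolating the control variable between the two extremes corresponds to sweeping from hard to fully scrimmed lighting.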
5. Training, Benchmarking, and Performance Evaluation
SSD models are supervised on synthetic datasets rendered to cover large shape, lighting, and softness parameter spaces. For NLOS reconstruction, 260 K ShapeNet-derived pairs link point-cloud shapes to soft shadow images, processed through transformer and U-Net backbones (Raji et al., 18 Jan 2026). In controllable shadow generation, 257,612 triplets of object images, masks, and shadow maps over 9,872 meshes constitute the training corpus; benchmarks address softness, horizontal, and vertical light controls (Tasar et al., 2024).
Evaluation metrics include Chamfer distance, MSE, RMSE, soft IoU, and zero-normalized cross-correlation (ZNCC), with robustness demonstrated to noise (SNR as low as 10 dB), ambient light (SBR down to 0 dB), and cross-domain generalization. SSD methods surpass gradient-based baselines under strong background light, outperform prior diffusion methods for shadow removal, and enable real-time inference with single-step latent denoising.
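Two of the geometric metrics can be sketched directly; these are generic hedged implementations of symmetric Chamfer distance between point sets and soft IoU between occupancy grids, not the evaluation code of the cited works.

```python
import numpy as np

def chamfer(P, Q):
    """Symmetric Chamfer distance between point sets P (N, 3) and Q (M, 3)."""
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def soft_iou(a, b, eps=1e-8):
    """Soft IoU between occupancy grids with values in [0, 1]."""
    return np.minimum(a, b).sum() / (np.maximum(a, b).sum() + eps)
```

Chamfer distance suits the point-cloud reconstructions of the periscopy setting, while soft IoU accommodates the continuous occupancies produced by sigmoid relaxation.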
6. Limitations, Open Problems, and Future Directions
Several limitations are inherent: memory complexity in gradient-based baselines, pose localization for 3D reconstructions, CFOV estimation, exclusive handling of isolated objects in compositing, and restricted light models (e.g., area lights without complex BRDFs) (Raji et al., 18 Jan 2026; Tasar et al., 2024). SSD does not currently address occlusions by background objects in general scenes, multi-occluder configurations, or volumetric scattering.
Future work is identified in joint shape-pose learning within the diffusion loop, multi-object and scene-wide shadow generation via depth/normal coupling, few-step refinement for enhanced fidelity, and self-supervised extension to real-image domains. Application spaces include passive NLOS imaging in robotics and search-and-rescue, biomedical endoscopy, portrait retouching, semantic vision, and physically-consistent image compositing.
SSD unites physically-grounded inverse modeling and advanced neural generation, marking the first single-shot, passive 3D periscopic reconstruction from penumbra observation robust to real-world noise and ambient transients (Raji et al., 18 Jan 2026). The developments in controllability and real-time efficiency position SSD as a foundation for next-generation imaging and vision systems.