Nighttime Image Reflection Separation (NightIRS)
- The paper introduces NightIRS, a benchmark dataset, together with DMDNet, a network that disentangles transmission and reflection layers in challenging nighttime images.
- DMDNet leverages depth priors, spatial ordering, and memory-expert compensation to accurately separate image components even under low-light, noisy conditions.
- Experimental evaluations on NightIRS show state-of-the-art results, with improved PSNR, SSIM, and LPIPS relative to prior methods.
Nighttime Image Reflection Separation (NightIRS) addresses the technical challenge of disentangling transmission (T) and reflection (R) layers in images captured under nighttime conditions, where low-light environments and contrast similarity between layers exacerbate layer confusion. The NightIRS paradigm is instantiated by the introduction of the NightIRS dataset and the Depth-Memory Decoupling Network (DMDNet), which synergistically leverage depth priors, spatial ordering, and memory-expert compensation to surmount the difficulties inherent in nighttime reflection separation (Fang et al., 1 Jan 2026).
1. Nighttime Image Reflection Separation (NightIRS) Dataset
The NightIRS dataset is constructed to facilitate and benchmark the separation task under nighttime scenarios. Data acquisition employs the Sony LYTIA-T808 camera, leveraging its high-sensitivity CMOS sensor and HDR capability to capture low-light details faithfully. Reflection artifacts are physically induced using glass and acrylic sheets (700 mm × 500 mm; thicknesses of 1, 3, 5, and 8 mm), with systematic variation in camera-to-glass distance (0.5–3 m) and viewing angle (±30°) to ensure geometric heterogeneity. Scene diversity encompasses urban streets with artificial illumination, indoor environments, and naturally low-light scenes with scattered luminance points.
The dataset comprises 1,000 triplets $(I, T, R)$:
- $I$: blended nighttime image,
- $T$: ground-truth transmission (the true scene behind the glass),
- $R$: ground-truth reflection component.
NightIRS-HR is the high-resolution subset (same cardinality), recorded at the full camera resolution (approximately 4000 × 3000 pixels). All triplets are reserved exclusively for evaluation; no model fine-tuning is performed on NightIRS data. Annotation is achieved with a fixed tripod and remote shutter for precise alignment:
- Capture $T$ without the glass/acrylic sheet,
- Insert the sheet and capture the blended image $I$,
- Point the camera at a black target with the sheet in place to isolate $R$ by masking out the distant background.
Manual segmentation is unnecessary due to this controlled protocol.
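For illustration, a minimal PyTorch loader for such triplets might look as follows; the directory layout (`blended/`, `transmission/`, `reflection/`) and the filename-based pairing are assumptions for this sketch, not the documented release format.

```python
import os
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class NightIRSTriplets(Dataset):
    """Loads (I, T, R) triplets; the folder names are hypothetical."""
    def __init__(self, root):
        self.root = root
        # One file per triplet, paired by identical filenames (assumed).
        self.names = sorted(os.listdir(os.path.join(root, "blended")))
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.names)

    def _load(self, subdir, name):
        path = os.path.join(self.root, subdir, name)
        return self.to_tensor(Image.open(path).convert("RGB"))

    def __getitem__(self, idx):
        name = self.names[idx]
        return (self._load("blended", name),       # I: blended nighttime image
                self._load("transmission", name),  # T: scene behind the glass
                self._load("reflection", name))    # R: isolated reflection layer
```

Since NightIRS is evaluation-only, the loader applies no augmentation.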
2. Depth-Memory Decoupling Network (DMDNet) Architecture
DMDNet comprises three principal branches:
- Encoding Branch: Extracts multi-scale transmission and reflection features using a Mutually-Gated Interactive Block (MuGI) over five scales.
- Depth Semantic Modulation Branch (DSBranch): Derives hierarchical depth features at the corresponding scales via a monocular depth estimator (MiDaS v3.1 Next-ViT-L).
- Decoding Branch: Fuses appearance and depth cues with a stack of Depth-Memory Decoupling Blocks (DMBlocks) to yield the separated transmission and reflection outputs.
This architecture is tailored to address ambiguities in layer disentanglement caused by similar contrasts and nighttime noise.
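The three-branch organization can be summarized structurally as below; every submodule is a caller-supplied placeholder, since the internals of MuGI, DSBranch, and the DMBlocks are not reproduced in this sketch.

```python
import torch.nn as nn

class DMDNetSkeleton(nn.Module):
    """Structural sketch only: the injected modules stand in for the
    paper's MuGI encoder, depth branch, and DMBlocks."""
    def __init__(self, encoder, depth_branch, dm_blocks, head_t, head_r):
        super().__init__()
        self.encoder = encoder            # multi-scale appearance features (5 scales)
        self.depth_branch = depth_branch  # hierarchical depth features (frozen MiDaS upstream)
        self.dm_blocks = nn.ModuleList(dm_blocks)
        self.head_t, self.head_r = head_t, head_r

    def forward(self, image, depth):
        feats = self.encoder(image)             # list of per-scale feature maps
        depth_feats = self.depth_branch(depth)  # matching per-scale depth features
        x = feats[-1]                           # start from the coarsest scale
        for block, d in zip(self.dm_blocks, reversed(depth_feats)):
            x = block(x, d)                     # fuse appearance and depth cues
        return self.head_t(x), self.head_r(x)   # separated transmission / reflection
```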
3. Depth-Aware Scanning and State-Space Modulation
Depth-aware processing is central to DMDNet's efficiency and robustness. Two complementary scanning schemes leverage the proximity (inverse depth) map, as sketched after this list:
- DA-RScan (Transmission): Segments the proximity map into connected regions, ordered by area and scanned “near-to-far” (higher proximity first), then reversed to propagate contextual information.
- DA-GScan (Reflection): Globally sorts pixels by proximity (near-to-far), with an inverse pass to augment long-range context aggregation.
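A minimal sketch of the DA-GScan ordering under these definitions (global near-to-far sort plus the reversed pass); DA-RScan's connected-region segmentation is omitted for brevity.

```python
import torch

def da_gscan(features, proximity):
    """Reorder pixel features near-to-far by proximity (inverse depth),
    and also return the reversed (far-to-near) pass.
    features: (B, C, H, W); proximity: (B, 1, H, W)."""
    B, C, H, W = features.shape
    flat = features.flatten(2)                                 # (B, C, H*W)
    order = proximity.flatten(2).argsort(-1, descending=True)  # nearest pixels first
    near_to_far = flat.gather(-1, order.expand(-1, C, -1))     # forward scan sequence
    far_to_near = near_to_far.flip(-1)                         # inverse pass
    return near_to_far, far_to_near, order

def unscan(seq, order):
    """Scatter a scanned sequence back to the original pixel order."""
    inv = order.argsort(-1)                                    # inverse permutation
    return seq.gather(-1, inv.expand(-1, seq.size(1), -1))
```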
The Depth-Synergized State-Space Model (DS-SSM) extends the standard linear state-space update,

$$h_t = \bar{A}\, h_{t-1} + \bar{B}\, x_t, \qquad y_t = C\, h_t,$$

by depth-modulating $\bar{B}$ and $C$ with a gating term derived from the proximity map. Integration of clean, salient structures is thereby expedited while ambiguous features are suppressed, mediated via depth guidance.
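A sequential sketch of such a depth-gated recurrence is given below; the multiplicative sigmoid gate and the `gate_mlp` projection (e.g., `nn.Linear(1, 1)`) are illustrative assumptions, not the paper's exact modulation.

```python
import torch

def ds_ssm_scan(x, proximity, A, B, C, gate_mlp):
    """Depth-gated linear state-space recurrence (single sequence).
    x: (L, D) scanned features; proximity: (L,) per-pixel proximity.
    A: (N, N), B: (N, D), C: (D, N) state-space matrices."""
    h = x.new_zeros(A.size(0))
    ys = []
    for t in range(x.size(0)):
        g = torch.sigmoid(gate_mlp(proximity[t].view(1)))  # depth-derived gate in (0, 1)
        h = A @ h + g * (B @ x[t])                         # gated input projection
        ys.append(g * (C @ h))                             # gated output projection
    return torch.stack(ys)                                 # (L, D)
```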
Spatial Positional Encoding (SPE) employs two-dimensional sine-cosine functions to preserve spatial locality post-reordering:

$$\mathrm{PE}(p)_{2k} = \sin\!\left(\frac{p}{10000^{2k/(d/2)}}\right), \qquad \mathrm{PE}(p)_{2k+1} = \cos\!\left(\frac{p}{10000^{2k/(d/2)}}\right),$$

combining the row and column encodings $[\mathrm{PE}(i)\,\|\,\mathrm{PE}(j)]$ to form a $d$-dimensional embedding for each pixel $(i, j)$.
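A compact implementation of the standard 2D sine-cosine construction, assumed here to match the paper's SPE:

```python
import torch

def spe_2d(h, w, d):
    """2D sine-cosine positional encoding: concatenate (d/2)-dim row and
    column sinusoids into a d-dim embedding per pixel."""
    assert d % 4 == 0

    def pe_1d(pos, dim):
        k = torch.arange(dim // 2, dtype=torch.float32)
        freq = 1.0 / (10000 ** (2 * k / dim))              # geometric frequency ladder
        ang = pos[:, None] * freq[None, :]                 # (len, dim/2)
        return torch.cat([ang.sin(), ang.cos()], dim=-1)   # (len, dim)

    pe_rows = pe_1d(torch.arange(h, dtype=torch.float32), d // 2)
    pe_cols = pe_1d(torch.arange(w, dtype=torch.float32), d // 2)
    return torch.cat([pe_rows[:, None, :].expand(h, w, -1),
                      pe_cols[None, :, :].expand(h, w, -1)], dim=-1)  # (H, W, d)
```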
4. Memory Expert Compensation Mechanism
The Memory Expert Compensation Module (MECM) is constituted as a Mixture-of-Experts employing cross-image memory to facilitate layer-specific compensation:
- Expert Gate: Selects the top-$k$ experts from the expert pool via learned routing.
- Each Expert is composed of:
  - GPStream (Global Pattern): Pools features to a global descriptor, computes similarity against a global memory bank, and generates attention-masked global outputs; the memory is updated via residual addition dependent on batch-wise similarities.
  - SCStream (Spatial Context): Convolves features with memory kernels for similarity mapping, selects the top-ranked memory indices per pixel, and applies spatial compensation to produce the spatial-context output.
- Task-specific convolutions fuse the GPStream and SCStream outputs; expert outputs are gated and aggregated.

This design enables DMDNet to utilize historical knowledge for robust prediction, demonstrated as optimal when both the GPStream and SCStream branches are employed (a routing sketch follows).
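The routing-and-aggregation pattern can be sketched as follows; the expert internals (GPStream, SCStream, and the fusion convolutions) are abstracted into generic modules, and `k=2` is a placeholder rather than the paper's setting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKExpertGate(nn.Module):
    """Top-k mixture-of-experts routing sketch for MECM-style compensation."""
    def __init__(self, dim, experts, k=2):
        super().__init__()
        self.experts = nn.ModuleList(experts)   # each maps (1, C, H, W) -> (1, C, H, W)
        self.router = nn.Linear(dim, len(experts))
        self.k = k

    def forward(self, x):
        # Route on the globally pooled descriptor of x: (B, C, H, W).
        logits = self.router(x.mean(dim=(2, 3)))       # (B, num_experts)
        weights, idx = logits.topk(self.k, dim=-1)     # pick top-k experts per sample
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for b in range(x.size(0)):
            for j in range(self.k):
                expert = self.experts[int(idx[b, j])]
                out[b] += weights[b, j] * expert(x[b:b+1])[0]  # gated aggregation
        return out
```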
5. Training, Optimization, and Evaluation
Loss functions comprise:
- Load loss ($\mathcal{L}_{\mathrm{load}}$): Ensures balanced expert utilization.
- Memory matching loss ($\mathcal{L}_{\mathrm{mem}}$): Triplet- and alignment-based; promotes memory feature correspondence.
- Appearance loss ($\mathcal{L}_{\mathrm{app}}$): Summation of $\ell_1$ and VGG perceptual differences for both $T$ and $R$.
- Total loss: $\mathcal{L} = \mathcal{L}_{\mathrm{app}} + \lambda_{\mathrm{load}}\,\mathcal{L}_{\mathrm{load}} + \lambda_{\mathrm{mem}}\,\mathcal{L}_{\mathrm{mem}}$, with weighting coefficients $\lambda_{\mathrm{load}}$ and $\lambda_{\mathrm{mem}}$ (a combination sketch follows this list).
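A minimal sketch of this combination, assuming a frozen VGG-style feature extractor `vgg_feats` and placeholder $\lambda$ weights (the paper's values are not reproduced here):

```python
import torch.nn.functional as F

def appearance_loss(pred_t, pred_r, gt_t, gt_r, vgg_feats):
    """L_app: L1 plus VGG perceptual distance, summed over both layers.
    vgg_feats(img) is assumed to return a list of feature maps."""
    def layer_loss(pred, gt):
        perceptual = sum(F.l1_loss(fp, fg)
                         for fp, fg in zip(vgg_feats(pred), vgg_feats(gt)))
        return F.l1_loss(pred, gt) + perceptual
    return layer_loss(pred_t, gt_t) + layer_loss(pred_r, gt_r)

def total_loss(l_app, l_load, l_mem, lam_load=0.01, lam_mem=0.1):
    # Placeholder lambda weights, not the paper's settings.
    return l_app + lam_load * l_load + lam_mem * l_mem
```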
Optimization utilizes Adam over 60 epochs, with the learning rate stepped down at epochs 30 and 50. During training, random crops and horizontal flips are applied only to the daytime training data; depth priors are computed offline.
Evaluation employs PSNR (dB), SSIM, and LPIPS (AlexNet) metrics; a metric-computation sketch follows the table below. On daytime public datasets, DMDNet obtains top results across all three metrics. On NightIRS, DMDNet achieves the best PSNR for both layers, $25.24$ dB for $T$ and $28.37$ dB for $R$, underscoring its nighttime adaptivity. Ablation studies verify the contributions of the DSMamba components, the expert design, depth model quality, and loss weighting. Wilcoxon signed-rank tests confirm that the improvements are statistically significant.
| Dataset/Layer | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|
| NightIRS (T) | 25.24 | 0.832 | 0.144 |
| NightIRS (R) | 28.37 | 0.633 | 0.286 |
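For reference, per-image metric computation with `scikit-image` and the `lpips` package (AlexNet backbone, matching the protocol above):

```python
import torch
import lpips                                    # pip install lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_alex = lpips.LPIPS(net='alex')            # AlexNet-based LPIPS

def evaluate_pair(pred, gt):
    """pred, gt: float numpy arrays in [0, 1], shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)
    # LPIPS expects NCHW tensors scaled to [-1, 1].
    to_t = lambda a: torch.from_numpy(a).permute(2, 0, 1)[None].float() * 2 - 1
    dist = lpips_alex(to_t(pred), to_t(gt)).item()
    return psnr, ssim, dist
```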
6. Limitations, Failure Cases, and Future Directions
Residual reflections may persist in regions where both $T$ and $R$ are extremely dark, owing to under-modulation in DS-SSM. Depth estimation errors (for example, specular glare construed as a near surface) may disrupt scan priorities in DAScan. MECM's generalization to out-of-distribution nighttime scenes is bounded by the scope of the supervised training memory.
Proposed future avenues include:
- Unsupervised/self-supervised learning to lessen dependency on synthetic triplets and accommodate unpaired real nighttime data.
- Joint depth and separation network optimization to enhance end-to-end performance.
- Integration of complementary cues, such as polarization or multi-frame temporal information, in a model-agnostic manner.
- Development of quantized or lightweight DS-SSM for deployment on edge devices, especially for security camera applications.
NightIRS and DMDNet collectively advance the frontier in single-image reflection separation, with the former providing a high-fidelity benchmark for nighttime conditions and the latter achieving state-of-the-art performance by integrating depth-aware scanning, depth-modulated state-space models, and memory expert compensation (Fang et al., 1 Jan 2026).