Nighttime Image Reflection Separation (NightIRS)

Updated 8 January 2026
  • The paper introduces the NightIRS dataset and DMDNet, which together disentangle transmission and reflection layers in challenging nighttime images.
  • DMDNet leverages depth priors, spatial ordering, and memory expert compensation to accurately separate image components even under low-light, noisy conditions.
  • Experimental evaluations on NightIRS show state-of-the-art improvements with enhanced PSNR, SSIM, and LPIPS metrics compared to traditional methods.

Nighttime Image Reflection Separation (NightIRS) addresses the technical challenge of disentangling transmission (T) and reflection (R) layers in images captured under nighttime conditions, where low-light environments and contrast similarity between layers exacerbate layer confusion. The NightIRS paradigm is instantiated by the introduction of the NightIRS dataset and the Depth-Memory Decoupling Network (DMDNet), which synergistically leverage depth priors, spatial ordering, and memory-expert compensation to surmount the difficulties inherent in nighttime reflection separation (Fang et al., 1 Jan 2026).

1. Nighttime Image Reflection Separation (NightIRS) Dataset

The NightIRS dataset is constructed to facilitate and benchmark the separation task under nighttime scenarios. Data acquisition employs the Sony LYTIA-T808 camera, leveraging its high-sensitivity CMOS sensor and HDR capability to capture low-light details faithfully. Reflection artifacts are physically induced using glass and acrylic sheets (700 mm × 500 mm; thicknesses of 1, 3, 5, and 8 mm), with systematic variation in camera-to-glass distance (0.5–3 m) and viewing angle (±30°) to ensure geometric heterogeneity. Scene diversity encompasses urban streets with artificial illumination, indoor environments, and naturally low-light scenes with scattered luminance points.

The dataset comprises 1,000 triplets $\{I, T, R\}$:

  • $I \in \mathbb{R}^{H \times W \times 3}$: blended nighttime image,
  • $T$: ground-truth transmission (true scene behind glass),
  • $R$: ground-truth reflection component.

NightIRS-HR is the high-resolution subset (same cardinality), recorded at the full camera resolution (approximately 4000 × 3000 pixels). All triplets are reserved exclusively for evaluation; no model fine-tuning is performed on NightIRS data. Annotation is achieved with a fixed tripod and remote shutter for precise alignment:

  1. Capture $T$ without the glass/acrylic sheet,
  2. Insert the sheet and capture $I$,
  3. Point the camera at a black target with the sheet in place to isolate $R$ by masking out the distant background.

Manual segmentation is unnecessary due to this controlled protocol.
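
A minimal loading sketch for the evaluation-only triplets follows, assuming a hypothetical directory layout (blended/, transmission/, reflection/) and PNG file naming; the actual release format of NightIRS may differ:

```python
# Minimal sketch of an evaluation-only loader for NightIRS {I, T, R} triplets.
# The directory layout and file naming below are assumptions, not the
# released dataset specification.
import os
import numpy as np
from PIL import Image

def load_nightirs_triplet(root, scene_id):
    """Load one {I, T, R} triplet as float arrays in [0, 1]."""
    paths = {
        "I": os.path.join(root, "blended",      f"{scene_id}.png"),
        "T": os.path.join(root, "transmission", f"{scene_id}.png"),
        "R": os.path.join(root, "reflection",   f"{scene_id}.png"),
    }
    return {k: np.asarray(Image.open(p).convert("RGB"), dtype=np.float32) / 255.0
            for k, p in paths.items()}

# Example: iterate over the benchmark for evaluation only (no fine-tuning).
# for scene_id in scene_ids: triplet = load_nightirs_triplet(root, scene_id)
```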

2. Depth-Memory Decoupling Network (DMDNet) Architecture

DMDNet comprises three principal branches:

  • Encoding Branch: Extracts multi-scale features $E^i_T$ and $E^i_R$ using a Mutually-Gated Interactive Block (MuGI) over five scales.
  • Depth Semantic Modulation Branch (DSBranch): Derives hierarchical depth features $D^i_S$ via a monocular depth estimator (MiDaS v3.1 Next-ViT-L) at scales $i = 3, 4, 5$.
  • Decoding Branch: Fuses appearance and depth cues with a stack of Depth-Memory Decoupling Blocks (DMBlocks) to yield separated outputs $\hat{T}$ and $\hat{R}$.

This architecture is tailored to address ambiguities in layer disentanglement caused by similar contrasts and nighttime noise.
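
The skeleton below illustrates how the three branches could be wired together; the MuGI encoder, DSBranch, and DMBlock decoder internals are stand-ins, not the authors' implementation:

```python
# Schematic sketch of DMDNet's three-branch layout; the submodules are
# placeholders injected at construction time.
import torch.nn as nn

class DMDNetSketch(nn.Module):
    def __init__(self, encoder, depth_branch, decoder):
        super().__init__()
        self.encoder = encoder            # MuGI-based, 5 scales -> (E_T^i, E_R^i)
        self.depth_branch = depth_branch  # monocular depth + projections at scales 3-5
        self.decoder = decoder            # stack of DMBlocks fusing appearance + depth cues

    def forward(self, image):
        feats_T, feats_R = self.encoder(image)   # multi-scale appearance features
        depth_feats = self.depth_branch(image)   # hierarchical depth features D_S^i
        T_hat, R_hat = self.decoder(feats_T, feats_R, depth_feats)
        return T_hat, R_hat
```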

3. Depth-Aware Scanning and State-Space Modulation

Depth-aware processing is central to DMDNet's efficiency and robustness. Two complementary scanning schemes leverage the proximity (inverse depth) map $P$:

  • DA-RScan (Transmission): Segments $P$ into connected regions, ordered by area and scanned “near-to-far” (higher $P$ first), then reversed to propagate contextual information.
  • DA-GScan (Reflection): Globally sorts pixels by $P$ (near-to-far), with an inverse pass to augment long-range context aggregation; a minimal ordering sketch follows this list.
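
The following is a minimal sketch of the depth-aware global ordering (DA-GScan style), with a generic sequence model standing in for the state-space scan; the connected-region segmentation of DA-RScan is omitted:

```python
# Sketch of depth-aware global scanning: flatten features, sort pixels
# near-to-far by the proximity map P, run a forward (near-to-far) and a
# reversed (far-to-near) pass, then scatter results back to image layout.
import torch

def da_gscan(features, proximity, sequence_model):
    """features: (B, C, H, W); proximity: (B, H, W), larger = nearer."""
    B, C, H, W = features.shape
    x = features.flatten(2).transpose(1, 2)                        # (B, H*W, C)
    order = proximity.flatten(1).argsort(dim=1, descending=True)   # near-to-far indices
    idx = order.unsqueeze(-1).expand(-1, -1, C)
    x_sorted = torch.gather(x, 1, idx)

    y_fwd = sequence_model(x_sorted)                    # near-to-far pass
    y_bwd = sequence_model(x_sorted.flip(1)).flip(1)    # inverse (far-to-near) pass
    y = y_fwd + y_bwd

    out = torch.zeros_like(y).scatter_(1, idx, y)       # undo the permutation
    return out.transpose(1, 2).reshape(B, C, H, W)
```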

The Depth-Synergized State-Space Model (DS-SSM) extends the standard linear state-space update,

$$h_t = A h_{t-1} + B x_t, \qquad y_t = C h_t + D x_t$$

by depth-modulating $B$ and $C$:

$$B_{\text{aware}} = (1 - \gamma) B + \gamma B_{\text{depth}}, \qquad C_{\text{aware}} = (1 - \gamma) C + \gamma C_{\text{depth}}$$

with $\gamma \in [0,1]^{H \times W}$ derived from $P$. Depth guidance thus expedites the integration of clean, salient structures while suppressing ambiguous features.
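
A worked sketch of one depth-modulated recurrence step is shown below; the matrix shapes and the form of the depth-derived terms $B_{\text{depth}}$ and $C_{\text{depth}}$ are illustrative assumptions:

```python
# Sketch of the depth-modulated state-space update: B and C are blended
# with depth-derived counterparts using a per-pixel gate gamma in [0, 1].
import torch

def ds_ssm_step(h_prev, x_t, A, B, C, D, B_depth, C_depth, gamma_t):
    """One recurrence step for a single token x_t with scalar depth gate gamma_t."""
    B_aware = (1.0 - gamma_t) * B + gamma_t * B_depth
    C_aware = (1.0 - gamma_t) * C + gamma_t * C_depth
    h_t = A @ h_prev + B_aware @ x_t      # h_t = A h_{t-1} + B_aware x_t
    y_t = C_aware @ h_t + D @ x_t         # y_t = C_aware h_t + D x_t
    return h_t, y_t
```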

Spatial Positional Encoding (SPE) employs two-dimensional sine-cosine functions to preserve spatial locality post-reordering:

$$PE_x(p) = [\sin(p_x f_i), \cos(p_x f_i)]_{i=1,\dots,d/4}, \quad PE_y(p) = [\sin(p_y f_i), \cos(p_y f_i)]_{i=1,\dots,d/4}$$

combining to form a $d$-dimensional embedding for each pixel $p$.
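
A sketch of this 2D sine-cosine encoding follows, under the common assumption of a geometric frequency schedule with base 10000 (the exact frequencies $f_i$ are not specified here):

```python
# Sketch of 2D sine-cosine spatial positional encoding: half of the d
# channels encode the x coordinate and half the y coordinate, each split
# into d/4 sine and d/4 cosine terms at frequencies f_i.
import torch

def spatial_positional_encoding(H, W, d):
    assert d % 4 == 0
    freqs = 1.0 / (10000 ** (torch.arange(d // 4, dtype=torch.float32) / (d // 4)))
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    pe_x = torch.cat([torch.sin(xs[..., None] * freqs),
                      torch.cos(xs[..., None] * freqs)], dim=-1)  # (H, W, d/2)
    pe_y = torch.cat([torch.sin(ys[..., None] * freqs),
                      torch.cos(ys[..., None] * freqs)], dim=-1)  # (H, W, d/2)
    return torch.cat([pe_x, pe_y], dim=-1)                        # (H, W, d)
```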

4. Memory Expert Compensation Mechanism

The Memory Expert Compensation Module (MECM) is a Mixture-of-Experts that employs cross-image memory to facilitate layer-specific compensation:

  • Expert Gate: Selects the top $K$ experts from $N$ via learned routing.
  • Each Expert is composed of:
    • GPStream (Global Pattern): Pools $I$ to $I_G$, computes similarity with the memory bank $\text{Mem}$, and generates attention-masked global outputs $O_G$; updates $\text{Mem}$ via residual addition dependent on batch-wise similarities.
    • SCStream (Spatial Context): Convolves $I$ with memory kernels for similarity mapping, selects top-$K$ memory indices per pixel, and applies spatial compensation to produce $O_S$.
  • Task-specific convolutions fuse $O_G$ and $O_S$; expert outputs are gated and aggregated.

This design enables DMDNet to utilize historical knowledge for robust prediction; performance is best when both the GPStream and SCStream branches are employed with $N_{\text{Exp}} = 4$ and $K = 2$.
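
A simplified sketch of the top-$K$ expert routing is given below; the GPStream/SCStream internals and memory updates are abstracted into opaque expert callables, so only the gating and aggregation are shown:

```python
# Sketch of Mixture-of-Experts routing: pick the top-K of N experts per
# sample and blend their outputs with softmax-normalized gate weights.
import torch
import torch.nn.functional as F

def moe_compensation(feat, gate, experts, k=2):
    """feat: (B, C); gate: nn.Linear(C, N); experts: list of N callables (1, C) -> (1, C)."""
    logits = gate(feat)                           # (B, N) routing scores
    topk_w, topk_idx = logits.topk(k, dim=-1)     # select top-K experts per sample
    topk_w = F.softmax(topk_w, dim=-1)

    out = torch.zeros_like(feat)
    for b in range(feat.shape[0]):
        for slot in range(k):
            expert = experts[topk_idx[b, slot].item()]
            out[b] += topk_w[b, slot] * expert(feat[b:b + 1]).squeeze(0)
    return out
```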

5. Training, Optimization, and Evaluation

Loss functions comprise:

  • Load loss ($L_{\text{load}}$): Ensures balanced expert utilization.
  • Memory matching loss ($L_{\text{mem}}$): Triplet- and alignment-based; promotes memory feature correspondence.
  • Appearance loss ($L_{\text{app}}$): Sum of $L_1$ and VGG perceptual differences for $\hat{T}$ and $\hat{R}$.
  • Total loss: $L_{\text{total}} = L_{\text{load}} + L_{\text{mem}} + L_{\text{app}}$.

Optimization utilizes Adam (learning rate $10^{-4}$, batch size $1$, 60 epochs), with the learning rate stepped down at epochs 30 and 50. Random $352 \times 352$ crops and horizontal flips are applied only to daytime training data; depth priors are computed offline.
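
The sketch below assembles the objective and optimizer schedule described above; the exact forms of the load-balance and memory-matching terms are not reproduced, so they appear as placeholders:

```python
# Sketch of the training objective and optimizer schedule. `vgg_features`,
# `l_load`, and `l_mem` are placeholders, not the authors' exact definitions.
import torch
import torch.nn.functional as F

def appearance_loss(T_hat, R_hat, T, R, vgg_features):
    """L1 plus VGG perceptual differences for both predicted layers."""
    l1 = F.l1_loss(T_hat, T) + F.l1_loss(R_hat, R)
    perc = (F.l1_loss(vgg_features(T_hat), vgg_features(T))
            + F.l1_loss(vgg_features(R_hat), vgg_features(R)))
    return l1 + perc

def total_loss(l_load, l_mem, l_app):
    return l_load + l_mem + l_app  # L_total = L_load + L_mem + L_app

# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 50])
```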

Evaluation employs PSNR (dB), SSIM, and LPIPS (AlexNet) metrics. On daytime public datasets, DMDNet obtains top results (PSNR $= 26.27$, SSIM $= 0.889$, LPIPS $= 0.093$). On NightIRS, DMDNet achieves the best PSNR for both layers, $25.24$ dB for $T$ and $28.37$ dB for $R$, underscoring its nighttime adaptivity. Ablation studies verify the contributions of the DSMamba components, expert design, depth model quality, and loss weighting. Wilcoxon signed-rank tests confirm statistically significant improvements ($p < 0.01$).

Dataset / Layer    PSNR↑    SSIM↑    LPIPS↓
NightIRS (T)       25.24    0.832    0.144
NightIRS (R)       28.37    0.633    0.286
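
The following sketch shows how these per-layer metrics could be computed with common packages (scikit-image for PSNR/SSIM, the lpips package with an AlexNet backbone); this is not necessarily the authors' evaluation code:

```python
# Sketch of per-layer metric computation for one prediction/ground-truth pair.
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_alex = lpips.LPIPS(net="alex")  # LPIPS with an AlexNet backbone

def evaluate_layer(pred, gt):
    """pred, gt: float arrays in [0, 1], shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)
    to_t = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2 - 1
    lp = lpips_alex(to_t(pred), to_t(gt)).item()  # LPIPS expects inputs in [-1, 1]
    return psnr, ssim, lp
```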

6. Limitations, Failure Cases, and Future Directions

Residual reflections may persist in regions where both $T$ and $R$ are extremely dark, owing to under-modulation in the DS-SSM. Depth estimation errors (for example, specular glare construed as a near surface) may disrupt scan priorities in DAScan. MECM’s generalization to out-of-distribution nighttime scenes is bounded by the scope of the supervised training memory.

Proposed future avenues include:

  • Unsupervised/self-supervised learning to lessen dependency on synthetic triplets and accommodate unpaired real nighttime data.
  • Joint depth and separation network optimization to enhance end-to-end performance.
  • Integration of complementary cues, such as polarization or multi-frame temporal information, in a model-agnostic manner.
  • Development of quantized or lightweight DS-SSM for deployment on edge devices, especially for security camera applications.

NightIRS and DMDNet collectively advance the frontier in single-image reflection separation, with the former providing a high-fidelity benchmark for nighttime conditions and the latter achieving state-of-the-art performance by integrating depth-aware scanning, depth-modulated state-space models, and memory expert compensation (Fang et al., 1 Jan 2026).
