
DMDNet: Depth-Memory Decoupling for Reflection Separation

Updated 8 January 2026
  • The paper introduces DMDNet, a neural architecture that leverages depth cues and historical feature memories to decouple transmission and reflection layers with superior performance on challenging scenes.
  • DMDNet employs a multi-branch design—including encoding with MuGI, depth semantic modulation, and DMBlocks utilizing DAScan and DS-SSM—to ensure semantically coherent reconstructions.
  • The network integrates advanced loss functions and a Memory Expert Compensation Module, validated on NightIRS and daytime benchmarks, to minimize feature ambiguity and enhance image restoration.

The Depth-Memory Decoupling Network (DMDNet) is a neural architecture for single-image reflection separation, targeting the decomposition of an observed image $I \in \mathbb{R}^{3 \times H \times W}$ into transmission ($T$) and reflection ($R$) layers. DMDNet addresses challenges that arise in low-contrast scenes, particularly at night, where existing approaches often confuse $T$ and $R$ due to similar intensity distributions. DMDNet integrates depth cues and historical feature memories to promote semantically coherent reconstructions and mitigate ambiguity, offering state-of-the-art performance on both daytime and nighttime benchmarks (Fang et al., 1 Jan 2026).

1. Architecture and Components

DMDNet comprises three core branches:

  • Encoding Branch: Utilizes a two-stream interactive feature extractor (MuGI) to capture multi-scale features corresponding to $T$ and $R$.
  • Depth Semantic Modulation Branch: Processes a precomputed proximity (depth) map using lightweight convolutions, extracting depth semantic features $\{D_S^i\}$.
  • Decoding Branch: Employs stacked Depth-Memory Decoupling blocks (DMBlocks) for reconstruction; each DMBlock contains:
    • Depth-Synergized Decoupling Mamba (DSMamba)
    • Memory Expert Compensation Module (MECM)
    • Efficient Feed-Forward Network (EFFN)

The depth map guides structural coherence, while memory modules leverage historical information for feature compensation specific to transmission/reflection separation.
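
To make the block composition concrete, here is a minimal structural sketch in PyTorch. The sub-modules are placeholder convolutions standing in for DSMamba, MECM, and the EFFN, and the residual wiring and normalization placement are assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class DMBlock(nn.Module):
    """Structural sketch of a DMBlock: DSMamba -> MECM -> EFFN with
    residual connections. Sub-modules are stand-ins, not the paper's
    implementations."""
    def __init__(self, channels: int):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, channels)       # LayerNorm-like normalization
        self.dsmamba = nn.Conv2d(channels, channels, 3, padding=1)  # stand-in for DSMamba
        self.mecm = nn.Conv2d(channels, channels, 1)                # stand-in for MECM
        self.norm2 = nn.GroupNorm(1, channels)
        self.effn = nn.Sequential(                   # stand-in for the EFFN
            nn.Conv2d(channels, 2 * channels, 1),
            nn.GELU(),
            nn.Conv2d(2 * channels, channels, 1),
        )

    def forward(self, x: torch.Tensor, d_s: torch.Tensor) -> torch.Tensor:
        # d_s: depth semantic features from the modulation branch, injected
        # before decoupling (an assumed fusion point).
        x = x + self.mecm(self.dsmamba(self.norm1(x) + d_s))
        return x + self.effn(self.norm2(x))
```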

2. Depth-Aware Scanning (DAScan)

DAScan modulates the state-space scan order to prioritize structurally salient (semantically relevant) regions, reducing error propagation. The process uses two complementary scanning permutations derived from the proximity map $P \in \mathbb{R}^{H \times W}$:

  • Region-based Scan for Transmission (DA-RScan): Partitions the image into connected regions via thresholding, sorting regions by area (descending). Pixels within each region are ordered by proximity (near to far). The output permutation $\pi_T$ is sequenced by region and proximity:

$$\pi_T = \bigl[\,r_1^{(1)}, \dots, r_1^{(n_1)}, r_2^{(1)}, \dots, r_2^{(n_2)}, \dots\,\bigr]$$

where $r_k^{(j)}$ denotes the $j$-th pixel (in near-to-far order) of the $k$-th largest region.

  • Global Scan for Reflection (DA-GScan): All pixels are globally sorted by proximity in descending order:

$$\pi_R = \operatorname{argsort}\bigl(P(i, j)\bigr) \ \text{(descending)}$$

Applying DAScan gives transmission and reflection distinct processing orders, promoting stronger semantic continuity for $T$ and mitigating ambiguous global patterns in $R$.
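
A minimal sketch of the two scan orders, assuming a proximity map normalized to $[0, 1]$; the threshold value and the placement of below-threshold pixels are assumptions, since the text does not specify them:

```python
import numpy as np
from scipy import ndimage

def dascan_permutations(P: np.ndarray, thresh: float = 0.5):
    """Compute DA-RScan (pi_T) and DA-GScan (pi_R) orderings from a
    proximity map P of shape (H, W). A sketch of the scanning rules
    described above; exact thresholding and tie-breaking may differ."""
    flat = P.ravel()

    # DA-GScan: all pixels globally sorted by proximity, descending.
    pi_R = np.argsort(-flat, kind="stable")

    # DA-RScan: connected regions from thresholding, largest region first,
    # pixels within each region ordered near-to-far (descending proximity).
    labels, n = ndimage.label(P > thresh)
    regions = sorted(range(1, n + 1),
                     key=lambda r: int((labels == r).sum()), reverse=True)
    order = []
    for r in regions:
        idx = np.flatnonzero(labels.ravel() == r)
        order.append(idx[np.argsort(-flat[idx], kind="stable")])
    # Below-threshold pixels appended last (an assumption).
    rest = np.flatnonzero(labels.ravel() == 0)
    order.append(rest[np.argsort(-flat[rest], kind="stable")])
    pi_T = np.concatenate(order)
    return pi_T, pi_R
```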

3. Depth-Synergized State-Space Model (DS-SSM)

DS-SSM extends vanilla state-space models by incorporating pixelwise depth sensitivity into state transitions:

  • Vanilla SSM: For pixel feature $x_t$, the hidden and output states are updated as

$$h_t = A h_{t-1} + B x_t, \qquad y_t = C h_t + D x_t$$

  • Depth-Synergized Update: Introduces learnable depth-aware matrices $B_{\rm depth}$, $C_{\rm depth}$ and a gating map $\gamma_t$ based on pixel proximity:

$$\gamma_t = \sigma\bigl(\alpha\,(P(\pi(t)) - \tau)\bigr)$$

where $\sigma$ is the sigmoid function and $\alpha, \tau$ are learned parameters. The update matrices become:

$$B^{\rm aware}_t = (1 - \gamma_t)\,B + \gamma_t\,B_{\rm depth}$$

$$C^{\rm aware}_t = (1 - \gamma_t)\,C + \gamma_t\,C_{\rm depth}$$

yielding the revised state-space updates

$$h_t = A h_{t-1} + B^{\rm aware}_t x_t, \qquad y_t = C^{\rm aware}_t h_t + D x_t.$$

  • Spatial Positional Encoding (SPE): Augments each state with multi-frequency sinusoidal positional codes for spatial contextualization.

DS-SSM amplifies long-range context in structurally robust regions and suppresses feature ambiguity in ill-posed areas.
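
A single recurrence step, sketched directly from the equations above with illustrative shapes; the paper's hardware-efficient Mamba-style scan kernel is not reproduced here:

```python
import torch

def ds_ssm_step(h, x_t, A, B, C, D, B_depth, C_depth, p_t, alpha, tau):
    """One depth-synergized SSM step at scan position t. Shapes are
    illustrative: h is (d,), x_t is a scalar feature, A is (d, d), and
    B, C, B_depth, C_depth are (d,)."""
    gamma_t = torch.sigmoid(alpha * (p_t - tau))     # proximity-based gate in (0, 1)
    B_aware = (1 - gamma_t) * B + gamma_t * B_depth  # blended input matrix
    C_aware = (1 - gamma_t) * C + gamma_t * C_depth  # blended output matrix
    h_next = A @ h + B_aware * x_t                   # hidden-state transition
    y_t = C_aware @ h_next + D * x_t                 # output with skip term
    return h_next, y_t
```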

4. Memory Expert Compensation Module (MECM)

MECM infuses cross-image historical feature knowledge to guide compensation for transmission and reflection layers. Its structure comprises:

  • Expert Gate: Selects $K$ experts out of $N$, each specialized for layer-specific compensation.
  • Memory Experts: Each includes:
    • GPStream (Global-Pattern Interaction): Utilizes global average pooling, affinity scoring, and attentional fusion with a learnable memory bank. Memory evolution implements affinity-weighted writing to the bank.
    • SCStream (Spatial-Context Refinement): Reshapes memory into convolutional kernels; performs top-$k$ softmax aggregation over retrieved spatial affinities for local feature fusion.

Expert gate weights $W_T, W_R$ direct the compensation, providing refined semantic detail adapted for layer-specific restoration.
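
A sketch of the top-$K$ routing implied by the description above; the names, shapes, and softmax renormalization over the selected experts are assumptions:

```python
import torch
import torch.nn.functional as F

def expert_gate(desc: torch.Tensor, gate_proj: torch.Tensor, K: int):
    """Top-K expert routing for MECM. desc: (B, C) pooled feature
    descriptors; gate_proj: (C, N) learned projection onto N expert
    logits. Returns per-sample expert indices and normalized weights."""
    logits = desc @ gate_proj               # (B, N) expert affinities
    top_vals, top_idx = logits.topk(K, dim=-1)
    weights = F.softmax(top_vals, dim=-1)   # renormalize over the K chosen experts
    return top_idx, weights
```

Two such gates, producing $W_T$ and $W_R$ respectively, would route transmission and reflection features to their specialized experts.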

5. Loss Functions and Optimization Strategy

Training employs three integrated loss components:

  • Appearance Loss ($\mathcal{L}_{app}$): Combines pixelwise $L_1$ loss and a VGG-based perceptual loss over $T$ and $R$:

$$\mathcal{L}_{app} = \lambda^T_{L1}\,\|\hat T - T\|_1 + \lambda^R_{L1}\,\|\hat R - R\|_1 + \lambda^T_{\rm VGG}\,\|\phi(\hat T) - \phi(T)\|_1$$

  • Memory Matching Loss ($\mathcal{L}_{mem}$): Employs triplet and alignment terms to enforce feature proximity to top memory elements and separation from secondary matches:

$$\mathcal{L}_{mem} = \sum_{X \in \{T, R\}} \bigl(\lambda^X_{\rm triplet}\,\mathcal{L}^X_{\rm triplet} + \lambda^X_{\rm align}\,\mathcal{L}^X_{\rm align}\bigr)$$

  • Load Balancing Loss ($\mathcal{L}_{load}$): Regularizes the gate weight distribution to prevent collapse onto single experts.

Combined total loss:

$$\mathcal{L}_{total} = \mathcal{L}_{app} + \mathcal{L}_{mem} + \mathcal{L}_{load}$$
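
A sketch assembling the total objective from the formulas above, assuming $\mathcal{L}_{mem}$ and $\mathcal{L}_{load}$ are computed elsewhere; the lambda defaults are illustrative placeholders, not the paper's settings:

```python
import torch.nn.functional as F

def total_loss(T_hat, T, R_hat, R, phi, l_mem, l_load,
               lam_l1_T=1.0, lam_l1_R=1.0, lam_vgg_T=0.1):
    """Composite training objective. phi is a frozen VGG feature
    extractor; lambda values are placeholders."""
    l_app = (lam_l1_T * F.l1_loss(T_hat, T)
             + lam_l1_R * F.l1_loss(R_hat, R)
             + lam_vgg_T * F.l1_loss(phi(T_hat), phi(T)))
    return l_app + l_mem + l_load
```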

Optimization uses Adam with scheduled learning-rate decay; data augmentation includes random cropping and horizontal flipping. Training uses the PASCAL VOC, Nature, and Real datasets with a batch size of 1 on an RTX 4090 GPU.
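
A sketch of the paired augmentation described above, applying an identical random crop and horizontal flip to each $(I, T, R)$ triplet; the crop size is an assumed placeholder:

```python
import random
import torchvision.transforms as T
import torchvision.transforms.functional as TF

def augment_triplet(I_img, T_img, R_img, crop_size=256):
    """Apply the same random crop and horizontal flip to an (I, T, R)
    triplet so the layers stay spatially aligned."""
    i, j, h, w = T.RandomCrop.get_params(I_img, (crop_size, crop_size))
    out = [TF.crop(img, i, j, h, w) for img in (I_img, T_img, R_img)]
    if random.random() < 0.5:
        out = [TF.hflip(img) for img in out]
    return out
```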

6. Benchmarking Datasets and Evaluation

The dedicated NightIRS dataset enables robust nighttime evaluation:

  • NightIRS: Comprises 1,000 triplets $(I, T, R)$ captured with a Sony LYTIA-T808 across varied nighttime scenarios. Reflections were induced using acrylic/glass sheets; depth annotations come from MiDaS v3.1 Next-ViT-L.
  • Daytime Benchmarks: Nature (20), Real (20), Wild (55), Postcard (199), and Solid (200) test sets, where the numbers denote image counts.

Metrics used are PSNR ($\uparrow$), SSIM ($\uparrow$), and LPIPS ($\downarrow$). DMDNet achieves top performance for both layers:

| Scenario | Layer | PSNR ($\uparrow$) | SSIM ($\uparrow$) | LPIPS ($\downarrow$) |
|----------|-------|-------------------|-------------------|----------------------|
| Daytime  | Transmission | 26.27 | 0.889 | 0.093 |
| NightIRS | Transmission | 25.24 | 0.832 | 0.144 |
| Daytime  | Reflection   | 22.31 | 0.522 | 0.403 |
| NightIRS | Reflection   | 28.37 | 0.633 | 0.286 |

7. Ablation Analyses

Ablation studies quantify contributions of model components:

  • DSMamba (Table A): Depth-aware scan and DS-SSM jointly yield maximum PSNR/SSIM, whereas variants lacking either scan refinement or depth-synergy underperform.
  • MECM (Table B): Both GPStream and SCStream are crucial; removing either reduces metrics. Increasing the number of experts positively affects performance, subject to selection constraints.
  • Depth Model Quality (Table C): Higher-quality depth models (MiDaS v3.1 Next-ViT-L) correspond to improved public and NightIRS scores.

8. Significance and Implications

DMDNet's integration of depth-guided pixel ordering, depth-modulated state activation, and historical memory compensation addresses key challenges in reflection separation, notably under low-light and low-contrast conditions. The architecture and benchmarking paradigm enabled by NightIRS elevate evaluation standards for nighttime image reflection separation. This suggests future research will increasingly leverage domain-informed scanning and dynamic memory modules for ill-posed inverse imaging tasks (Fang et al., 1 Jan 2026).

References

  1. Fang et al. (1 Jan 2026). DMDNet: Depth-Memory Decoupling for Reflection Separation.
