
DMDNet: Depth-Memory Decoupling for Reflection Separation

Updated 8 January 2026
  • The paper introduces DMDNet, a neural architecture that leverages depth cues and historical feature memories to decouple transmission and reflection layers with superior performance on challenging scenes.
  • DMDNet employs a multi-branch design—including encoding with MuGI, depth semantic modulation, and DMBlocks utilizing DAScan and DS-SSM—to ensure semantically coherent reconstructions.
  • The network integrates advanced loss functions and a Memory Expert Compensation Module, validated on NightIRS and daytime benchmarks, to minimize feature ambiguity and enhance image restoration.

The Depth-Memory Decoupling Network (DMDNet) is a neural architecture for single-image reflection separation, targeting the decomposition of an observed image $I \in \mathbb{R}^{3 \times H \times W}$ into transmission ($T$) and reflection ($R$) layers. DMDNet addresses challenges that arise in low-contrast scenes, particularly at night, where existing approaches often confuse $T$ and $R$ due to similar intensity distributions. DMDNet integrates depth cues and historical feature memories to promote semantically coherent reconstructions and mitigate ambiguity, offering state-of-the-art performance on both daytime and nighttime benchmarks (Fang et al., 1 Jan 2026).

1. Architecture and Components

DMDNet comprises three core branches:

  • Encoding Branch: Utilizes a two-stream interactive feature extractor (MuGI) to capture multi-scale features corresponding to $T$ and $R$.
  • Depth Semantic Modulation Branch: Processes a precomputed proximity (depth) map using lightweight convolutions, extracting depth semantic features $\{D_S^i\}$.
  • Decoding Branch: Employs stacked Depth-Memory Decoupling blocks (DMBlocks) for reconstruction; each DMBlock contains:
    • Depth-Synergized Decoupling Mamba (DSMamba)
    • Memory Expert Compensation Module (MECM)
    • Efficient Feed-Forward Network (EFFN)

The depth map guides structural coherence, while memory modules leverage historical information for feature compensation specific to transmission/reflection separation.
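
To make the block composition concrete, here is a minimal structural sketch in PyTorch. The sub-modules are placeholder convolutions standing in for DSMamba, MECM, and the EFFN, and the residual wiring and normalization placement are assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class DMBlock(nn.Module):
    """Structural sketch of a DMBlock: DSMamba -> MECM -> EFFN with
    residual connections. Sub-modules are stand-ins, not the paper's
    implementations."""
    def __init__(self, channels: int):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, channels)       # LayerNorm-like normalization
        self.dsmamba = nn.Conv2d(channels, channels, 3, padding=1)  # stand-in for DSMamba
        self.mecm = nn.Conv2d(channels, channels, 1)                # stand-in for MECM
        self.norm2 = nn.GroupNorm(1, channels)
        self.effn = nn.Sequential(                   # stand-in for the EFFN
            nn.Conv2d(channels, 2 * channels, 1),
            nn.GELU(),
            nn.Conv2d(2 * channels, channels, 1),
        )

    def forward(self, x: torch.Tensor, d_s: torch.Tensor) -> torch.Tensor:
        # d_s: depth semantic features from the modulation branch, injected
        # before decoupling (an assumed fusion point).
        x = x + self.mecm(self.dsmamba(self.norm1(x) + d_s))
        return x + self.effn(self.norm2(x))
```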

2. Depth-Aware Scanning (DAScan)

DAScan modulates the state-space scan order to prioritize structurally salient (semantically relevant) regions, reducing error propagation. The process uses two complementary scanning permutations derived from the proximity map $P \in \mathbb{R}^{H \times W}$:

  • Region-based Scan for Transmission (DA-RScan): Partitions the image into connected regions via thresholding, sorting regions by area (descending). Pixels within each region are ordered by proximity (near to far). The output permutation $\pi_T$ is sequenced by region and proximity:

$$\pi_T = \bigl[\,r_1^{(1)}, \dots, r_1^{(n_1)}, r_2^{(1)}, \dots, r_2^{(n_2)}, \dots\,\bigr]$$

where $r_k^{(j)}$ denotes the $j$-th pixel (in near-to-far order) of the $k$-th largest region.

  • Global Scan for Reflection (DA-GScan): All pixels are globally sorted by proximity in descending order:

$$\pi_R = \operatorname{argsort}\bigl(P(i, j)\bigr) \ \text{(descending)}$$

Applying DAScan gives transmission and reflection distinct processing orders, promoting stronger semantic continuity for $T$ and mitigating ambiguous global patterns in $R$.
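
A minimal sketch of the two scan orders, assuming a proximity map normalized to $[0, 1]$; the threshold value and the placement of below-threshold pixels are assumptions, since the text does not specify them:

```python
import numpy as np
from scipy import ndimage

def dascan_permutations(P: np.ndarray, thresh: float = 0.5):
    """Compute DA-RScan (pi_T) and DA-GScan (pi_R) orderings from a
    proximity map P of shape (H, W). A sketch of the scanning rules
    described above; exact thresholding and tie-breaking may differ."""
    flat = P.ravel()

    # DA-GScan: all pixels globally sorted by proximity, descending.
    pi_R = np.argsort(-flat, kind="stable")

    # DA-RScan: connected regions from thresholding, largest region first,
    # pixels within each region ordered near-to-far (descending proximity).
    labels, n = ndimage.label(P > thresh)
    regions = sorted(range(1, n + 1),
                     key=lambda r: int((labels == r).sum()), reverse=True)
    order = []
    for r in regions:
        idx = np.flatnonzero(labels.ravel() == r)
        order.append(idx[np.argsort(-flat[idx], kind="stable")])
    # Below-threshold pixels appended last (an assumption).
    rest = np.flatnonzero(labels.ravel() == 0)
    order.append(rest[np.argsort(-flat[rest], kind="stable")])
    pi_T = np.concatenate(order)
    return pi_T, pi_R
```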

3. Depth-Synergized State-Space Model (DS-SSM)

DS-SSM extends vanilla state-space models by incorporating pixelwise depth sensitivity into state transitions:

  • Vanilla SSM: For pixel feature $x_t$, the hidden and output states are updated as

$$h_t = A h_{t-1} + B x_t, \qquad y_t = C h_t + D x_t$$

  • Depth-Synergized Update: Introduces learnable depth-aware matrices $B_{\rm depth}$, $C_{\rm depth}$ and a gating map $\gamma_t$ based on pixel proximity:

$$\gamma_t = \sigma\bigl(\alpha\,(P(\pi(t)) - \tau)\bigr)$$

where $\sigma$ is the sigmoid function and $\alpha, \tau$ are learned parameters. The update matrices become:

$$B^{\rm aware}_t = (1 - \gamma_t)\,B + \gamma_t\,B_{\rm depth}$$

$$C^{\rm aware}_t = (1 - \gamma_t)\,C + \gamma_t\,C_{\rm depth}$$

yielding the revised state-space updates

$$h_t = A h_{t-1} + B^{\rm aware}_t x_t, \qquad y_t = C^{\rm aware}_t h_t + D x_t.$$

  • Spatial Positional Encoding (SPE): Augments each state with multi-frequency sinusoidal positional codes for spatial contextualization.

DS-SSM amplifies long-range context in structurally robust regions and suppresses feature ambiguity in ill-posed areas.
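
A single recurrence step, sketched directly from the equations above with illustrative shapes; the paper's hardware-efficient Mamba-style scan kernel is not reproduced here:

```python
import torch

def ds_ssm_step(h, x_t, A, B, C, D, B_depth, C_depth, p_t, alpha, tau):
    """One depth-synergized SSM step at scan position t. Shapes are
    illustrative: h is (d,), x_t is a scalar feature, A is (d, d), and
    B, C, B_depth, C_depth are (d,)."""
    gamma_t = torch.sigmoid(alpha * (p_t - tau))     # proximity-based gate in (0, 1)
    B_aware = (1 - gamma_t) * B + gamma_t * B_depth  # blended input matrix
    C_aware = (1 - gamma_t) * C + gamma_t * C_depth  # blended output matrix
    h_next = A @ h + B_aware * x_t                   # hidden-state transition
    y_t = C_aware @ h_next + D * x_t                 # output with skip term
    return h_next, y_t
```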

4. Memory Expert Compensation Module (MECM)

MECM infuses cross-image historical feature knowledge to guide compensation for transmission and reflection layers. Its structure comprises:

  • Expert Gate: Selects $K$ experts out of $N$, each specialized for layer-specific compensation.
  • Memory Experts: Each includes:
    • GPStream (Global-Pattern Interaction): Utilizes global average pooling, affinity scoring, and attentional fusion with a learnable memory bank. Memory evolution implements affinity-weighted writing to the bank.
    • SCStream (Spatial-Context Refinement): Reshapes memory into convolutional kernels; performs top-$k$ softmax aggregation over retrieved spatial affinities for local feature fusion.

Expert gate weights $W_T, W_R$ direct the compensation, providing refined semantic detail adapted for layer-specific restoration.
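
A sketch of the top-$K$ routing implied by the description above; the names, shapes, and softmax renormalization over the selected experts are assumptions:

```python
import torch
import torch.nn.functional as F

def expert_gate(desc: torch.Tensor, gate_proj: torch.Tensor, K: int):
    """Top-K expert routing for MECM. desc: (B, C) pooled feature
    descriptors; gate_proj: (C, N) learned projection onto N expert
    logits. Returns per-sample expert indices and normalized weights."""
    logits = desc @ gate_proj               # (B, N) expert affinities
    top_vals, top_idx = logits.topk(K, dim=-1)
    weights = F.softmax(top_vals, dim=-1)   # renormalize over the K chosen experts
    return top_idx, weights
```

Two such gates, producing $W_T$ and $W_R$ respectively, would route transmission and reflection features to their specialized experts.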

5. Loss Functions and Optimization Strategy

Training employs three integrated loss components:

  • Appearance Loss ($\mathcal{L}_{app}$): Combines pixelwise $L_1$ loss and a VGG-based perceptual loss over $T$ and $R$:

$$\mathcal{L}_{app} = \lambda^T_{L1}\,\|\hat T - T\|_1 + \lambda^R_{L1}\,\|\hat R - R\|_1 + \lambda^T_{\rm VGG}\,\|\phi(\hat T) - \phi(T)\|_1$$

  • Memory Matching Loss ($\mathcal{L}_{mem}$): Employs triplet and alignment terms to enforce feature proximity to top memory elements and separation from secondary matches:

$$\mathcal{L}_{mem} = \sum_{X \in \{T, R\}} \bigl(\lambda^X_{\rm triplet}\,\mathcal{L}^X_{\rm triplet} + \lambda^X_{\rm align}\,\mathcal{L}^X_{\rm align}\bigr)$$

  • Load Balancing Loss ($\mathcal{L}_{load}$): Regularizes the gate weight distribution to prevent collapse onto single experts.

Combined total loss:

$$\mathcal{L}_{total} = \mathcal{L}_{app} + \mathcal{L}_{mem} + \mathcal{L}_{load}$$
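
A sketch assembling the total objective from the formulas above, assuming $\mathcal{L}_{mem}$ and $\mathcal{L}_{load}$ are computed elsewhere; the lambda defaults are illustrative placeholders, not the paper's settings:

```python
import torch.nn.functional as F

def total_loss(T_hat, T, R_hat, R, phi, l_mem, l_load,
               lam_l1_T=1.0, lam_l1_R=1.0, lam_vgg_T=0.1):
    """Composite training objective. phi is a frozen VGG feature
    extractor; lambda values are placeholders."""
    l_app = (lam_l1_T * F.l1_loss(T_hat, T)
             + lam_l1_R * F.l1_loss(R_hat, R)
             + lam_vgg_T * F.l1_loss(phi(T_hat), phi(T)))
    return l_app + l_mem + l_load
```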

Optimization uses Adam with scheduled learning-rate decay; data augmentation includes random cropping and horizontal flipping. Training uses the PASCAL VOC, Nature, and Real datasets with a batch size of 1 on an RTX 4090 GPU.
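
A sketch of the paired augmentation described above, applying an identical random crop and horizontal flip to each $(I, T, R)$ triplet; the crop size is an assumed placeholder:

```python
import random
import torchvision.transforms as T
import torchvision.transforms.functional as TF

def augment_triplet(I_img, T_img, R_img, crop_size=256):
    """Apply the same random crop and horizontal flip to an (I, T, R)
    triplet so the layers stay spatially aligned."""
    i, j, h, w = T.RandomCrop.get_params(I_img, (crop_size, crop_size))
    out = [TF.crop(img, i, j, h, w) for img in (I_img, T_img, R_img)]
    if random.random() < 0.5:
        out = [TF.hflip(img) for img in out]
    return out
```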

6. Benchmarking Datasets and Evaluation

The dedicated NightIRS dataset enables robust nighttime evaluation:

  • NightIRS: Comprises 1,000 triplets $(I, T, R)$ captured with a Sony LYTIA-T808 across varied nighttime scenarios. Reflections were induced using acrylic/glass sheets; depth annotations come from MiDaS v3.1 Next-ViT-L.
  • Daytime Benchmarks: Nature (20), Real (20), Wild (55), Postcard (199), and Solid (200) test sets, where the numbers denote image counts.

Metrics used are PSNR ($\uparrow$), SSIM ($\uparrow$), and LPIPS ($\downarrow$). DMDNet achieves top performance for both layers:

| Scenario | Layer | PSNR ($\uparrow$) | SSIM ($\uparrow$) | LPIPS ($\downarrow$) |
|----------|-------|-------------------|-------------------|----------------------|
| Daytime  | Transmission | 26.27 | 0.889 | 0.093 |
| NightIRS | Transmission | 25.24 | 0.832 | 0.144 |
| Daytime  | Reflection   | 22.31 | 0.522 | 0.403 |
| NightIRS | Reflection   | 28.37 | 0.633 | 0.286 |

7. Ablation Analyses

Ablation studies quantify contributions of model components:

  • DSMamba (Table A): Depth-aware scan and DS-SSM jointly yield maximum PSNR/SSIM, whereas variants lacking either scan refinement or depth-synergy underperform.
  • MECM (Table B): Both GPStream and SCStream are crucial; removing either reduces metrics. Increasing the number of experts positively affects performance, subject to selection constraints.
  • Depth Model Quality (Table C): Higher-quality depth models (MiDaS v3.1 Next-ViT-L) correspond to improved public and NightIRS scores.

8. Significance and Implications

DMDNet's integration of depth-guided pixel ordering, depth-modulated state activation, and historical memory compensation addresses key challenges in reflection separation, notably under low-light and low-contrast conditions. The architecture and benchmarking paradigm enabled by NightIRS elevate evaluation standards for nighttime image reflection separation. This suggests future research will increasingly leverage domain-informed scanning and dynamic memory modules for ill-posed inverse imaging tasks (Fang et al., 1 Jan 2026).

References

  1. Fang et al. (1 Jan 2026). DMDNet: Depth-Memory Decoupling for Reflection Separation.
