DMDNet: Depth-Memory Decoupling for Reflection Separation
- The paper introduces DMDNet, a neural architecture that leverages depth cues and historical feature memories to decouple transmission and reflection layers, with particular gains on challenging low-contrast scenes.
- DMDNet employs a multi-branch design, combining a MuGI encoding branch, a depth semantic modulation branch, and DMBlocks built on DAScan and DS-SSM, to produce semantically coherent reconstructions.
- The network integrates dedicated loss functions and a Memory Expert Compensation Module, validated on NightIRS and daytime benchmarks, to reduce feature ambiguity and improve restoration quality.
The Depth-Memory Decoupling Network (DMDNet) is a neural architecture for single-image reflection separation, targeting the decomposition of an observed image $I$ into a transmission layer $T$ and a reflection layer $R$, commonly modeled as $I = T + R$. DMDNet addresses challenges that arise in low-contrast scenes, particularly at night, where existing approaches often confuse $T$ and $R$ due to their similar intensity distributions. DMDNet integrates depth cues and historical feature memories to promote semantically coherent reconstructions and mitigate ambiguity, offering state-of-the-art performance on both daytime and nighttime benchmarks (Fang et al., 1 Jan 2026).
1. Architecture and Components
DMDNet comprises three core branches:
- Encoding Branch: Utilizes a two-stream interactive feature extractor (MuGI) to capture multi-scale features corresponding to $T$ and $R$.
- Depth Semantic Modulation Branch: Processes a precomputed proximity (depth) map $D$ using lightweight convolutions, extracting depth semantic features.
- Decoding Branch: Employs stacked Depth-Memory Decoupling blocks (DMBlocks) for reconstruction; each DMBlock contains:
- Depth-Synergized Decoupling Mamba (DSMamba)
- Memory Expert Compensation Module (MECM)
- Efficient Feed-Forward Network (EFFN)
The depth map guides structural coherence, while memory modules leverage historical information for feature compensation specific to transmission/reflection separation.
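The paper does not include reference code; the following PyTorch-style sketch illustrates one plausible composition of a DMBlock from its three sub-modules. The residual connections, layer norms, and module interfaces (`dsmamba`, `mecm`, `effn`) are illustrative assumptions, not the authors' implementation.

```python
import torch.nn as nn

class DMBlock(nn.Module):
    """Illustrative DMBlock: DSMamba -> MECM -> EFFN, each residual."""
    def __init__(self, dim, dsmamba, mecm, effn):
        super().__init__()
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(dim) for _ in range(3))
        self.dsmamba = dsmamba  # depth-synergized decoupling Mamba
        self.mecm = mecm        # memory expert compensation module
        self.effn = effn        # efficient feed-forward network

    def forward(self, x, depth):
        # Depth cues modulate the state-space scan and transitions.
        x = x + self.dsmamba(self.norm1(x), depth)
        # Historical memories compensate layer-specific features.
        x = x + self.mecm(self.norm2(x))
        x = x + self.effn(self.norm3(x))
        return x
```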
2. Depth-Aware Scanning (DAScan)
DAScan modulates the state-space scan order to prioritize structurally salient (semantically relevant) regions, reducing error propagation. The process uses two complementary scanning permutations derived from the proximity map $D$:
- Region-based Scan for Transmission (DA-RScan): Partitions the image into connected regions via thresholding, sorting regions by area (descending). Pixels within each region are ordered by proximity (near to far). The output permutation is sequenced by region and proximity.
- Global Scan for Reflection (DA-GScan): All pixels are globally sorted by proximity in descending order (near to far).
Application of DAScan distinguishes the processing order of transmission and reflection, facilitating stronger semantic continuity for $T$ and mitigating ambiguous global patterns in $R$.
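A minimal NumPy/SciPy sketch of the two permutations, assuming the proximity map $D$ is a 2-D array with larger values meaning nearer; the threshold value and tie-breaking rules are illustrative assumptions:

```python
import numpy as np
from scipy import ndimage

def da_rscan(depth, thresh=0.5):
    """DA-RScan (illustrative): region-based scan order for transmission.

    Partitions the proximity map into connected regions by thresholding,
    sorts regions by area (largest first), then orders pixels within each
    region from near to far. Returns flat pixel indices in scan order."""
    labels, n = ndimage.label(depth > thresh)
    flat_d, flat_l = depth.ravel(), labels.ravel()
    # Sort region ids by area, descending; background (label 0) goes last.
    ids = sorted(range(1, n + 1), key=lambda r: -(flat_l == r).sum())
    order = []
    for r in ids + [0]:
        idx = np.flatnonzero(flat_l == r)
        order.append(idx[np.argsort(-flat_d[idx])])  # near (high D) first
    return np.concatenate(order)

def da_gscan(depth):
    """DA-GScan (illustrative): global near-to-far scan for reflection."""
    return np.argsort(-depth.ravel(), kind="stable")
```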
3. Depth-Synergized State-Space Model (DS-SSM)
DS-SSM extends vanilla state-space models by incorporating pixelwise depth sensitivity into state transitions:
- Vanilla SSM: For pixel $i$ with input feature $x_i$, the hidden and output states are updated as $h_i = \bar{A}h_{i-1} + \bar{B}x_i$ and $y_i = Ch_i$.
- Depth-Synergized Update: Introduces learnable depth-aware matrices $A_d, B_d$ and a gating map $g_i = \sigma(\alpha D_i + \beta)$ based on pixel proximity $D_i$,
where $\sigma$ is the sigmoid and $\alpha, \beta$ are learned parameters. The update matrices become $\bar{A}_i = \bar{A} + g_i A_d$ and $\bar{B}_i = \bar{B} + g_i B_d$,
yielding the revised state-space updates $h_i = \bar{A}_i h_{i-1} + \bar{B}_i x_i$, $y_i = C h_i$.
- Spatial Positional Encoding (SPE): Augments each state with multi-frequency sinusoidal positional codes for spatial contextualization.
DS-SSM amplifies long-range context in structurally robust regions and suppresses feature ambiguity in ill-posed areas.
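Under the update equations reconstructed above (whose exact form is inferred from the paper's description), a sequential reference implementation of the depth-gated scan might look as follows; practical Mamba-style models use parallel scans and per-step discretization, which this sketch omits:

```python
import torch

def ds_ssm_scan(x, depth, A, B, C, A_d, B_d, alpha, beta):
    """Sequential depth-synergized scan over a DAScan-permuted sequence.

    x: (L, d_in) features and depth: (L,) proximities, both in scan order.
    A (d_state, d_state), B (d_state, d_in), C (d_out, d_state) are the
    vanilla SSM matrices; A_d, B_d are the depth-aware offsets.
    """
    h = torch.zeros(A.shape[0])
    ys = []
    for i in range(x.shape[0]):
        g = torch.sigmoid(alpha * depth[i] + beta)    # proximity gate in (0, 1)
        h = (A + g * A_d) @ h + (B + g * B_d) @ x[i]  # depth-modulated update
        ys.append(C @ h)                              # output projection
    return torch.stack(ys)
```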
4. Memory Expert Compensation Module (MECM)
MECM infuses cross-image historical feature knowledge to guide compensation for transmission and reflection layers. Its structure comprises:
- Expert Gate: Selects the top-$K$ experts out of $N$ candidates, each specialized for layer-specific compensation.
- Memory Experts: Each includes:
- GPStream (Global-Pattern Interaction): Utilizes global average pooling, affinity scoring, and attentional fusion with a learnable memory bank. Memory evolution implements affinity-weighted writing to the bank.
- SCStream (Spatial-Context Refinement): Reshapes memory into convolutional kernels; performs top-$k$ softmax aggregation over retrieved spatial affinities for local feature fusion.
Expert gate weights direct the compensation, providing refined semantic detail adapted for layer-specific restoration.
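A minimal sketch of the top-$K$ expert routing, assuming a dense gating layer and renormalized softmax weights over the selected experts; the routing details are illustrative, not the authors' exact design:

```python
import torch
import torch.nn.functional as F

def route_experts(feat, gate_proj, experts, k=2):
    """Illustrative top-K routing over N memory experts.

    feat:      (B, C) pooled features
    gate_proj: nn.Linear(C, N) producing one logit per expert
    experts:   list of N expert modules (each wrapping GPStream + SCStream)
    """
    logits = gate_proj(feat)                     # (B, N) gate scores
    top_v, top_i = logits.topk(k, dim=-1)        # keep K of N experts
    weights = F.softmax(top_v, dim=-1)           # renormalize over the K picks
    out = torch.zeros_like(feat)
    for b in range(feat.shape[0]):
        for j in range(k):
            e = experts[top_i[b, j].item()]      # selected expert
            out[b] += weights[b, j] * e(feat[b:b + 1])[0]
    return out
```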
5. Loss Functions and Optimization Strategy
Training employs three integrated loss components:
- Appearance Loss ($\mathcal{L}_{app}$): Combines a pixelwise reconstruction loss and a VGG-based perceptual loss over the predicted layers $\hat{T}$ and $\hat{R}$.
- Memory Matching Loss ($\mathcal{L}_{mem}$): Employs triplet and alignment terms that pull features toward their top-matching memory elements and push them away from secondary matches.
- Load Balancing Loss ($\mathcal{L}_{lb}$): Regularizes the gate weight distribution to prevent collapse onto a single expert.
Combined total loss: $\mathcal{L} = \mathcal{L}_{app} + \lambda_{mem}\mathcal{L}_{mem} + \lambda_{lb}\mathcal{L}_{lb}$, where $\lambda_{mem}$ and $\lambda_{lb}$ weight the auxiliary terms.
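A hedged sketch of how the three terms might be combined in PyTorch; the pixelwise $L_1$ choice, the triplet formulation for $\mathcal{L}_{mem}$, the squared-usage penalty for $\mathcal{L}_{lb}$, and all weights are assumptions:

```python
import torch.nn.functional as F

def total_loss(T_hat, T, R_hat, R, vgg_feats, gate_weights,
               anchor, pos, neg, lam_p=0.1, lam_mem=0.1, lam_lb=0.01):
    """Illustrative combination of the three losses; weights are placeholders.

    vgg_feats:      a frozen VGG feature extractor (callable)
    gate_weights:   (B, N) dense gate weights, zero for unselected experts
    anchor/pos/neg: features, top memory match, and second-best match
    """
    # Appearance: pixelwise L1 plus perceptual distance on both layers.
    l_app = F.l1_loss(T_hat, T) + F.l1_loss(R_hat, R)
    l_app = l_app + lam_p * sum(F.l1_loss(vgg_feats(a), vgg_feats(b))
                                for a, b in ((T_hat, T), (R_hat, R)))
    # Memory matching: pull toward the top match, push from the runner-up.
    l_mem = F.triplet_margin_loss(anchor, pos, neg, margin=0.2)
    # Load balancing: squared mean usage is smallest when experts share load.
    usage = gate_weights.mean(dim=0)             # mean gate mass per expert
    l_lb = (usage * usage.shape[0]).pow(2).mean()
    return l_app + lam_mem * l_mem + lam_lb * l_lb
```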
Optimization uses Adam with scheduled learning-rate decay; data augmentation includes random cropping and horizontal flipping. Training utilizes the PASCAL VOC, Nature, and Real datasets with a batch size of 1 on RTX 4090 hardware.
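An illustrative setup for the reported recipe; the crop size, learning rate, and decay schedule below are placeholders, not the paper's values:

```python
import torch
import torch.nn as nn
from torchvision import transforms

model = nn.Conv2d(3, 6, 3, padding=1)    # stand-in for DMDNet (T and R outputs)
augment = transforms.Compose([
    transforms.RandomCrop(256),          # crop size is a placeholder
    transforms.RandomHorizontalFlip(p=0.5),
])
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)
```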
6. Benchmarking Datasets and Evaluation
The dedicated NightIRS dataset enables robust nighttime evaluation:
- NightIRS: Comprises 1,000 $(I, T, R)$ triplets captured with a Sony LYTIA-T808 across varied nighttime scenarios. Reflections were induced using acrylic/glass sheets; depth annotations come from MiDaS v3.1 Next-ViT-L.
- Daytime Benchmarks: Nature (20 images), Real (20), Wild (55), Postcard (199), and Solid (200).
Metrics are PSNR (↑), SSIM (↑), and LPIPS (↓); a computation sketch follows the table. DMDNet achieves top performance for both layers:
| Scenario | Layer | PSNR (dB) | SSIM | LPIPS |
|---|---|---|---|---|
| Daytime | Transmission | 26.27 | 0.889 | 0.093 |
| NightIRS | Transmission | 25.24 | 0.832 | 0.144 |
| Daytime | Reflection | 22.31 | 0.522 | 0.403 |
| NightIRS | Reflection | 28.37 | 0.633 | 0.286 |
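For reference, the three metrics can be computed with standard tooling (scikit-image for PSNR/SSIM and the `lpips` package for LPIPS); this helper assumes HxWx3 float arrays in [0, 1]:

```python
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_net = lpips.LPIPS(net='alex')      # LPIPS expects NCHW inputs in [-1, 1]

def evaluate(pred, gt):
    """pred, gt: HxWx3 float arrays in [0, 1]."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)
    to_t = lambda a: torch.from_numpy(a).permute(2, 0, 1)[None].float() * 2 - 1
    with torch.no_grad():
        lp = lpips_net(to_t(pred), to_t(gt)).item()
    return psnr, ssim, lp
```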
7. Ablation Analyses
Ablation studies quantify contributions of model components:
- DSMamba (Table A): Depth-aware scan and DS-SSM jointly yield maximum PSNR/SSIM, whereas variants lacking either scan refinement or depth-synergy underperform.
- MECM (Table B): Both GPStream and SCStream are crucial; removing either reduces metrics. Increasing the number of experts positively affects performance, subject to selection constraints.
- Depth Model Quality (Table C): Higher-quality depth estimators (e.g., MiDaS v3.1 Next-ViT-L) correspond to improved scores on both the public daytime benchmarks and NightIRS.
8. Significance and Implications
DMDNet's integration of depth-guided pixel ordering, depth-modulated state activation, and historical memory compensation addresses key challenges in reflection separation, notably under low-light and low-contrast conditions. The architecture and benchmarking paradigm enabled by NightIRS elevate evaluation standards for nighttime image reflection separation. This suggests future research will increasingly leverage domain-informed scanning and dynamic memory modules for ill-posed inverse imaging tasks (Fang et al., 1 Jan 2026).