
RCDNet: Interpretable Deraining & Change Detection

Updated 19 December 2025
  • RCDNet is a neural network that unfolds proximal gradient descent for interpretable single image rain removal and guided change detection.
  • It leverages convolutional dictionary learning and adaptive priors to model rain streaks and semantic changes, ensuring robustness across domains.
  • Cross-modal fusion and explicit algorithmic steps deliver transparency and superior performance on both synthetic datasets and real-world applications.

RCDNet (Rain Convolutional Dictionary Network or Referring Change Detection Network) denotes two distinct state-of-the-art neural architectures: one designed for interpretable single image rain removal in computer vision, and one for guided change detection in remote sensing. The term also appears as the name of a scheduler in the datacenter networking literature, but the dominant references pertain to the interpretable, model-driven deep network incarnations for deraining and language-guided change detection. RCDNet architectures share an emphasis on grounding neural operations in optimization theory or cross-modal fusion to achieve interpretability, generalization, and robust real-world performance (Wang et al., 2020, Wang et al., 2021, Korkmaz et al., 12 Dec 2025).

1. Model-Driven Architectures in Image Restoration

RCDNet for single image deraining is a deep neural network architecture that unfolds the proximal gradient descent algorithm to solve an explicit decomposition model $Y = X + R$, where $Y$ is the observed rainy RGB image, $X$ is the clean background, and $R$ is the rain layer. RCDNet models $R$ as a convolutional dictionary expansion $R = \sum_{n=1}^{N} C_n \otimes M_n$, with learnable rain kernels $C_n$, corresponding sparse coefficient maps $M_n$, and 2D channel-wise convolution $\otimes$ (Wang et al., 2020, Wang et al., 2021). The goal is to recover $X$ and $\{M_n\}$ via a regularized least-squares formulation that incorporates learned priors on both rain maps and background.
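As an illustration of this expansion, a minimal PyTorch sketch is given below; the function name `rain_layer`, the 9×9 kernel size, and the tensor shapes are assumptions for exposition rather than the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def rain_layer(M: torch.Tensor, C: torch.Tensor) -> torch.Tensor:
    """Expand sparse coefficient maps M with rain kernels C: R = sum_n C_n (x) M_n.

    M: (B, N, H, W)   per-kernel sparse coefficient maps
    C: (3, N, k, k)   N learnable rain kernels for the 3 RGB channels
    Returns R: (B, 3, H, W), the estimated rain layer.
    """
    # One multi-channel convolution realizes the whole sum: the N maps act
    # as input channels, the RGB channels of the rain layer as outputs.
    return F.conv2d(M, C, padding=C.shape[-1] // 2)

# Toy usage with 32 kernels (as in the reported configuration) of assumed size 9x9.
M = torch.relu(torch.randn(2, 32, 64, 64))   # nonnegativity as a stand-in for sparsity
C = 0.01 * torch.randn(3, 32, 9, 9)          # learnable rain kernel dictionary
R = rain_layer(M, C)                         # (2, 3, 64, 64)
```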

This setup is optimized by alternating proximal-gradient steps for $M$ (rain maps) and $X$ (background), where the network replaces hand-crafted regularizers with trainable subnetworks (ResNets) that act as learned proximal operators. The full network unfolds these steps into a sequence of interpretable stages, each corresponding precisely to one iteration of the optimization algorithm (Wang et al., 2021).
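A schematic sketch of one unfolded stage follows, reusing `rain_layer` from above; the names `prox_M` and `prox_X` (trainable ResNets acting as proximal operators) and the learned step sizes `eta_M`, `eta_X` are illustrative assumptions.

```python
import torch.nn.functional as F

def unfolded_stage(Y, X, M, C, eta_M, eta_X, prox_M, prox_X):
    """One proximal-gradient iteration for Y = X + R with R = rain_layer(M, C)."""
    # M-update: gradient step on 0.5*||Y - X - R||^2 w.r.t. M, then learned prox.
    residual = Y - X - rain_layer(M, C)
    grad_M = -F.conv_transpose2d(residual, C, padding=C.shape[-1] // 2)  # adjoint of conv
    M = prox_M(M - eta_M * grad_M)          # ResNet replaces a hand-crafted penalty

    # X-update: gradient step w.r.t. X (gradient is X + R - Y), then learned prox.
    R = rain_layer(M, C)
    X = prox_X(X - eta_X * (X + R - Y))
    return X, M
```

Stacking a fixed number of such stages (17 in the reported configuration) and training the whole chain end-to-end yields the unfolded network.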

2. Interpretability and Dictionary Learning

Every module in RCDNet has an explicit counterpart in mathematically-defined algorithmic steps:

  • Rain kernel convolution: Encodes the generative prior for rain streaks, learned to capture prototypical local structures appearing in real rainy images.
  • Proximal networks: Implement learned, data-driven regularizers for sparsity and smoothness, replacing engineered penalties with fully trainable alternatives.
  • Alternating updates: Each stage computes a proximal-gradient update for either the rain map or background, preserving the semantic meaning of operations.

This white-box property ("each block implements exactly one algorithmic operation") enables direct visualization and interpretation of intermediate results such as dictionaries, rain maps, and background estimates. The learned convolutional dictionary yields robustness to unseen rain types and domains (Wang et al., 2021).

A dynamic RCDNet extension, DRCDNet, further parameterizes kernels per image by mixing a global kernel dictionary $D$ with adaptively inferred weights $\alpha$, providing per-input adaptability: $K_n = D\,\alpha_n$. This adaptation mechanism produces significant gains in cross-domain generalization and reduces the number of required kernels (Wang et al., 2021).
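A minimal sketch of this mixing step; the dictionary size, the softmax normalization of $\alpha$, and the small inference network that would predict $\alpha$ from the input image are assumptions:

```python
import torch

def dynamic_kernels(alpha: torch.Tensor, D: torch.Tensor) -> torch.Tensor:
    """Mix a global kernel dictionary into per-image rain kernels: K_n = D alpha_n.

    alpha: (B, N, L)     per-image mixing weights over L dictionary atoms
    D:     (L, 3, k, k)  shared global kernel dictionary
    Returns K: (B, N, 3, k, k), image-specific kernels.
    """
    return torch.einsum('bnl,lcij->bncij', alpha, D)

# Toy usage: a small dictionary suffices because kernels adapt per image.
D = torch.randn(6, 3, 9, 9)                           # assumed L = 6 atoms
alpha = torch.softmax(torch.randn(2, 32, 6), dim=-1)  # e.g., output of a small CNN
K = dynamic_kernels(alpha, D)                         # (2, 32, 3, 9, 9)
```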

3. Quantitative and Qualitative Performance

RCDNet achieves state-of-the-art results on multiple established synthetic and real deraining benchmarks:

Dataset     PSNR (dB)   SSIM     Notable Comparison
Rain100L    40.00       0.9860   JORDER_E: 38.59/0.9834
Rain100H    31.28       0.9093   Best among all
Rain1400    33.04       0.9472
Rain12      37.71       0.9649
SPA-Data    41.47       0.9834   JORDER_E: ~40.78/0.9811

Qualitative results confirm superior removal of complex rain and retention of background detail, with interpretability of intermediate variables supporting scientific analysis and debugging (Wang et al., 2020, Wang et al., 2021).

The dynamic variant (DRCDNet) generalizes better to real, unseen rain patterns and cross-domain benchmarks due to the adaptive kernel parameterization.

4. RCDNet for Referring Change Detection in Remote Sensing

A separate RCDNet instantiation targets referring change detection (RCD) in remote sensing imagery, leveraging cross-modal fusion to enable user-driven, language-guided change detection (Korkmaz et al., 12 Dec 2025). In this context, RCDNet serves as the second stage in a two-phase pipeline: synthetic data generation with RCDGen and language-guided change map prediction with RCDNet.

The network ingests a pair of temporal remote sensing images $I_{\text{pre}}$ and $I_{\text{post}}$ together with a natural language prompt $C$ (e.g., "industrial zone") to segment out target-specific semantic changes. Its architecture comprises:

  • Siamese visual encoders (VMamba-based) for both images;
  • Fusion modules for multiscale feature cross-comparison;
  • Mask decoder blocks, integrating CLIP-based text embeddings via transformer cross-attention to inject prompt semantics at multiple spatial resolutions.

Through explicit cross-modal integration, RCDNet outputs a binary mask precisely corresponding to the change-of-interest defined by the user prompt, independent of rigid class labels (Korkmaz et al., 12 Dec 2025).
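The following sketch illustrates how prompt semantics can be injected into decoder features via transformer cross-attention; the module name, feature dimensions, and single-token treatment of the CLIP embedding are assumptions for illustration, not the paper's exact design:

```python
import torch
import torch.nn as nn

class TextInjection(nn.Module):
    """Cross-attention from spatial change features to a CLIP prompt embedding."""

    def __init__(self, dim: int = 256, text_dim: int = 512, heads: int = 8):
        super().__init__()
        self.proj = nn.Linear(text_dim, dim)   # map CLIP space to feature space
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) fused bitemporal features; text: (B, text_dim).
        B, C, H, W = feats.shape
        q = feats.flatten(2).transpose(1, 2)   # (B, H*W, C): one query per location
        kv = self.proj(text).unsqueeze(1)      # (B, 1, C): prompt as key/value token
        out, _ = self.attn(q, kv, kv)          # each location attends to the prompt
        out = self.norm(q + out)               # residual connection + layer norm
        return out.transpose(1, 2).reshape(B, C, H, W)
```

Applying such a block at several decoder resolutions, as described above, lets the prompt steer which changes survive into the final binary mask.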

5. Training, Implementation, and Data Regimes

RCDNet for deraining is trained end-to-end via mean squared error losses on the background and rain-layer estimates at every stage, with all parameters (rain kernels, step sizes, and the weights of the ResNet-based proximal operators) learned jointly. Training datasets include Rain100L, Rain100H, Rain1400, Rain12, and SPA-Data, with consistent hyperparameter settings (e.g., 17 stages, 32 kernels, Adam optimizer with decayed learning rate, batch size 16, $64 \times 64$ patches) (Wang et al., 2020, Wang et al., 2021).
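A self-contained sketch of a per-batch loss consistent with this description; the down-weighting of intermediate stages (`lam`) and supervising the rain layer via $R = Y - X$ are assumptions:

```python
import torch
import torch.nn.functional as F

def unfolding_loss(X_stages, R_stages, X_gt, Y, lam=0.1):
    """MSE supervision over every unfolded stage's background and rain estimates.

    X_stages, R_stages: lists of per-stage estimates (length = number of stages).
    The final stage receives full weight; earlier stages are down-weighted by lam.
    """
    R_gt = Y - X_gt  # rain layer implied by the decomposition Y = X + R
    loss = F.mse_loss(X_stages[-1], X_gt) + F.mse_loss(R_stages[-1], R_gt)
    for Xs, Rs in zip(X_stages[:-1], R_stages[:-1]):
        loss = loss + lam * (F.mse_loss(Xs, X_gt) + F.mse_loss(Rs, R_gt))
    return loss
```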

For referring change detection, RCDNet is pre-trained on large synthetic corpora from RCDGen, then fine-tuned on real-world semantic/binary CD datasets (SECOND, CNAM-CD, LEVIR-CD, WHU-CD), with pixel-wise binary cross-entropy loss.

Key hyperparameters:

  • Encoder/decoder: VMamba-small;
  • Text encoder: CLIP ViT-B/32 (frozen or LoRA-finetuned);
  • Optimizer: AdamW (initial lr $6 \times 10^{-5}$, weight decay 0.01, polynomial decay, 200 epochs, batch size 4) (Korkmaz et al., 12 Dec 2025); a configuration sketch follows this list.
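A minimal optimizer/scheduler configuration matching these settings; the polynomial power and the placeholder parameter list are assumptions:

```python
import torch

# Placeholder parameters standing in for the RCDNet weights (illustrative only).
params = [torch.nn.Parameter(torch.zeros(8, 8))]

opt = torch.optim.AdamW(params, lr=6e-5, weight_decay=0.01)
# Polynomial learning-rate decay over 200 epochs; power=1.0 (linear) is assumed.
sched = torch.optim.lr_scheduler.PolynomialLR(opt, total_iters=200, power=1.0)

for epoch in range(200):
    # ... one fine-tuning pass with batch size 4 and pixel-wise BCE loss ...
    opt.step()
    sched.step()
```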

6. Empirical Results and Application Contexts

For deraining, RCDNet and DRCDNet establish new benchmarks on synthetic and real datasets, outperforming both handcrafted-prior methods and end-to-end black-box deep models in PSNR/SSIM and visual quality. The performance advantage carries over to downstream tasks, such as object detection (COCO) and semantic segmentation (Cityscapes) on rain-degraded images, with DRCDNet improving mAP/mIoU over leading alternatives.

For referring change detection, RCDNet demonstrates state-of-the-art mIoU, SeK, OA, and $F_{scd}$ across targeted semantic, cross-domain binary, and within-domain binary CD tasks, outperforming specialized prior architectures by 2–12 points in $F_{scd}$/SeK. Integration of synthetic pretraining yields further improvements, especially in low-data or severe class-imbalance regimes (Korkmaz et al., 12 Dec 2025).

7. Interpretation, Generalization, and Significance

RCDNet embodiments share several core characteristics:

  • Algorithmic transparency: All neural operations correspond to explicit mathematical steps, facilitating interpretability and debugging.
  • Adaptive priors: Dictionaries and regularizers are learned end-to-end, allowing robust adaptation to new domains and rain/change types without manual parameter tuning.
  • Cross-modal fusion: In remote sensing, RCDNet’s explicit text-visual fusion enables highly flexible, prompt-driven operation, decoupling output structure from fixed semantic classes and improving cross-dataset utility.

This design philosophy positions RCDNet and its extensions as reference architectures for interpretable, model-driven deep networks in both low-level vision and cross-modal change detection, combining mathematical clarity with empirical state-of-the-art performance (Wang et al., 2020, Wang et al., 2021, Korkmaz et al., 12 Dec 2025).
