RCDNet: Interpretable Deraining & Change Detection
- RCDNet names two architectures: a deraining network that unfolds proximal gradient descent for interpretable single image rain removal, and a language-guided change detection network for remote sensing.
- The deraining variant leverages convolutional dictionary learning and adaptive priors to model rain streaks; the change-detection variant uses cross-modal fusion to localize prompt-specified semantic changes, improving robustness across domains.
- Explicit algorithmic steps and cross-modal fusion deliver transparency and strong performance on both synthetic benchmarks and real-world data.
RCDNet (Rain Convolutional Dictionary Network or Referring Change Detection Network) denotes two distinct state-of-the-art neural architectures: one designed for interpretable single image rain removal in computer vision, the other for guided change detection in remote sensing. The term also appears as the name of a scheduler in the datacenter networking literature, but the dominant references pertain to its interpretable, model-driven deep network incarnations for deraining and language-guided change detection. RCDNet architectures share an emphasis on grounding neural operations in optimization theory or cross-modal fusion to achieve interpretability, generalization, and robust real-world performance (Wang et al., 2020, Wang et al., 2021, Korkmaz et al., 12 Dec 2025).
1. Model-Driven Architectures in Image Restoration
RCDNet for single image deraining is a deep neural network architecture that unfolds the proximal gradient descent algorithm to solve an explicit decomposition model $\mathbf{O} = \mathbf{B} + \mathbf{R}$, where $\mathbf{O}$ is the observed rainy RGB image, $\mathbf{B}$ is the clean background, and $\mathbf{R}$ is the rain layer. RCDNet models $\mathbf{R}$ as a convolutional dictionary expansion, $\mathbf{R} = \sum_{n=1}^{N} \mathbf{C}_n \otimes \mathbf{M}_n$, with learnable rain kernels $\mathbf{C}_n$ and corresponding sparse coefficient maps $\mathbf{M}_n$, using 2D channel-wise convolution (Wang et al., 2020, Wang et al., 2021). The goal is to recover $\mathbf{B}$ and $\{\mathbf{M}_n\}$ via a regularized least-squares formulation that incorporates learned priors on both rain maps and background.
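Written out, the regularized least-squares objective takes the following schematic form, where $g_1$ and $g_2$ stand for the (implicit) priors on the rain maps and the background that the network later realizes as trainable modules; the exact weighting used in the papers may differ:

$$
\min_{\mathbf{B},\,\{\mathbf{M}_n\}}\;\Big\|\,\mathbf{O}-\mathbf{B}-\sum_{n=1}^{N}\mathbf{C}_n\otimes\mathbf{M}_n\,\Big\|_F^2 \;+\; \lambda_1\, g_1\!\big(\{\mathbf{M}_n\}\big) \;+\; \lambda_2\, g_2(\mathbf{B}).
$$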
This setup is optimized by alternating proximal-gradient steps for $\{\mathbf{M}_n\}$ (rain maps) and $\mathbf{B}$ (background), where the network replaces hand-crafted regularizers with trainable subnetworks (ResNets) that act as learned proximal operators. The full network unfolds these steps into a sequence of interpretable stages, each corresponding precisely to one iteration of the optimization algorithm (Wang et al., 2021).
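Schematically, stage $s$ of the unfolded network computes (writing $\mathbf{C}\otimes\mathbf{M}$ for $\sum_n \mathbf{C}_n\otimes\mathbf{M}_n$, with learned step sizes $\eta_1,\eta_2$; this paraphrases the update structure rather than reproducing the papers' exact expressions):

$$
\mathbf{M}^{(s)} = \mathrm{proxNet}_{\theta_m^{(s)}}\!\Big(\mathbf{M}^{(s-1)} - \eta_1\,\mathbf{C}\otimes^{\top}\!\big(\mathbf{C}\otimes\mathbf{M}^{(s-1)} + \mathbf{B}^{(s-1)} - \mathbf{O}\big)\Big),
$$

$$
\mathbf{B}^{(s)} = \mathrm{proxNet}_{\theta_b^{(s)}}\!\Big((1-\eta_2)\,\mathbf{B}^{(s-1)} + \eta_2\big(\mathbf{O} - \mathbf{C}\otimes\mathbf{M}^{(s)}\big)\Big),
$$

where $\otimes^{\top}$ denotes the transposed (adjoint) convolution and $\mathrm{proxNet}_\theta$ the trainable proximal modules.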
2. Interpretability and Dictionary Learning
Every module in RCDNet corresponds to an explicit, mathematically defined algorithmic step:
- Rain kernel convolution: Encodes the generative prior for rain streaks, learned to capture prototypical local structures appearing in real rainy images.
- Proximal networks: Implement learned, data-driven regularizers for sparsity and smoothness, replacing engineered penalties with fully trainable alternatives.
- Alternating updates: Each stage computes a proximal-gradient update for either the rain map or background, preserving the semantic meaning of operations.
This white-box property ("each block implements exactly one algorithmic operation") enables direct visualization and interpretation of intermediate results such as dictionaries, rain maps, and background estimates. The learned convolutional dictionary yields robustness to unseen rain types and domains (Wang et al., 2021).
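To make the one-block-per-operation structure concrete, here is a minimal PyTorch sketch of a single unfolded stage; the module names (`ProxNet`, `RCDStage`), sizes, and initializations are hypothetical simplifications, not the released implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProxNet(nn.Module):
    """Small residual CNN standing in for a learned proximal operator."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        # Residual form keeps the learned prox close to the identity map.
        return x + self.body(x)

class RCDStage(nn.Module):
    """One unfolded proximal-gradient iteration: M-step, then B-step."""
    def __init__(self, n_kernels=32):
        super().__init__()
        self.prox_m = ProxNet(n_kernels)  # learned prior on rain maps M
        self.prox_b = ProxNet(3)          # learned prior on background B
        self.eta1 = nn.Parameter(torch.tensor(0.1))  # learned step sizes
        self.eta2 = nn.Parameter(torch.tensor(0.1))

    def forward(self, O, B, M, C):
        # C: rain kernel bank of shape (3, n_kernels, k, k); R = C (*) M.
        p = C.shape[-1] // 2
        R = F.conv2d(M, C, padding=p)
        # Gradient of the data term w.r.t. M uses the transposed convolution.
        grad_m = F.conv_transpose2d(R + B - O, C, padding=p)
        M = self.prox_m(M - self.eta1 * grad_m)
        R = F.conv2d(M, C, padding=p)
        B = self.prox_b((1 - self.eta2) * B + self.eta2 * (O - R))
        return B, M
```

Stacking 17 such stages with the kernel bank `C` shared across them, and reading out each stage's `B` and `M`, mirrors the interpretable unfolding described above.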
A dynamic RCDNet extension, DRCDNet, further parameterizes kernels per image by mixing a global kernel dictionary $\{\mathbf{D}_m\}_{m=1}^{M}$ with adaptively inferred weights $\alpha_{nm}$, providing per-input adaptability: $\mathbf{C}_n = \sum_{m=1}^{M} \alpha_{nm}\,\mathbf{D}_m$. This adaptation mechanism produces significant gains in cross-domain generalization and reduces the number of required kernels (Wang et al., 2021).
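In code, this per-image kernel generation reduces to a weighted sum over the dictionary; a minimal sketch (the tensor layout and names are assumptions):

```python
import torch

def mix_kernels(D, alpha):
    """Per-image rain kernels as weighted mixtures of a shared dictionary.

    D:     (M, 3, k, k)  global kernel dictionary, shared across all images
    alpha: (N, M)        mixing weights inferred from the current input image
    returns C of shape (N, 3, k, k): image-specific rain kernels
    """
    return torch.einsum('nm,mcxy->ncxy', alpha, D)
```

The weights `alpha` would be produced by a small inference subnetwork conditioned on the input image, which is what gives DRCDNet its per-input adaptability.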
3. Quantitative and Qualitative Performance
RCDNet achieves state-of-the-art results on multiple established synthetic and real deraining benchmarks:
| Dataset | PSNR (dB) | SSIM | Notable Comparison |
|---|---|---|---|
| Rain100L | 40.00 | 0.9860 | JORDER_E: 38.59/0.9834 |
| Rain100H | 31.28 | 0.9093 | Best among all |
| Rain1400 | 33.04 | 0.9472 | |
| Rain12 | 37.71 | 0.9649 | |
| SPA-Data | 41.47 | 0.9834 | JORDER_E: ~40.78/0.9811 |
Qualitative results confirm superior removal of complex rain and retention of background detail, with interpretability of intermediate variables supporting scientific analysis and debugging (Wang et al., 2020, Wang et al., 2021).
The dynamic variant (DRCDNet) generalizes better to real, unseen rain patterns and cross-domain benchmarks due to the adaptive kernel parameterization.
4. RCDNet for Referring Change Detection in Remote Sensing
A separate RCDNet instantiation targets referring change detection (RCD) in remote sensing imagery, leveraging cross-modal fusion to enable user-driven, language-guided change detection (Korkmaz et al., 12 Dec 2025). In this context, RCDNet serves as the second stage in a two-phase pipeline: synthetic data generation with RCDGen and language-guided change map prediction with RCDNet.
The network ingests a pair of temporal remote sensing images $\mathbf{I}_{t_1}$ and $\mathbf{I}_{t_2}$, and a natural language prompt (e.g., "industrial zone") to segment out target-specific semantic changes. Its architecture comprises:
- Siamese visual encoders (VMamba-based) for both images;
- Fusion modules for multiscale feature cross-comparison;
- Mask decoder blocks, integrating CLIP-based text embeddings via transformer cross-attention to inject prompt semantics at multiple spatial resolutions.
Through explicit cross-modal integration, RCDNet outputs a binary mask precisely corresponding to the change-of-interest defined by the user prompt, independent of rigid class labels (Korkmaz et al., 12 Dec 2025).
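The text-to-vision injection can be pictured as standard transformer cross-attention with visual tokens as queries and projected CLIP text embeddings as keys and values; the following is a heavily simplified sketch under that assumption, with all names and dimensions illustrative rather than taken from the released code:

```python
import torch
import torch.nn as nn

class TextGuidedBlock(nn.Module):
    """Inject prompt semantics into a change-feature map via cross-attention."""
    def __init__(self, dim=256, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feat, text_emb):
        # feat: (B, C, H, W) fused bi-temporal change features
        # text_emb: (B, T, C) CLIP token embeddings projected to width C
        B, C, H, W = feat.shape
        q = feat.flatten(2).transpose(1, 2)      # (B, H*W, C) visual queries
        attn_out, _ = self.attn(q, text_emb, text_emb)
        q = self.norm(q + attn_out)              # residual connection + norm
        return q.transpose(1, 2).reshape(B, C, H, W)
```

Applying such a block at several decoder resolutions and predicting a one-channel logit map then yields the prompt-conditioned binary change mask.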
5. Training, Implementation, and Data Regimes
RCDNet for deraining is trained end-to-end with mean squared error losses applied to the background and rain-layer estimates at every stage, with all parameters (rain kernels, step sizes, and the weights of the ResNet-based proximal operators) learned jointly. Training datasets include Rain100L, Rain100H, Rain1400, Rain12, and SPA-Data, with consistent hyperparameter settings (e.g., 17 stages, 32 kernels, Adam optimizer with a decayed learning rate, batch size 16, and fixed-size training patches) (Wang et al., 2020, Wang et al., 2021).
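If each stage's background and rain-layer estimates are supervised as described, the training objective is a weighted sum of per-stage MSE terms; the sketch below assumes a simple weighting scheme (`w_s` and `w_final` are illustrative values, not taken from the papers):

```python
import torch.nn.functional as F

def deraining_loss(B_stages, R_stages, B_gt, O, w_s=0.1, w_final=1.0):
    """Sum of MSE losses over all unfolded stages.

    B_stages, R_stages: lists of per-stage background / rain-layer estimates
    B_gt: clean ground-truth background; O - B_gt supervises the rain layer
    """
    R_gt = O - B_gt
    loss = 0.0
    for s, (B, R) in enumerate(zip(B_stages, R_stages)):
        w = w_final if s == len(B_stages) - 1 else w_s  # emphasize final stage
        loss = loss + w * (F.mse_loss(B, B_gt) + F.mse_loss(R, R_gt))
    return loss
```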
For referring change detection, RCDNet is pre-trained on large synthetic corpora from RCDGen, then fine-tuned on real-world semantic/binary CD datasets (SECOND, CNAM-CD, LEVIR-CD, WHU-CD), with pixel-wise binary cross-entropy loss.
Key hyperparameters:
- Encoder/decoder: VMamba-small;
- Text encoder: CLIP ViT-B/32 (frozen or LoRA-finetuned);
- Optimizer: AdamW with a polynomially decayed learning rate (weight decay 0.01, 200 epochs, batch size 4) (Korkmaz et al., 12 Dec 2025).
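This schedule maps directly onto standard PyTorch components; a configuration sketch follows, noting that the source omits the initial learning rate, so the value below is a placeholder:

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import LambdaLR

EPOCHS = 200
model = nn.Conv2d(3, 1, 3, padding=1)  # stand-in for the actual RCDNet model
optimizer = torch.optim.AdamW(model.parameters(),
                              lr=1e-4,          # placeholder: value not given in source
                              weight_decay=0.01)
# Polynomial learning-rate decay over 200 epochs; the power 0.9 is a common
# choice assumed here, as the source only states "polynomial decay".
scheduler = LambdaLR(optimizer, lr_lambda=lambda e: (1 - e / EPOCHS) ** 0.9)

for epoch in range(EPOCHS):
    # ... one epoch of pixel-wise BCE training on change masks goes here ...
    scheduler.step()
```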
6. Empirical Results and Application Contexts
For deraining, RCDNet and DRCDNet set state-of-the-art results on synthetic and real benchmarks, outperforming both handcrafted-prior methods and end-to-end black-box deep models in PSNR/SSIM and visual quality. The advantage carries over to downstream tasks, such as object detection (COCO) and semantic segmentation (Cityscapes) on rain-degraded images, where DRCDNet improves mAP/mIoU over leading alternatives.
For referring change detection, RCDNet demonstrates state-of-the-art mIoU, SeK, OA, and $F_{scd}$ across targeted semantic, cross-domain binary, and within-domain binary CD tasks, outperforming specialized prior architectures by 2–12 points in $F_{scd}$/SeK. Integrating synthetic pretraining yields further improvements, especially in low-data or severely class-imbalanced regimes (Korkmaz et al., 12 Dec 2025).
7. Interpretation, Generalization, and Significance
RCDNet embodiments share several core characteristics:
- Algorithmic transparency: All neural operations correspond to explicit mathematical steps, facilitating interpretability and debugging.
- Adaptive priors: Dictionaries and regularizers are learned end-to-end, allowing robust adaptation to new domains and rain/change types without manual parameter tuning.
- Cross-modal fusion: In remote sensing, RCDNet’s explicit text-visual fusion enables highly flexible, prompt-driven operation, decoupling output structure from fixed semantic classes and improving cross-dataset utility.
This design philosophy positions RCDNet and its extensions as reference architectures for interpretable, model-driven deep networks in both low-level vision and cross-modal change detection, combining mathematical clarity with empirical state-of-the-art performance (Wang et al., 2020, Wang et al., 2021, Korkmaz et al., 12 Dec 2025).