
Dual-Domain Super-Resolution Network (DDSRNet)

Updated 13 December 2025
  • The paper introduces DDSRNet, which combines Spatial-Net for direct pixel enhancement with DWT-based subband refinement for hyperspectral image super-resolution.
  • It utilizes two wavelet-domain pathways, a low-frequency enhancement branch and a shared-weight high-frequency refinement branch, to balance smooth-region recovery and detailed edge restoration.
  • Experimental results demonstrate DDSRNet achieves competitive accuracy on key datasets with a significantly lower parameter count than conventional CNN and diffusion models.

Dual-Domain Super-Resolution Network (DDSRNet) is a convolutional architecture developed for hyperspectral single-image super-resolution (HSISR), designed to integrate spatial and frequency domain feature learning. DDSRNet leverages Spatial-Net for direct pixel-domain enhancement and harnesses discrete wavelet transform (DWT) machinery to separately refine low- and high-frequency subbands, yielding competitive accuracy with exceptional parameter efficiency on prominent hyperspectral datasets (Karayaka et al., 10 Dec 2025).

1. Dual-Domain Network Architecture and Motivation

DDSRNet’s central design premise is that spatial-domain CNNs, while adept at recovering smooth regions, often struggle to preserve edges and textures in severely downsampled hyperspectral images (HSIs). Frequency-domain separation via one-level Haar DWT provides explicit access to low-frequency (LF) components (coarse structures) and high-frequency (HF) subbands (edges, fine textures), facilitating targeted refinement. The dual-domain pathway consists of a lightweight Spatial-Net for initial upsampling and structure recovery, followed by wavelet-domain modules: a low-frequency enhancement branch and a shared-weight high-frequency refinement branch. This division achieves high SR accuracy with minimal computational cost.

Block-diagram (textual):

Input LR patch X
  → Spatial-Net → coarse upsampled feature Ŷ_spatial
    → one-level Haar DWT → subbands {Y_LL, Y_LH, Y_HL, Y_HH}
      → LF branch refines Y_LL → Y_LL*
      → Shared HF branch (same CNN block applied to each of Y_LH, Y_HL, Y_HH) → {Y_LH*, Y_HL*, Y_HH*}
        → Inverse DWT(IDWT)({Y_LL*,Y_LH*,Y_HL*,Y_HH*}) → final Ŷ_SR

This scheme is especially suitable for HSI data, where the tens to hundreds of spectral bands compound the computational demands and overfitting risk of large models. A plausible implication is that DDSRNet could extend to other spectral-domain imaging modalities due to its modular decomposition.
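The following minimal PyTorch sketch shows how the pipeline composes. The module and function names (SpatialNet, LFBranch, SharedHFBlock, dwt_haar, idwt_haar) are assumptions of this outline, defined in the sketches accompanying Sections 2–4 below; this is not the authors' implementation.

```python
import torch.nn as nn

class DDSRNet(nn.Module):
    """Illustrative composition of the dual-domain pipeline (names are assumptions)."""
    def __init__(self, bands, hidden=64, scale=2):
        super().__init__()
        self.spatial_net = SpatialNet(bands, hidden, scale)  # Section 2 sketch
        self.lf_branch = LFBranch(bands, hidden)             # Section 4 sketch
        self.hf_block = SharedHFBlock(bands, hidden)         # Section 4 sketch, shared weights

    def forward(self, x):
        y_spatial = self.spatial_net(x)                      # coarse upsampled estimate
        y_ll, y_lh, y_hl, y_hh = dwt_haar(y_spatial)         # Section 3 sketch
        y_ll = self.lf_branch(y_ll)                          # LF refinement
        # Same block instance applied to every HF orientation, residual added outside.
        y_lh, y_hl, y_hh = [self.hf_block(s) + s for s in (y_lh, y_hl, y_hh)]
        y_sr = idwt_haar(y_ll, y_lh, y_hl, y_hh)             # final SR output
        return y_sr, y_spatial                               # y_spatial kept for its auxiliary loss
```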

2. Spatial-Net: Shallow Feature Extraction and Upsampling

Spatial-Net forms the initial stage, extracting shallow features from the low-resolution (LR) input and producing a coarse upsampled estimate prior to wavelet decomposition. The architecture comprises:

  • $X \in \mathbb{R}^{B \times C \times H \times W}$: input LR patch ($B$ = batch, $C$ = spectral bands, $H \times W$ = spatial size).
  • Conv$_1$ (3×3, $C_h$ channels) → ReLU → Conv$_2$ (3×3, $C_h$ channels).
  • Bilinear upsampling by scale $s$: $U(F) \in \mathbb{R}^{B \times C_h \times sH \times sW}$.
  • Conv$_3$ (3×3, $C$ channels): projects features back to the spectral dimension.
  • Skip connection: adds the bilinear-upsampled input $U(X)$ to the output.

Formal equations:

$F = \text{Conv}_2(\sigma(\text{Conv}_1(X)))$

$Y_{\text{main}} = \text{Conv}_3(U(F))$

$\widehat{Y}_{\text{spatial}} = Y_{\text{main}} + U(X)$

Residual learning focuses model capacity on nontrivial corrections to bilinear interpolation. The hidden channel width $C_h$ (e.g., 64) is a typical hyperparameter. All convolutions use 3×3 kernels with stride 1 and padding 1.
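A minimal PyTorch sketch of Spatial-Net following the equations above; the class name and the defaults $C_h = 64$, $s = 2$ are illustrative assumptions, not the authors' code.

```python
import torch.nn as nn
import torch.nn.functional as F

class SpatialNet(nn.Module):
    """Sketch: F = Conv2(ReLU(Conv1(X))), Y = Conv3(U(F)) + U(X)."""
    def __init__(self, bands, hidden=64, scale=2):
        super().__init__()
        self.conv1 = nn.Conv2d(bands, hidden, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(hidden, hidden, 3, stride=1, padding=1)
        self.conv3 = nn.Conv2d(hidden, bands, 3, stride=1, padding=1)
        self.scale = scale

    def _up(self, t):
        # Bilinear upsampling U(.) by the target scale factor
        return F.interpolate(t, scale_factor=self.scale,
                             mode='bilinear', align_corners=False)

    def forward(self, x):
        f = self.conv2(F.relu(self.conv1(x)))          # shallow feature extraction
        return self.conv3(self._up(f)) + self._up(x)   # residual over bilinear interpolation
```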

3. Wavelet-Domain Subband Processing: DWT and Inverse DWT (IDWT)

DDSRNet employs a single-level 2D Haar wavelet transform for decomposition of the coarse upsampled estimate $\widehat{Y}_{\text{spatial}}$:

  • 1D filters: $h_L = [1/\sqrt{2},\ 1/\sqrt{2}]$ (low-pass); $h_H = [1/\sqrt{2},\ -1/\sqrt{2}]$ (high-pass).
  • Four subbands for each channel:

$Y_{LL}[b, c, i, j] = \sum_{m=0}^{1}\sum_{n=0}^{1} h_L[m]\, h_L[n]\, X_{\uparrow}[b, c, 2i+m, 2j+n]$

$Y_{LH}$, $Y_{HL}$, $Y_{HH}$: analogous definitions, substituting $h_H$ for $h_L$ along the corresponding axis.

  • Each subband is spatially reduced by half: $(sH/2) \times (sW/2)$.

Inverse DWT reconstructs the SR image from the refined subbands:

$X_{SR}[b, c, 2i+m, 2j+n] = \sum_{p\in\{L,H\}}\sum_{q\in\{L,H\}} h_p[m]\, h_q[n]\, Y_{pq}^*[b,c,i,j]$

This formalism enforces strict separation and reconstruction, allowing distinct learnable modules to operate on frequency-specific content.
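For the Haar case, the analysis and synthesis equations above reduce to simple 2×2 strided arithmetic, as in this sketch; the orientation convention for LH versus HL is an assumption, and the paper may order the high-pass axes differently.

```python
import torch

def dwt_haar(x):
    """One-level 2D Haar DWT via 2x2 strided slicing.
    x: (B, C, H, W) with even H, W -> four subbands of shape (B, C, H/2, W/2)."""
    a = x[..., 0::2, 0::2]  # x[2i,   2j]
    b = x[..., 0::2, 1::2]  # x[2i,   2j+1]
    c = x[..., 1::2, 0::2]  # x[2i+1, 2j]
    d = x[..., 1::2, 1::2]  # x[2i+1, 2j+1]
    ll = 0.5 * (a + b + c + d)   # h_L along both axes
    lh = 0.5 * (a - b + c - d)   # high-pass along width (convention assumption)
    hl = 0.5 * (a + b - c - d)   # high-pass along height
    hh = 0.5 * (a - b - c + d)   # high-pass along both axes
    return ll, lh, hl, hh

def idwt_haar(ll, lh, hl, hh):
    """Inverse of dwt_haar: exact reconstruction for the orthonormal Haar filters."""
    B, C, h, w = ll.shape
    x = ll.new_zeros(B, C, 2 * h, 2 * w)
    x[..., 0::2, 0::2] = 0.5 * (ll + lh + hl + hh)
    x[..., 0::2, 1::2] = 0.5 * (ll - lh + hl - hh)
    x[..., 1::2, 0::2] = 0.5 * (ll + lh - hl - hh)
    x[..., 1::2, 1::2] = 0.5 * (ll - lh - hl + hh)
    return x
```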

4. Frequency-Domain Enhancement Branches

Low-Frequency Enhancement Branch (LFB)

  • Input: $Y_{LL} \in \mathbb{R}^{B \times C \times (sH/2) \times (sW/2)}$.
  • Structure: stack of $R$ residual CNN blocks (usually $R = 2$–$3$), each applying 3×3 convolution → ReLU → 3×3 convolution with a residual skip.
  • A final 1×1 or 3×3 convolution projects back to $C$ channels.
  • Loss: Huber loss on the low-pass subband vs the ground truth extracted via DWT:

$\mathcal{L}_{low} = \text{Huber}(Y_{LL}^*, Y_L)$

A plausible implication is that LFB enables DDSRNet to precisely reconstruct gradual intensity transitions and broad spatial features, counteracting the over-smoothing tendency of pixel-domain CNNs.
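A sketch of the LF branch under the stated structure; the hidden width, the head convolution, and $R = 2$ are assumptions where the text leaves the exact layout open.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """3x3 conv -> ReLU -> 3x3 conv with identity skip, as described for the LF branch."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class LFBranch(nn.Module):
    """Stack of R residual blocks plus a final projection back to C bands (R=2 assumed)."""
    def __init__(self, bands, hidden=64, R=2):
        super().__init__()
        self.head = nn.Conv2d(bands, hidden, 3, padding=1)          # lift to hidden width
        self.blocks = nn.Sequential(*[ResBlock(hidden) for _ in range(R)])
        self.tail = nn.Conv2d(hidden, bands, 3, padding=1)          # project back to C

    def forward(self, y_ll):
        return self.tail(self.blocks(self.head(y_ll)))
```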

Shared High-Frequency Refinement Branch (HFB)

  • Input: three stacked HF subbands $Y_H = [Y_{LH}, Y_{HL}, Y_{HH}] \in \mathbb{R}^{B \times C \times 3 \times (sH/2) \times (sW/2)}$.
  • Shared-weight CNN block $W$ (conv 3×3 → ReLU → conv 3×3, plus residual) applied to each HF subband.
  • Output: $Y_k^* = W(Y_k) + Y_k$, $k \in \{LH, HL, HH\}$.
  • Loss: Huber loss on the stack of refined HF subbands against the corresponding stack extracted from the ground truth via DWT:

$\mathcal{L}_{high} = \text{Huber}(Y_H^*, Y_H)$

Empirically, sharing weights across orientations reduces the parameter count and yields consistent edge restoration, reflecting the statistical similarity of directional edge structures.
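A sketch of the shared-weight block: a single module instance is reused for all three orientations, which is what yields the parameter saving. The class name and hidden width are assumptions.

```python
import torch.nn as nn

class SharedHFBlock(nn.Module):
    """One conv(3x3) -> ReLU -> conv(3x3) block W; the SAME instance (hence shared
    weights) is applied to each of Y_LH, Y_HL, Y_HH, with the residual added by the caller."""
    def __init__(self, bands, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(bands, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, bands, 3, padding=1))

    def forward(self, y):
        return self.body(y)   # caller computes y_k* = W(y_k) + y_k

# Usage: refined = [hf_block(s) + s for s in (y_lh, y_hl, y_hh)]
```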

5. Training Methodologies and Loss Formulation

Training employs a hybrid loss (Huber-based) with four terms:

  • Main image-domain SR: $\mathcal{L}_{rec} = \text{Huber}(Y_{SR}, Y_{HR})$
  • Spatial-Net output: $\mathcal{L}_{spatial} = \text{Huber}(\widehat{Y}_{\text{spatial}}, Y_{HR})$
  • Low-frequency subband: $\mathcal{L}_{low}$
  • High-frequency subbands: $\mathcal{L}_{high}$

Combined as:

$\mathcal{L}_{total} = \lambda_{rec}\mathcal{L}_{rec} + \lambda_{spatial}\mathcal{L}_{spatial} + \lambda_{low}\mathcal{L}_{low} + \lambda_{high}\mathcal{L}_{high}$

Fixed weights $\lambda \approx 0.35$ were assigned to all terms.
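A sketch of the four-term objective using PyTorch's built-in Huber loss, with the equal weights of about 0.35 reported above; the function name and argument packing are assumptions.

```python
import torch.nn.functional as F

def ddsrnet_loss(y_sr, y_spatial, pred_subbands, gt_subbands, y_hr, lam=0.35):
    """Hybrid four-term Huber objective with equal weights (~0.35), per the text."""
    ll_p, lh_p, hl_p, hh_p = pred_subbands     # refined subbands from the two branches
    ll_g, lh_g, hl_g, hh_g = gt_subbands       # dwt_haar applied to the HR ground truth
    l_rec = F.huber_loss(y_sr, y_hr)           # image-domain reconstruction
    l_spatial = F.huber_loss(y_spatial, y_hr)  # auxiliary Spatial-Net supervision
    l_low = F.huber_loss(ll_p, ll_g)
    l_high = (F.huber_loss(lh_p, lh_g) + F.huber_loss(hl_p, hl_g)
              + F.huber_loss(hh_p, hh_g))
    return lam * (l_rec + l_spatial + l_low + l_high)
```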

Optimization uses Adam with learning rate $1\times10^{-4}$ and batch size 4, for up to 6000 epochs with early stopping (patience 200). Training patches are sized 144×144 (PaviaC/PaviaU) or 64×64/128×128 (Chikusei), with spectral-band grouping in sets of 35.
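An illustrative training setup matching these stated hyperparameters; train_one_epoch and validate are hypothetical helpers standing in for the usual data-loading and evaluation loops, and passing the group size as bands is an assumption.

```python
import torch

model = DDSRNet(bands=35, hidden=64, scale=2)            # 35-band spectral groups (assumption)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

best_val, bad_epochs, patience = float('inf'), 0, 200
for epoch in range(6000):
    train_one_epoch(model, optimizer, batch_size=4)      # hypothetical helper
    val_loss = validate(model)                           # hypothetical helper
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0               # improvement: reset counter
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                       # early stopping, patience 200
            break
```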

6. Quantitative Evaluation and Benchmarking

DDSRNet achieves competitive or superior SR results compared to recent CNN and diffusion models, with a substantially reduced parameter count (0.07 M).

Model     | Params (M) | PaviaC 2×               | PaviaU 2×               | Chikusei 2×              | Chikusei 4×
DDSRNet   | 0.07       | 36.39 dB, 0.960, 3.288° | 36.43 dB, 0.955, 2.921° | 38.406 dB, 0.963, 1.044° | 32.528 dB, 0.859, 2.146°
CSSFENet  | 1.61       | 35.52 dB, 0.954, 3.542° | 35.92 dB, 0.962, 3.038° | –                        | –
DIFF      | –          | –                       | –                       | 38.748 dB, 0.966, 1.638° | 32.248 dB, 0.860, 3.507°

Each cell lists MPSNR, MSSIM, SAM; "–" denotes values not reported in the source.

Metrics: MPSNR (mean peak SNR, ↑), MSSIM (mean SSIM, ↑), SAM (spectral angle, ↓). Results are reported at 2×, 4×, and 8× scales.
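For concreteness, minimal sketches of MPSNR and SAM under their standard definitions; the paper's exact evaluation code may differ.

```python
import torch

def mpsnr(pred, gt, data_range=1.0):
    """Mean PSNR over spectral bands (pred, gt: (C, H, W) tensors)."""
    mse = ((pred - gt) ** 2).mean(dim=(1, 2))                # per-band MSE
    return (10 * torch.log10(data_range ** 2 / mse)).mean()  # average over bands

def sam_degrees(pred, gt, eps=1e-8):
    """Mean spectral angle in degrees between per-pixel spectra (C, H, W)."""
    dot = (pred * gt).sum(dim=0)                             # per-pixel dot product
    denom = pred.norm(dim=0) * gt.norm(dim=0) + eps          # product of spectral norms
    angle = torch.acos((dot / denom).clamp(-1.0, 1.0))
    return torch.rad2deg(angle).mean()
```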

Across all tested scales, DDSRNet matches or exceeds established CNN-based approaches (CSSFENet, MCNet, PDENet, ERCSR) and is competitive with diffusion-based methods, especially on the Chikusei dataset, while requiring an order of magnitude fewer parameters. This suggests DDSRNet may be particularly advantageous in resource-constrained or on-board processing settings.

7. Context, Extensions, and Implications

DDSRNet’s design paradigm—combining spatial learning with frequency-domain enhancement via DWT—addresses key bottlenecks in hyperspectral SR, namely computational complexity and edge preservation. Although tested exclusively on Pavia Center, Pavia University, and Chikusei with up to 140-band grouping, the modular architecture admits straightforward extension to higher-order wavelets or more advanced frequency decomposition schemes. A plausible implication is that weight sharing in HF branches, coupled with minimal parameter count, can generalize to other multidimensional imaging SR tasks where spectral structure is critical.

No known controversies or disadvantages are reported in the referenced study (Karayaka et al., 10 Dec 2025). All empirical claims, workflow steps, and metrics are documented as above.

References

  1. Karayaka et al., 10 Dec 2025.
