
SSD-Net: Single-Scale Decomposition for Restoration

Updated 13 August 2025
  • SSD-Net is a neural architecture that employs an asymmetric decomposition mechanism to decouple intrinsic scene features from degradation cues.
  • It integrates CNN and Transformer modules via parallel decomposition and bidirectional communication, enabling efficient single-scale feature extraction.
  • Experimental results on underwater datasets show SSD-Net achieves state-of-the-art restoration quality with reduced parameter complexity compared to multi-scale methods.

A Single-Scale Decomposition Network (SSD-Net) is a neural architecture designed to maximize the utility of single-scale feature extraction for complex image restoration tasks, notably underwater image enhancement. Challenging the prevailing reliance on multi-scale feature fusion, SSD-Net implements an asymmetric decomposition mechanism that decouples a single-scale feature space into a clean layer containing intrinsic scene information and a degradation layer holding medium-induced interference. The architecture leverages both convolutional neural networks (CNNs) and Transformers—specifically through parallel decomposition and bidirectional feature communication modules—to achieve state-of-the-art restoration quality with substantial gains in efficiency and parameter reduction.

1. Motivation and Conceptual Foundations

Traditional image enhancement methods, particularly in underwater domains, employ multi-scale feature extraction (MSFE), integrating multi-resolution data via downsampling, upsampling, and parallel branch fusion. While these approaches improve contextual modeling, they introduce significant redundancy and increase model complexity—impacting inference speed and resource efficiency. SSD-Net is motivated by empirical evidence that high reconstruction quality can be achieved solely through single-scale feature extraction, dispelling the notion that multi-scale fusion is necessary for superior performance (Cheng et al., 6 Aug 2025). This design paradigm is notably effective in scenarios where image degradations—such as color distortion, blur, and low contrast—predominate, and computational cost is a primary concern.

2. Architecture and Asymmetrical Decomposition Mechanism

SSD-Net’s architecture can be divided into the following modules:

  • Convolutional Embedding Unit: The input image $X$ is mapped via a convolution operation $\theta(\cdot)$ into feature representations $F^0_d, F^0_c$, where $F^0_d$ and $F^0_c$ denote the initial degradation and clean features, respectively.
  • Parallel Feature Decomposition Block (PFDB): This block consists of two parallel branches:
    • Transformer Branch: Employs an adaptive sparse self-attention (AST) mechanism, combining softmax and ReLU-based attention maps. The Transformer captures long-range degradation cues and filters redundant information using learnable weights.
    • CNN Branch: A lightweight CNN with channel-attention layers for refined local clean feature extraction.
  • Bidirectional Feature Communication Block (BFCB): Features separated by PFDB are refined through residual and bidirectional interactions. Successive $1 \times 1$ convolutions, ReLU, and Sigmoid activations produce fusion weights; residual connections enable subtraction of redundant signals and addition of complementary information between branches.

The sequential pipeline updates features according to:

$F^n_d, F^n_c = \mathrm{BFCB}_n(\mathrm{PFDB}_n(F^{n-1}_d, F^{n-1}_c))$

A reconstruction module combines processed clean and degradation features to generate the final enhanced image via element-wise summation.
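The pipeline above can be sketched in PyTorch. The PFDB and BFCB internals here are simplified placeholders (plain convolutions and sigmoid gates standing in for the Transformer/CNN branches), and all layer widths and block counts (`ch`, `n_blocks`) are illustrative assumptions rather than the paper's configuration:

```python
# Minimal sketch of the SSD-Net pipeline: embed -> N x (PFDB -> BFCB) -> sum.
# Module internals are placeholders; hyperparameters are illustrative.
import torch
import torch.nn as nn

class PFDB(nn.Module):
    """Placeholder parallel decomposition: one branch per feature stream."""
    def __init__(self, ch):
        super().__init__()
        self.deg_branch = nn.Conv2d(ch, ch, 3, padding=1)    # stands in for the Transformer branch
        self.clean_branch = nn.Conv2d(ch, ch, 3, padding=1)  # stands in for the CNN branch

    def forward(self, f_d, f_c):
        return self.deg_branch(f_d), self.clean_branch(f_c)

class BFCB(nn.Module):
    """Placeholder bidirectional communication with sigmoid fusion gates."""
    def __init__(self, ch):
        super().__init__()
        self.gate_d = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())
        self.gate_c = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())

    def forward(self, f_d, f_c):
        # exchange gated residual updates between the two streams
        f_d = f_d + self.gate_c(f_c) * f_c
        f_c = f_c + self.gate_d(f_d) * f_d
        return f_d, f_c

class SSDNetSketch(nn.Module):
    def __init__(self, ch=32, n_blocks=4):
        super().__init__()
        self.embed = nn.Conv2d(3, 2 * ch, 3, padding=1)  # theta(.): yields F0_d, F0_c
        self.blocks = nn.ModuleList(
            [nn.ModuleList([PFDB(ch), BFCB(ch)]) for _ in range(n_blocks)]
        )
        self.recon_d = nn.Conv2d(ch, 3, 3, padding=1)
        self.recon_c = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, x):
        f_d, f_c = self.embed(x).chunk(2, dim=1)
        for pfdb, bfcb in self.blocks:
            f_d, f_c = bfcb(*pfdb(f_d, f_c))  # F^n = BFCB_n(PFDB_n(F^{n-1}))
        # reconstruction: element-wise summation of both processed streams
        return self.recon_d(f_d) + self.recon_c(f_c)

x = torch.randn(1, 3, 64, 64)
out = SSDNetSketch()(x)
print(out.shape)  # torch.Size([1, 3, 64, 64])
```

Because no operation changes spatial resolution, the sketch makes the single-scale property explicit: every feature map stays at the input resolution from embedding to reconstruction.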

3. Parallel Feature Decomposition Block (PFDB)

PFDB is critical for SSD-Net’s ability to disentangle clean and degraded information from a single-scale feature map. The Transformer-driven branch leverages adaptive sparse attention operations:

  • Softmax Branch: Computes standard similarity scores and applies softmax normalization.
  • ReLU Branch: Introduces sparsity via ReLU activation, selectively suppressing noisy activations.
  • Fusion: Outputs from both branches are fused via learnable weights, ensuring the propagation of salient degradation cues.

The CNN-driven branch extracts localized scene details, with channel-attention mechanisms focusing on structurally relevant features. PFDB achieves effective separation without recourse to spatial downsampling or multi-scale fusion.
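The dual-branch attention of the Transformer branch can be illustrated as follows. The projection layout, head count, and the single learnable fusion scalar `alpha` are assumptions made for illustration, not the paper's exact design:

```python
# Sketch of adaptive sparse attention: a softmax branch and a ReLU-sparsified
# branch over the same scores, fused by a learnable weight.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveSparseAttention(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim)
        self.alpha = nn.Parameter(torch.tensor(0.5))  # learnable fusion weight

    def forward(self, x):  # x: (B, N, dim)
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        split = lambda t: t.view(b, n, self.heads, d // self.heads).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)
        scores = q @ k.transpose(-2, -1) / (d // self.heads) ** 0.5
        attn_soft = scores.softmax(dim=-1)  # dense similarity map
        attn_relu = F.relu(scores)          # sparse map: negative scores zeroed out
        attn_relu = attn_relu / attn_relu.sum(-1, keepdim=True).clamp_min(1e-6)
        attn = self.alpha * attn_soft + (1 - self.alpha) * attn_relu
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)

tokens = torch.randn(2, 16, 32)
out = AdaptiveSparseAttention(32)(tokens)
print(out.shape)  # torch.Size([2, 16, 32])
```

The ReLU branch zeroes all negative similarity scores, so tokens with weak affinity contribute nothing; the fusion weight lets the network interpolate between dense and sparse attention per layer.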

4. Bidirectional Feature Communication Block (BFCB)

BFCB refines decoupled features through cross-branch communication:

  • Features from both branches are convolved and nonlinearly transformed.
  • Fusion weights generated via Sigmoid operations modulate residual updates.
  • Redundant information is subtracted, and complementary cues are fused, improving both the clarity of restored image features and the accuracy of degradation removal.

This bidirectional mechanism preserves the independence of clean and degraded feature streams while supporting dynamic information exchange.
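The cross-branch update can be sketched as below. The gating path ($1 \times 1$ convolutions, ReLU, Sigmoid) follows the description above, but the exact order of the subtract/add residual updates is an assumption for illustration:

```python
# Sketch of bidirectional feature communication: sigmoid gates subtract
# redundant signal and add complementary signal across the two streams.
import torch
import torch.nn as nn

class BidirectionalComm(nn.Module):
    def __init__(self, ch):
        super().__init__()
        def gate():  # 1x1 conv -> ReLU -> 1x1 conv -> Sigmoid fusion weights
            return nn.Sequential(nn.Conv2d(ch, ch, 1), nn.ReLU(),
                                 nn.Conv2d(ch, ch, 1), nn.Sigmoid())
        self.redundant_d, self.complement_d = gate(), gate()
        self.redundant_c, self.complement_c = gate(), gate()

    def forward(self, f_d, f_c):
        # each stream drops what the other stream already explains and
        # absorbs what the other stream marks as complementary
        new_d = f_d - self.redundant_c(f_c) * f_d + self.complement_c(f_c) * f_c
        new_c = f_c - self.redundant_d(f_d) * f_c + self.complement_d(f_d) * f_d
        return new_d, new_c

f_d = torch.randn(1, 16, 32, 32)
f_c = torch.randn(1, 16, 32, 32)
out_d, out_c = BidirectionalComm(16)(f_d, f_c)
print(out_d.shape, out_c.shape)
```

Keeping separate gate networks per direction preserves the independence of the two streams: each stream decides its own update, conditioned on the other.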

5. Empirical Performance and Comparative Evaluation

SSD-Net has been empirically validated on paired underwater datasets such as UIEB and EUVP, as well as no-reference benchmarks (UIEB60, U45, UCCS):

| Dataset | SSIM (SSD-Net) | PSNR (SSD-Net) | Parameters (SSD-Net) |
|---|---|---|---|
| UIEB | 0.924 | ~25 dB | Significantly fewer than multi-scale baselines |
| EUVP | Improved over multi-scale alternatives | Improved | Fewer |

Performance metrics confirm that SSD-Net matches or surpasses multi-scale approaches in image restoration quality while utilizing fewer parameters and avoiding excessive computation (Cheng et al., 6 Aug 2025). Ablation studies demonstrate that PFDB and BFCB modules enable the compact single-scale backbone to rival or outperform multi-scale designs.

6. Technical Innovation: Integration of CNN and Transformer

SSD-Net’s hybrid composition underscores the strengths of both network classes:

  • CNNs: Provide local spatial feature extraction and channel-attention refinement for detailed scene information.
  • Transformers: Offer global receptive field and degradation modeling capacity, especially through AST mechanisms.
  • Adaptive Sparse Attention: Dual-branch attention (softmax and ReLU) enables flexible prioritization of meaningful signals with reduced computational cost.

This synergy allows SSD-Net to evade the need for multi-scale sampling, establishing single-scale feature decomposition as a viable and efficient alternative.
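The channel-attention refinement in the CNN branch can be illustrated with a standard squeeze-and-excitation-style gate; this is a generic stand-in, and the paper's exact attention layer may differ:

```python
# Sketch of a convolutional layer with channel attention: global pooling
# produces per-channel importance weights that rescale the feature map.
import torch
import torch.nn as nn

class ChannelAttentionConv(nn.Module):
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                           # squeeze: global channel statistics
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(),      # excitation bottleneck
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid(),   # per-channel weights in (0, 1)
        )

    def forward(self, x):
        y = self.conv(x)
        return y * self.se(y)  # reweight channels by learned importance

x = torch.randn(1, 16, 32, 32)
out = ChannelAttentionConv(16)(x)
print(out.shape)  # torch.Size([1, 16, 32, 32])
```

The channel gate costs only a few hundred parameters per layer, consistent with the architecture's emphasis on parameter economy.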

7. Applications and Future Prospects

While designed for underwater enhancement, SSD-Net’s architecture is generalizable to other image restoration domains:

  • Dehazing of aerial/architectural images
  • Low-light enhancement in consumer photography
  • Medical image denoising
  • Preprocessing for segmentation or detection in challenging environments

Possible future directions include extending SSD-Net to video enhancement (ensuring temporal consistency), adopting self-supervised or unsupervised schemes for data-scarce scenarios, evolving Transformer variants and attention mechanisms to further generalize restoration capabilities, and optimizing SSD-Net for deployment on edge devices or real-time robotic platforms.

8. Contextualization and Relation to Previous Work

The core innovation of single-scale decomposition finds conceptual antecedents in works such as Pooling Pyramid Network (PPN), where a pyramid is constructed via max pooling over a shared embedding space, and all-scale predictors are unified for efficiency and calibration (Jin et al., 2018). In contrast, SSD-Net establishes a direct asymmetric decomposition and eschews explicit multi-scale fusion. The methodology stands in contrast to multi-scale feature fusion modules—e.g., the Fluff block (Shi et al., 2020)—which combine multi-level and multi-branch architectures for object detection, as well as scaling-translation-equivariant networks with decomposed convolutions that encode scale explicitly (Zhu et al., 2019). SSD-Net instead leverages the representational richness extractable from a single resolution when aided by advanced decomposition and cross-layer interaction techniques.

9. Summary

SSD-Net is a distinctive network architecture that realizes the full potential of single-scale features for image enhancement tasks. Its asymmetric decomposition mechanism, synergistic CNN–Transformer integration, and dynamic cross-branch communication allow it to rival and frequently surpass multi-scale methods in restoration accuracy, parameter economy, and computational efficiency. These advances suggest broader applicability across vision restoration domains and set the foundation for further development of efficient, single-scale neural networks.