U-shaped Vision Mamba for Single Image Dehazing (2402.04139v4)

Published 6 Feb 2024 in cs.CV

Abstract: Currently, Transformer is the most popular architecture for image dehazing, but due to its large computational complexity, its ability to handle long-range dependency is limited on resource-constrained devices. To tackle this challenge, we introduce the U-shaped Vision Mamba (UVM-Net), an efficient single-image dehazing network. Inspired by the State Space Sequence Models (SSMs), a new deep sequence model known for its power to handle long sequences, we design a Bi-SSM block that integrates the local feature extraction ability of the convolutional layer with the ability of the SSM to capture long-range dependencies. Extensive experimental results demonstrate the effectiveness of our method. Our method provides a more efficient approach to long-range dependency modeling for image dehazing as well as other image restoration tasks. The code is available at https://github.com/zzr-idam/UVM-Net. Our method takes only 0.009 seconds to infer a 325 × 325 resolution image (100 FPS) without I/O handling time.

Summary

The paper introduces UVM-Net, a novel U-shaped architecture that combines CNNs with state space models to efficiently dehaze images.
The paper reports a PSNR of 40.17 dB and an SSIM of 0.996 on the SOTS-indoor dataset.
The paper highlights the potential for extending its hybrid model to other vision tasks like de-raining and low-light enhancement.

U-shaped Vision Mamba for Single Image Dehazing: An Expert Review

The paper "U-shaped Vision Mamba for Single Image Dehazing" by Zhuoran Zheng and Chen Wu presents a novel approach to image dehazing, addressing the challenges posed by the computational complexity of existing Transformer models. The authors introduce UVM-Net (U-shaped Vision Mamba Network) as an efficient alternative designed for resource-constrained environments, achieving superior performance by integrating Convolutional Neural Networks (CNNs) with State Space Models (SSMs).

Methodological Advancements

The UVM-Net architecture uses a U-shaped network structure analogous to the well-established U-Net framework. Its central component is the Bi-SSM block, which combines the efficient local feature extraction of CNNs with the long-range dependency modeling of SSMs. This combination avoids the cost of self-attention in Transformer models, which grows quadratically with the number of tokens and becomes prohibitive for high-resolution hazy images.
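To make the quadratic-versus-linear gap concrete, the following minimal sketch counts the token pairs scored by full self-attention against the sequential steps of a linear-time scan. It assumes every pixel is treated as one token. The function names are hypothetical, and the counts ignore channel width, patching, and constant factors, so they illustrate scaling only and are not measurements of UVM-Net.

```python
def attention_pairs(height: int, width: int) -> int:
    """Token pairs scored by full self-attention.

    Assumption: one token per pixel, so N = height * width and the
    attention matrix has N * N entries.
    """
    n = height * width
    return n * n


def scan_steps(height: int, width: int) -> int:
    """Sequential steps taken by a linear-time scan over the same N tokens."""
    return height * width


if __name__ == "__main__":
    for side in (64, 128, 325):
        pairs = attention_pairs(side, side)
        steps = scan_steps(side, side)
        print(f"{side}x{side}: attention pairs={pairs:,}  "
              f"scan steps={steps:,}  ratio={pairs // steps:,}")
```

Doubling the side length multiplies the scan steps by 4 but the attention pairs by 16, which is why the gap widens quickly at high resolution.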

The innovation lies specifically in the Bi-SSM module, which runs the SSM along the channel dimension of the feature map to exploit its long-range modeling strengths. This differs from formulations such as U-Mamba and Mamba-UNet, which apply the SSM to sequences formed from flattened spatial positions. UVM-Net arranges these blocks in an encoder-decoder layout that captures both local and global features at lower computational cost.
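The difference between scanning over channels and scanning over spatial positions can be sketched with a toy recurrence. The code below is an illustration under stated assumptions, not the authors' Bi-SSM block: `ssm_scan` is a hypothetical scalar linear state space recurrence with fixed coefficients (Mamba's selective scan uses learned, input-dependent parameters), and the feature map is a plain nested list indexed as `feat[channel][row][col]`.

```python
def ssm_scan(sequence, a=0.5, b=1.0, c=1.0):
    """Toy scalar state space recurrence (fixed coefficients, an assumption):
    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t.
    """
    h = 0.0
    outputs = []
    for x in sequence:
        h = a * h + b * x
        outputs.append(c * h)
    return outputs


def scan_over_channels(feat):
    """For every pixel, treat its channel vector as the sequence to scan."""
    channels, height, width = len(feat), len(feat[0]), len(feat[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for i in range(height):
        for j in range(width):
            ys = ssm_scan([feat[k][i][j] for k in range(channels)])
            for k in range(channels):
                out[k][i][j] = ys[k]
    return out


def scan_over_pixels(feat):
    """For every channel, flatten its H x W plane row-major and scan it."""
    height, width = len(feat[0]), len(feat[0][0])
    out = []
    for plane in feat:
        flat = [value for row in plane for value in row]
        ys = ssm_scan(flat)
        out.append([ys[r * width:(r + 1) * width] for r in range(height)])
    return out


if __name__ == "__main__":
    # 3 channels, 1 row, 2 columns, all ones.
    feat = [[[1.0, 1.0]], [[1.0, 1.0]], [[1.0, 1.0]]]
    print("channel scan:", scan_over_channels(feat))
    print("pixel scan:  ", scan_over_pixels(feat))
```

In the channel scan the sequence length equals the number of channels, so it does not grow with image resolution; in the pixel scan the sequence length is H × W.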

Experimental Results

The authors conducted extensive experiments on prominent image dehazing datasets such as RESIDE, demonstrating that UVM-Net outperforms conventional methods in terms of PSNR and SSIM. The quantitative comparison table in the paper provides a detailed comparison across different datasets. Notably, UVM-Net achieves a PSNR of 40.17 dB and an SSIM of 0.996 on the SOTS-indoor dataset, indicating that its outputs are very close to the haze-free reference images in both pixel values and structure. These figures underscore the efficacy of UVM-Net in removing haze that degrades image quality.
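For readers unfamiliar with the metric, PSNR is defined from the mean squared error as 10 · log10(MAX² / MSE). The sketch below computes it for flat lists of pixel intensities and inverts the formula to show the root-mean-square error a given PSNR implies. It assumes intensities scaled to [0, 1]; the function names are hypothetical, and the paper's exact evaluation protocol (color space, value range) is not reproduced here.

```python
import math


def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - p) ** 2 for r, p in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def rmse_from_psnr(psnr_db, max_value=1.0):
    """Invert PSNR = 20 * log10(max_value / RMSE) to recover the RMSE."""
    return max_value / (10.0 ** (psnr_db / 20.0))


if __name__ == "__main__":
    clean = [0.0, 0.0, 0.0, 0.0]
    noisy = [0.1, 0.1, 0.1, 0.1]
    print(f"PSNR for a uniform 0.1 error: {psnr(clean, noisy):.2f} dB")
    print(f"RMSE implied by 40.17 dB: {rmse_from_psnr(40.17):.4f}")
```

Under the [0, 1] assumption, 40.17 dB corresponds to a root-mean-square error of roughly 0.0098, that is, about 1% of the intensity range.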

Broader Implications and Future Prospects

The development of UVM-Net has significant implications for practical applications, particularly in fields requiring high-quality imaging under challenging conditions, such as autonomous driving, surveillance, and remote sensing. Furthermore, the architecture sets a precedent for exploring hybrid models that leverage both CNNs and advanced state space approaches for image restoration tasks beyond dehazing, including de-raining and low-light enhancement.

Theoretically, this work suggests a promising direction in deep learning research, advocating for architectures that eschew reliance on computationally expensive mechanisms like full-scale attention. The incorporation of Bi-SSM not only enhances computational viability but also paves the way for advancements in handling long-range dependencies in a variety of sequential data tasks.

Speculation on Future Developments

Future exploration could involve the adaptation of UVM-Net for three-dimensional data, expanding its applications to volumetric medical imaging and video enhancement systems. Additionally, further refinement of SSMs within UVM-Net could lead to even more efficient models capable of real-time performance on modest hardware systems, crucial for decentralizing high-performance vision models to edge devices.

Overall, UVM-Net represents a compelling leap forward in image restoration, balancing performance and efficiency, with its structural innovations providing a platform for continued research and application across diverse domains.