Multi-level Wavelet Convolutional Neural Networks (1907.03128v1)

Published 6 Jul 2019 in cs.CV and eess.IV

Abstract: In computer vision, convolutional neural networks (CNNs) often adopt pooling to enlarge the receptive field, which has the advantage of low computational complexity. However, pooling can cause information loss and is thus detrimental to further operations such as feature extraction and analysis. Recently, dilated filtering has been proposed to trade off between receptive field size and efficiency, but the accompanying gridding effect causes sparse sampling of input images with checkerboard patterns. To address this problem, in this paper we propose a novel multi-level wavelet CNN (MWCNN) model that achieves a better trade-off between receptive field size and computational efficiency. The core idea is to embed the wavelet transform into the CNN architecture to reduce the resolution of feature maps while at the same time increasing the receptive field. Specifically, MWCNN for image restoration is based on the U-Net architecture, and the inverse wavelet transform (IWT) is deployed to reconstruct high-resolution (HR) feature maps. The proposed MWCNN can also be viewed as an improvement over dilated filtering and a generalization of average pooling, and can be applied not only to image restoration tasks but also to any CNN requiring a pooling operation. Experimental results demonstrate the effectiveness of the proposed MWCNN for tasks such as image denoising, single image super-resolution, JPEG image artifact removal, and object classification.

Citations (182)

Summary

  • The paper introduces an innovative CNN architecture integrating wavelet transforms to enhance image restoration and maintain spatial details.
  • It replaces traditional pooling with discrete wavelet transforms, thereby expanding the receptive field without compromising efficiency.
  • Experimental results show improved PSNR and SSIM across denoising, super-resolution, and artifact removal tasks over existing methods.

Multi-level Wavelet Convolutional Neural Networks

This paper introduces an innovative approach to enhancing convolutional neural networks (CNNs) by integrating multi-level wavelet transforms, resulting in an architecture named Multi-level Wavelet Convolutional Neural Networks (MWCNNs). The core motivation behind this development is to address the limitations inherent in traditional CNNs, particularly the trade-offs between receptive field enlargement and computational efficiency, as well as the information loss incurred by pooling operations.

Overview of MWCNN

The architecture of MWCNN embeds wavelet transforms within a CNN framework, leveraging the properties of wavelet decomposition to retain spatial information while enlarging the receptive field. This approach is inspired by the multi-level wavelet packet transform (WPT), which decomposes and reconstructs signals without loss of information thanks to its biorthogonal property. The proposed model can be viewed as a WPT in which CNN blocks are inserted between decomposition levels, enhancing feature representation by operating on both the low- and high-frequency subbands.
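To make the lossless-decomposition property concrete, here is a minimal NumPy sketch (illustrative only, not the authors' code) of the 2D Haar DWT used in MWCNN: one level splits an image into four half-resolution subbands, and the inverse transform reconstructs the input exactly.

```python
import numpy as np

def haar_dwt2(x):
    """One level of the orthonormal 2D Haar DWT: (H, W) -> four (H/2, W/2) subbands."""
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2   # low-frequency (average) subband
    lh = (-a - b + c + d) / 2  # horizontal detail
    hl = (-a + b - c + d) / 2  # vertical detail
    hh = (a - b - c + d) / 2   # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse Haar DWT: reconstructs the original (H, W) array exactly."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll - lh - hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll + lh + hl + hh) / 2
    return x

img = np.random.rand(8, 8)
subbands = haar_dwt2(img)
assert np.allclose(haar_idwt2(*subbands), img)  # lossless round trip
```

Applying `haar_dwt2` recursively to each subband yields the multi-level WPT decomposition the paper builds on.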

MWCNN is particularly designed for tasks such as image restoration, including super-resolution, denoising, and artifact removal. It is built upon the U-Net architecture, where the downsampling operation traditionally performed by pooling is substituted with discrete wavelet transforms (DWT), thus preserving image details throughout the process. This method showcases significant improvements in performance across several image restoration tasks, as evidenced by comprehensive experimental results.
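The pooling substitution described above can be sketched at the shape level: a Haar DWT halves spatial resolution while stacking the four subbands along the channel axis, and the IWT undoes it exactly. This is a minimal NumPy illustration under that assumption; the function names are ours, not from the paper's code.

```python
import numpy as np

def dwt_downsample(feat):
    """Haar DWT as a pooling substitute: (C, H, W) -> (4C, H/2, W/2).

    The four subbands of every channel are stacked along the channel
    axis, so spatial resolution halves without discarding information.
    """
    a = feat[:, 0::2, 0::2]
    b = feat[:, 0::2, 1::2]
    c = feat[:, 1::2, 0::2]
    d = feat[:, 1::2, 1::2]
    ll = (a + b + c + d) / 2
    lh = (-a - b + c + d) / 2
    hl = (-a + b - c + d) / 2
    hh = (a - b - c + d) / 2
    return np.concatenate([ll, lh, hl, hh], axis=0)

def iwt_upsample(feat):
    """Exact inverse: (4C, H/2, W/2) -> (C, H, W)."""
    ll, lh, hl, hh = np.split(feat, 4, axis=0)
    c_, h, w = ll.shape
    out = np.empty((c_, 2 * h, 2 * w), dtype=feat.dtype)
    out[:, 0::2, 0::2] = (ll - lh - hl + hh) / 2
    out[:, 0::2, 1::2] = (ll - lh + hl - hh) / 2
    out[:, 1::2, 0::2] = (ll + lh - hl - hh) / 2
    out[:, 1::2, 1::2] = (ll + lh + hl + hh) / 2
    return out

feat = np.random.rand(16, 32, 32)   # e.g. a 16-channel feature map
down = dwt_downsample(feat)         # shape (64, 16, 16)
assert down.shape == (64, 16, 16)
assert np.allclose(iwt_upsample(down), feat)
```

In MWCNN's U-Net, convolutional blocks operate on the stacked subbands between each DWT/IWT pair, and encoder features are combined with decoder features before each IWT.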

Numerical Results and Contributions

The paper presents substantial numerical evidence indicating that the proposed MWCNN outperforms existing CNN architectures. For image denoising across different noise levels, MWCNN consistently achieves higher PSNR and SSIM values than established methods such as BM3D and DnCNN. Similarly, for single image super-resolution (SISR), MWCNN surpasses methods like VDSR and DRRN, with the gains most noticeable on challenging datasets such as Urban100. MWCNN also excels at removing JPEG compression artifacts, again outperforming architectures including ARCNN and DnCNN.

Theoretical and Practical Implications

There are several theoretical implications of the MWCNN approach. By integrating wavelet transforms into the CNN design, the model captures multi-frequency information that significantly aids in texture detail preservation, a critical aspect for high-quality image restoration. The invertibility of wavelet transforms ensures that no information is lost during the downsampling and upsampling processes. Additionally, the model reduces computational burden while still expanding the receptive field, which is especially beneficial in large-scale applications.

Practically, MWCNN can be employed across a wide array of CNN-based applications beyond image restoration, such as object classification. The modification to use DWT for feature downscaling can be a plug-and-play improvement for any CNN employing pooling layers. As such, this adaptation can contribute to more robust feature extraction without necessitating substantial alterations to existing network architectures.
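The claim that DWT generalizes average pooling is easy to verify: with the Haar wavelet, the low-frequency (LL) subband is exactly 2x2 average pooling up to a constant factor, while the three detail subbands retain precisely the information pooling discards. A minimal NumPy check (illustrative, not the authors' code):

```python
import numpy as np

def avg_pool2x2(x):
    """Plain 2x2 average pooling on an (H, W) array."""
    return (x[0::2, 0::2] + x[0::2, 1::2] +
            x[1::2, 0::2] + x[1::2, 1::2]) / 4

def haar_ll(x):
    """Low-frequency (LL) subband of the orthonormal 2D Haar DWT."""
    return (x[0::2, 0::2] + x[0::2, 1::2] +
            x[1::2, 0::2] + x[1::2, 1::2]) / 2

x = np.random.rand(8, 8)
# The LL subband is average pooling scaled by 2; the LH/HL/HH subbands
# carry the detail information that average pooling throws away.
assert np.allclose(haar_ll(x), 2 * avg_pool2x2(x))
```

Swapping a pooling layer for a DWT layer therefore preserves the pooled signal while adding the complementary high-frequency channels, which is what makes the substitution plug-and-play.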

Future Directions

The paper hints at future explorations that could extend the application of MWCNN further into other high-level vision tasks, such as object detection and image segmentation. These tasks typically involve dense prediction, where preserving fine-scale features is paramount. Thus, utilizing MWCNN's capacity for detailed feature retention may prove advantageous. Additionally, further architectural innovations may refine MWCNN's capability, promoting its applicability to blind restoration tasks and other domains requiring advanced feature extraction techniques.

In summary, the integration of multi-level wavelet transformations into CNN architecture, as proposed in MWCNN, presents a significant step forward in balancing performance and computational efficiency while addressing information retention challenges in vision tasks. The promising results warrant continued exploration and adaptation in various AI-related applications.