- The paper introduces the Kernel Basis Attention module that adaptively aggregates spatial information using learnable kernel bases.
- It develops a Multi-axis Feature Fusion block to merge channel-wise, spatial-invariant, and pixel-adaptive features for enhanced image restoration.
- Evaluations demonstrate that KBNet outperforms recent transformer-based methods on image denoising, deraining, and deblurring while requiring significantly fewer MACs.
Overview of KBNet: Kernel Basis Network for Image Restoration
The paper "KBNet: Kernel Basis Network for Image Restoration" addresses key challenges in the field of image restoration, focusing on the aggregation of spatial information. It critiques previous CNN-based methods for their use of static convolutional kernels and highlights the heavy computational load of transformer-based architectures that attempt adaptive spatial aggregation. A novel solution, the Kernel Basis Attention (KBA) module is proposed to balance the trade-offs between adaptive spatial aggregation and computational efficiency.
Main Contributions
- Kernel Basis Attention (KBA) Module: The KBA module introduces a set of learnable kernel bases designed to capture representative image patterns for spatial information aggregation. Each kernel basis learns to model distinct local structures; the bases are then linearly combined using pixel-wise coefficients to derive the aggregation weights at each spatial location. This design combines the inductive biases of convolutional networks with the adaptability of transformers (a code sketch follows this list).
- Multi-axis Feature Fusion (MFF) Block: Built on the KBA module, the MFF block fuses features along channel-wise, spatial-invariant, and pixel-adaptive axes. Parallel feature-extraction branches, including channel attention and depthwise convolutions, are combined through point-wise multiplication to enrich feature encoding for image restoration (also sketched after this list).
- Achievements in Computational Efficiency and Performance: The resulting Kernel Basis Network (KBNet) demonstrates state-of-the-art performance across multiple benchmarks for image denoising, deraining, and deblurring. Importantly, KBNet achieves these results with lower computational costs compared to existing methods, notably outperforming recent transformer-based approaches.
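To make the KBA mechanism concrete, here is a minimal PyTorch sketch based solely on the description above; it is not the authors' implementation. The number of bases, the depthwise kernel layout, and the softmax normalization of the coefficients are all assumptions. Because convolution is linear in its kernel, mixing the per-basis responses with per-pixel coefficients is mathematically equivalent to convolving each pixel with its own mixed kernel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KernelBasisAttention(nn.Module):
    """Sketch of a KBA-style layer: shared learnable kernel bases are
    linearly combined with per-pixel coefficients to form spatially
    adaptive aggregation weights. Hyperparameters are assumptions."""

    def __init__(self, channels: int, num_bases: int = 8, kernel_size: int = 3):
        super().__init__()
        self.k = kernel_size
        self.num_bases = num_bases
        # Shared learnable kernel bases: one depthwise KxK kernel set per basis.
        self.bases = nn.Parameter(
            torch.randn(num_bases, channels, kernel_size, kernel_size) * 0.02
        )
        # Lightweight branch predicting per-pixel mixing coefficients.
        self.coeff = nn.Sequential(
            nn.Conv2d(channels, num_bases, kernel_size=1),
            nn.Softmax(dim=1),  # normalize over the basis axis (assumed)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        c = x.shape[1]
        coeff = self.coeff(x)  # (B, N, H, W) per-pixel coefficients
        # Convolve the input with each basis (depthwise), then mix the N
        # responses with the per-pixel coefficients. By linearity of
        # convolution, this equals applying the per-pixel mixed kernel.
        responses = torch.stack(
            [
                F.conv2d(x, self.bases[i].unsqueeze(1), padding=self.k // 2, groups=c)
                for i in range(self.num_bases)
            ],
            dim=1,
        )  # (B, N, C, H, W)
        return (responses * coeff.unsqueeze(2)).sum(dim=1)  # (B, C, H, W)
```

The per-basis loop is written for readability; an efficient implementation would fuse the bases into a single grouped convolution.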
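The MFF block's parallel-branch fusion can be sketched in the same spirit, reusing the KernelBasisAttention class above. The squeeze-and-excitation-style channel attention, the 3x3 depthwise branch, and the fusion order are illustrative simplifications of the paper's description, not its exact architecture.

```python
class MultiAxisFeatureFusion(nn.Module):
    """Sketch of an MFF-style block: channel-wise attention, a spatially
    invariant depthwise convolution, and a pixel-adaptive KBA branch run
    in parallel and are fused by point-wise multiplication."""

    def __init__(self, channels: int):
        super().__init__()
        # Channel-wise branch: squeeze-and-excitation-style attention.
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial-invariant branch: a plain depthwise convolution.
        self.dwconv = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        # Pixel-adaptive branch: the KBA sketch defined above.
        self.kba = KernelBasisAttention(channels)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Point-wise multiplication fuses the three parallel branches.
        fused = self.dwconv(x) * self.channel_att(x) * self.kba(x)
        return self.proj(fused)
```

As a quick shape check, `MultiAxisFeatureFusion(32)(torch.randn(1, 32, 64, 64))` returns a tensor of shape `(1, 32, 64, 64)`, so the block can be dropped into an encoder-decoder restoration backbone as a feature-mixing unit.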
Technical Insights and Numerical Results
The proposed KBA module sidesteps the computational inefficiency of predicting full kernels directly, instead leveraging shared kernel bases that adapt to spatial context. A lightweight convolution branch predicts the linear-combination coefficients applied to these bases, yielding a more efficient and effective spatial aggregation mechanism.
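A rough count makes the saving concrete: predicting a full depthwise K x K kernel at every pixel requires regressing C * K^2 values per location, whereas predicting mixing coefficients for N shared bases requires only N. The settings below are hypothetical, chosen purely for illustration:

```python
# Hypothetical settings for illustration; not values from the paper.
C, K, N = 64, 3, 8  # channels, kernel size, number of kernel bases

per_pixel_direct = C * K * K  # values per pixel for a full depthwise kernel
per_pixel_kba = N             # values per pixel for basis coefficients
print(per_pixel_direct, per_pixel_kba, per_pixel_direct / per_pixel_kba)
# 576 8 72.0
```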
Key numerical results indicate that KBNet consistently outperforms recent methods across standard benchmarks. In Gaussian image denoising, for instance, KBNet attains the best reported results while maintaining a reduced computational footprint. The savings are quantified by comparing multiply-accumulate operations (MACs): KBNet requires significantly fewer MACs while delivering superior restoration quality.
Implications and Future Directions
The implications of KBNet's architecture extend beyond efficiency gains. By balancing adaptive flexibility against computational cost, KBNet sets a new standard for practical deployment scenarios, such as real-time image processing on constrained hardware.
Future work could explore further optimization of the kernel basis generation and fusion processes, potentially integrating aspects of other machine learning paradigms, such as ensemble learning or network pruning strategies, to push the boundaries of efficiency and portability. Additionally, expanding the concept to handle diverse image types and incorporating unsupervised learning techniques might offer new pathways for improvement in self-adapting image restoration models.
In conclusion, KBNet presents a significant step forward in marrying the flexible, adaptive benefits of transformers with the computational efficiencies of CNNs. It offers a compelling framework for future research aimed at refining spatial information aggregation in the image processing domain.