- The paper introduces PSConv to efficiently embed multi-scale feature extraction into a single convolutional layer.
- It employs a cyclic allocation of dilation rates to fuse coarse and fine-grained features without increasing computational complexity.
- Experiments show consistent gains on ImageNet classification and MS COCO detection and segmentation, reducing top-1 error and improving average precision.
Summary of "PSConv: Squeezing Feature Pyramid into One Compact Poly-Scale Convolutional Layer"
The paper introduces Poly-Scale Convolution (PSConv), a novel approach to extracting multi-scale features within Convolutional Neural Networks (CNNs) without the computational burden typically associated with such enhancements. Conventional convolutional layers are scale-sensitive because their receptive field size is fixed. PSConv mitigates this limitation by spreading a spectrum of dilation rates across the kernels of a single convolutional layer, embedding scale-variation capability directly into the layer itself.
Core Contribution
PSConv is a convolutional operation that cyclically alternates dilation rates along both the input and output channel dimensions of a single layer, distinguishing it from previous approaches that vary receptive fields only layer-wise or filter-wise. Because the dilation pattern tiles the channel dimensions cyclically, multi-scale features are fused inside the layer itself, avoiding the complexity and overhead of modifying layers or the overall network topology.
Detailed Methodology
The methodology centers on a cyclic allocation of dilation rates across the kernels within each convolutional filter. Channels are divided into partitions, over which a fixed pattern of dilation rates is repeated, producing a fine-grained kernel lattice in which features from multiple scales are aggregated efficiently. Coarse and fine-grained features are therefore extracted simultaneously, improving the network's ability to handle scale variation in visual data.
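To illustrate this allocation scheme, the sketch below (not the authors' reference implementation) approximates PSConv by splitting the output channels into partitions and assigning each partition a 3x3 convolution with its own dilation rate drawn from a cyclic pattern. The class name PSConv2d and the dilation pattern (1, 2, 1, 4) are assumptions for illustration, and the full method additionally cycles the rates along the input-channel dimension.

```python
import torch
import torch.nn as nn

class PSConv2d(nn.Module):
    """Minimal sketch of a poly-scale convolution (illustrative, not the paper's code).

    Output channels are split into equal partitions; each partition is produced by a
    3x3 convolution with its own dilation rate taken from a cyclic pattern, so coarse
    and fine receptive fields coexist within one layer.
    """
    def __init__(self, in_channels, out_channels, stride=1,
                 dilations=(1, 2, 1, 4)):  # assumed cyclic dilation pattern
        super().__init__()
        assert out_channels % len(dilations) == 0
        part = out_channels // len(dilations)
        self.branches = nn.ModuleList([
            nn.Conv2d(in_channels, part, kernel_size=3, stride=stride,
                      padding=d, dilation=d, bias=False)
            for d in dilations
        ])

    def forward(self, x):
        # Each branch sees the full input but applies a different dilation rate;
        # concatenating along the channel dimension yields the multi-scale output.
        return torch.cat([branch(x) for branch in self.branches], dim=1)
```

Because the kernel size stays 3x3 in every branch and the padding matches each dilation rate, the parameter count and output resolution are the same as a plain 3x3 convolution with the same channel configuration.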
The authors differentiate PSConv from related designs such as MixConv by noting that their approach varies the dilation rate rather than the kernel size, so kernel dimensions stay constant and no extra parameters or computation are introduced. PSConv is applied across several network architectures, including ResNet, ResNeXt, and SE-ResNet, demonstrating its versatility.
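To indicate how such a layer could be dropped into existing backbones, the following sketch (again an assumption, building on the PSConv2d class above) replaces every 3x3 convolution in a torchvision ResNet-50 with the poly-scale version while preserving channel counts and strides.

```python
import torch.nn as nn
import torchvision

def convert_to_psconv(module):
    """Recursively swap 3x3 convolutions for the PSConv2d sketch defined above."""
    for name, child in module.named_children():
        if isinstance(child, nn.Conv2d) and child.kernel_size == (3, 3):
            setattr(module, name, PSConv2d(child.in_channels, child.out_channels,
                                           stride=child.stride[0]))
        else:
            convert_to_psconv(child)
    return module

# Hypothetical usage: a "PS-ResNet-50" built by converting the torchvision backbone.
ps_resnet50 = convert_to_psconv(torchvision.models.resnet50())
```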
Experimental Results
Extensive experiments on the ImageNet dataset demonstrate the efficacy of PSConv. Incorporating PSConv yields consistent reductions in top-1 and top-5 error relative to the vanilla architectures. For example, PS-ResNet-50 reduces top-1 error to 21.126%, matching the performance of deeper networks at a fraction of their computational cost.
The effectiveness of PSConv extends beyond image classification to dense prediction tasks such as object detection and instance segmentation, verified on the MS COCO 2017 dataset. PSConv-based backbones integrated into frameworks such as Faster R-CNN and Mask R-CNN yield notable improvements in average precision (AP), particularly across objects of varying sizes.
Implications and Future Work
The introduction of PSConv has practical implications for neural network design in tasks sensitive to scale variance, without requiring complex architectural changes. Its plug-and-play nature facilitates integration into existing models, offering a path to improved performance benchmarks across various domains within computer vision.
Looking forward, automated learning of dilation rates, whether through dynamic architectures or more sophisticated heuristics, could further refine PSConv's capability. Additional optimization of computational efficiency could also broaden PSConv's applicability, especially in real-time or resource-constrained environments.
In summary, PSConv represents a technically compelling approach to multi-scale feature representation in CNNs, one that promises to influence the development of more adaptive and scale-robust vision models.