Selective Kernel Networks (1903.06586v2)

Published 15 Mar 2019 in cs.CV

Abstract: In standard Convolutional Neural Networks (CNNs), the receptive fields of artificial neurons in each layer are designed to share the same size. It is well-known in the neuroscience community that the receptive field size of visual cortical neurons is modulated by the stimulus, which has rarely been considered in constructing CNNs. We propose a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information. A building block called Selective Kernel (SK) unit is designed, in which multiple branches with different kernel sizes are fused using softmax attention that is guided by the information in these branches. Different attentions on these branches yield different sizes of the effective receptive fields of neurons in the fusion layer. Multiple SK units are stacked into a deep network termed Selective Kernel Networks (SKNets). On the ImageNet and CIFAR benchmarks, we empirically show that SKNet outperforms the existing state-of-the-art architectures with lower model complexity. Detailed analyses show that the neurons in SKNet can capture target objects with different scales, which verifies the capability of neurons to adaptively adjust their receptive field sizes according to the input. The code and models are available at https://github.com/implus/SKNet.

Authors (4)
  1. Xiang Li (1003 papers)
  2. Wenhai Wang (123 papers)
  3. Xiaolin Hu (97 papers)
  4. Jian Yang (505 papers)
Citations (1,840)

Summary

The paper "Selective Kernel Networks" by Xiang Li et al. introduces a novel approach to improving the architecture of Convolutional Neural Networks (CNNs) by drawing inspiration from the neuroscience findings on the receptive fields (RFs) of neurons in the visual cortex. The central innovation of this work is the incorporation of a dynamic selection mechanism which allows each neuron to adaptively adjust its receptive field size based on multi-scale input information.

Methodology

The authors propose a new building block called the Selective Kernel (SK) unit, which combines multiple branches with distinct kernel sizes using softmax attention guided by the information in those branches. SK convolution consists of three computational steps: Split, Fuse, and Select. Each SK unit generates multiple paths with different kernel sizes (Split), aggregates them into a global representation (Fuse), and employs a soft attention mechanism to adaptively combine the paths (Select).
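
The official repository contains the reference implementation; purely as an illustration, here is a minimal PyTorch sketch of a two-branch SK unit. The names (SKUnit, fc_z, fc_attn) and defaults (reduction=16, min_dim=32) are assumptions of this sketch, not the authors' code; the larger 5x5 kernel is realized as a dilated 3x3, following the paper's efficiency suggestion.

```python
import torch
import torch.nn as nn

class SKUnit(nn.Module):
    """Minimal two-branch SK unit: Split -> Fuse -> Select (illustrative only)."""

    def __init__(self, channels, branches=2, groups=1, reduction=16, min_dim=32):
        super().__init__()
        d = max(channels // reduction, min_dim)  # width of the Fuse bottleneck
        self.branches, self.channels = branches, channels
        # Split: parallel 3x3 convs; dilation grows the effective kernel
        # (branch i=1 acts as a 5x5, per the paper's dilated-conv suggestion).
        self.convs = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3, padding=1 + i,
                          dilation=1 + i, groups=groups, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for i in range(branches)
        ])
        # Fuse: squeeze the pooled descriptor into a compact feature z.
        self.fc_z = nn.Sequential(
            nn.Linear(channels, d, bias=False),
            nn.BatchNorm1d(d),
            nn.ReLU(inplace=True),
        )
        # Select: per-branch logits, softmaxed across branches for each channel.
        self.fc_attn = nn.Linear(d, channels * branches)

    def forward(self, x):
        feats = torch.stack([conv(x) for conv in self.convs], dim=1)  # (B, M, C, H, W)
        s = feats.sum(dim=1).mean(dim=(2, 3))      # Fuse: add branches, global-avg-pool
        z = self.fc_z(s)                           # compact descriptor (B, d)
        attn = self.fc_attn(z).view(-1, self.branches, self.channels)
        attn = attn.softmax(dim=1)                 # attention over branches, per channel
        return (feats * attn[..., None, None]).sum(dim=1)
```

For example, `SKUnit(64).eval()(torch.randn(2, 64, 56, 56))` returns a tensor of the same shape, with each channel a learned blend of the 3x3 and dilated 5x5 responses. The paper's ImageNet models additionally use grouped convolutions (cardinality 32), which the `groups` argument stands in for here.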

The SK units are then stacked to form deep networks referred to as Selective Kernel Networks (SKNets). These networks are evaluated on popular benchmarks including ImageNet and CIFAR-10/100, demonstrating superior performance over existing state-of-the-art architectures with the added benefit of lower model complexity.
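
In the deep models, the SK convolution takes the place of the 3x3 convolution in a ResNeXt-style bottleneck. A hypothetical block in that spirit, reusing the SKUnit sketch above (SKBottleneck and its expansion factor are my naming and assumptions, not the released code):

```python
import torch.nn as nn

class SKBottleneck(nn.Module):
    """Hypothetical bottleneck: 1x1 reduce -> SK conv -> 1x1 expand + residual."""

    expansion = 2  # ResNeXt-style width expansion; an assumption of this sketch

    def __init__(self, in_ch, mid_ch):
        super().__init__()
        out_ch = mid_ch * self.expansion
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
            SKUnit(mid_ch),                  # replaces the usual 3x3 convolution
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # Project the shortcut only when the channel count changes.
        self.shortcut = nn.Identity() if in_ch == out_ch else nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))
```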

Empirical Results

The empirical evaluation shows that SKNets consistently outperform comparable models. For instance, SKNet-50 achieves a top-1 error rate of 20.79% on ImageNet, compared with 22.23% for ResNeXt-50. SKNet-101 further lowers the top-1 error to 20.19% while remaining more efficient than counterparts such as SENet-101 and DPN-92, underscoring the benefit of adaptive RF sizes. SKNets also generalize across model complexities, delivering state-of-the-art results among lightweight architectures such as ShuffleNets.

Detailed Analyses

The analyses of attention weights offer substantial insight into how the model adapts. By progressively enlarging the target object in an input image, the paper shows that neurons in SKNets shift attention toward larger kernels, so their effective RF sizes track the size of the target object. This behavior indicates that the neurons collect multi-scale spatial information on demand. The authors observe that kernel selection matters most in the lower and middle layers of the network; high-level layers exhibit relatively fixed attention patterns regardless of object size, suggesting that the lower layers drive adaptive RF adjustment while the higher layers encode more abstract features.
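
As a rough illustration of this style of analysis (not the authors' tooling), one can read the Select-step softmax weights out of the SKUnit sketch above and compare them for an input and an enlarged copy; `branch_attention` below is a hypothetical helper:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def branch_attention(sk_unit, x):
    """Recompute the Select-step softmax weights, shape (B, branches, C)."""
    feats = torch.stack([conv(x) for conv in sk_unit.convs], dim=1)
    s = feats.sum(dim=1).mean(dim=(2, 3))
    z = sk_unit.fc_z(s)
    return sk_unit.fc_attn(z).view(-1, sk_unit.branches, sk_unit.channels).softmax(dim=1)

sk = SKUnit(64).eval()                   # eval mode: BatchNorm uses running stats
feat = torch.randn(1, 64, 32, 32)        # random stand-in for real features
enlarged = F.interpolate(feat, scale_factor=1.5)  # mimics a 1.5x-enlarged object
for name, t in [("1.0x", feat), ("1.5x", enlarged)]:
    w = branch_attention(sk, t)
    # Mean weight on the larger (dilated 5x5) branch; in a trained low/middle
    # layer this tends to rise with object scale, per the paper's analysis.
    print(name, w[:, 1].mean().item())
```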

Theoretical and Practical Implications

The theoretical implications of this research lie in its novel architectural design inspired by biological neural networks. The paper emphasizes the importance of adaptive receptive fields, a property of biological neurons that remains underexploited in artificial neural networks. Practically, the proposed SKNet architecture offers a significant performance improvement across numerous applications, from large-scale image classification to resource-constrained environments requiring compact models.

Future Directions

Future research can build upon this work by exploring several areas. One potential direction could involve integrating the SK mechanism into other types of neural networks beyond CNNs, such as recurrent neural networks (RNNs) or transformers, to investigate whether adaptive receptive fields can similarly enhance those architectures. Another avenue could explore more sophisticated methods for the Select operator, potentially involving reinforcement learning or even neuro-inspired algorithms to determine the optimal receptive field dynamically. Furthermore, employing SKNets in tasks beyond image recognition, such as object detection, segmentation, or even NLP tasks, could expand the applicability of this adaptive mechanism.

By incorporating adaptive receptive fields, "Selective Kernel Networks" addresses a fundamental limitation of existing CNN architectures, yielding models that are both efficient and powerful. The work stands as a key contribution, laying the groundwork for more adaptive, biologically inspired neural network architectures.