MixConv: Mixed Depthwise Convolutional Kernels (1907.09595v3)

Published 22 Jul 2019 in cs.CV and cs.LG

Abstract: Depthwise convolution is becoming increasingly popular in modern efficient ConvNets, but its kernel size is often overlooked. In this paper, we systematically study the impact of different kernel sizes, and observe that combining the benefits of multiple kernel sizes can lead to better accuracy and efficiency. Based on this observation, we propose a new mixed depthwise convolution (MixConv), which naturally mixes up multiple kernel sizes in a single convolution. As a simple drop-in replacement of vanilla depthwise convolution, our MixConv improves the accuracy and efficiency for existing MobileNets on both ImageNet classification and COCO object detection. To demonstrate the effectiveness of MixConv, we integrate it into AutoML search space and develop a new family of models, named as MixNets, which outperform previous mobile models including MobileNetV2 [20], ShuffleNetV2 [16], MnasNet [26], ProxylessNAS [2], and FBNet [27]. In particular, our MixNet-L achieves a new state-of-the-art 78.9% ImageNet top-1 accuracy under typical mobile settings (<600M FLOPS). Code is at https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet/mixnet

Authors (2)
  1. Mingxing Tan (46 papers)
  2. Quoc V. Le (128 papers)
Citations (361)

Summary

  • The paper introduces MixConv, a drop-in replacement for standard depthwise convolution that mixes multiple kernel sizes within a single operation for improved feature extraction.
  • It demonstrates significant performance gains: MixConv improves MobileNet-based architectures on both ImageNet classification and COCO object detection, and the derived MixNet-L reaches 78.9% top-1 accuracy on ImageNet.
  • The integration with neural architecture search enables adaptable models optimized for mobile and edge devices with constrained computational resources.

Analysis of "MixConv: Mixed Depthwise Convolutional Kernels"

The paper introduces MixConv, a neural network component designed to improve the efficiency and accuracy of the depthwise convolutions commonly used in convolutional neural networks (CNNs) such as MobileNets. Traditional depthwise convolutions use a single kernel size, typically 3x3, a choice that is rarely examined even though it affects both accuracy and efficiency. The paper systematically analyzes the impact of different kernel sizes and proposes a method for combining several kernel sizes beneficially within a single convolutional module.

MixConv leverages the advantages of several kernel sizes by partitioning the channels into groups and applying a different depthwise kernel size to each group. This lets a single layer capture features at multiple scales while retaining the computational efficiency characteristic of depthwise convolution. The paper demonstrates MixConv as an enhancement of existing MobileNet architectures, reporting notable improvements on benchmark datasets such as ImageNet for classification and COCO for object detection.
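To make the channel-splitting idea concrete, here is a minimal PyTorch-style sketch of a MixConv layer. It is written for illustration only and is not the authors' implementation (the official TensorFlow code is linked in the abstract); the equal channel partition and the kernel-size tuple are assumptions chosen to mirror the description above.

```python
import torch
import torch.nn as nn


class MixConv(nn.Module):
    """Illustrative MixConv: split channels into groups, run a different
    depthwise kernel size on each group, then concatenate the results."""

    def __init__(self, channels, kernel_sizes=(3, 5, 7), stride=1):
        super().__init__()
        self.splits = self._split_channels(channels, len(kernel_sizes))
        self.convs = nn.ModuleList([
            # groups == channels makes each convolution depthwise
            nn.Conv2d(c, c, k, stride=stride, padding=k // 2,
                      groups=c, bias=False)
            for c, k in zip(self.splits, kernel_sizes)
        ])

    @staticmethod
    def _split_channels(channels, num_groups):
        # Equal partition of channels; the paper also studies an
        # exponential partition, omitted here for brevity.
        base = channels // num_groups
        splits = [base] * num_groups
        splits[0] += channels - base * num_groups  # absorb the remainder
        return splits

    def forward(self, x):
        chunks = torch.split(x, self.splits, dim=1)
        return torch.cat(
            [conv(chunk) for conv, chunk in zip(self.convs, chunks)], dim=1)
```

With a single kernel size, e.g. kernel_sizes=(3,), the layer reduces to a vanilla 3x3 depthwise convolution, which is why it can serve as a drop-in replacement.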

Key Contributions

  1. MixConv Architecture: The paper proposes MixConv as a direct replacement for standard depthwise convolutions. By mixing kernel sizes within a single convolutional operation, MixConv captures image features at different scales more effectively than a fixed-kernel-size approach (a drop-in usage sketch follows this list).
  2. Empirical Performance Improvement: Through empirical evaluations on the ImageNet dataset, MixConv shows an improvement in the top-1 accuracy for MobileNet-based architectures. Specifically, MixNet-L, an architecture employing MixConv, achieves top-1 accuracy of 78.9% on ImageNet, which is a significant advancement over previous models including MobileNetV2 and ShuffleNetV2.
  3. Neural Architecture Search (NAS) Integration: The paper outlines the integration of MixConv into a NAS framework to further optimize convolutional architectures tailored specifically to mobile environments. The resulting MixNets significantly advance the state-of-the-art in mobile CNN performance, achieving superior results with constrained computational resources.
  4. Transfer Learning Versatility: Beyond ImageNet, MixNets were evaluated on four additional datasets, demonstrating robust generalization capabilities. Such versatility is crucial for practical applications where models pre-trained on large datasets are adapted for specific tasks.
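To illustrate the drop-in claim in contribution 1, the sketch below swaps the depthwise stage of a generic MobileNetV2-style inverted-residual block for the MixConv module defined earlier. The block layout (1x1 expansion, depthwise stage, 1x1 projection) is a standard inverted bottleneck written for illustration, not the exact MixNet block from the paper.

```python
import torch.nn as nn


def inverted_residual(in_ch, out_ch, expand=6, kernel_sizes=(3, 5, 7)):
    """MobileNetV2-style block with MixConv as the depthwise stage."""
    mid = in_ch * expand
    return nn.Sequential(
        nn.Conv2d(in_ch, mid, 1, bias=False),      # 1x1 expansion
        nn.BatchNorm2d(mid),
        nn.ReLU6(inplace=True),
        MixConv(mid, kernel_sizes=kernel_sizes),   # mixed depthwise stage
        nn.BatchNorm2d(mid),
        nn.ReLU6(inplace=True),
        nn.Conv2d(mid, out_ch, 1, bias=False),     # 1x1 projection
        nn.BatchNorm2d(out_ch),
    )
```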

Implications and Future Work

The introduction of MixConv highlights a promising direction for enhancing the flexibility and efficacy of neural networks in resource-constrained environments. Because it is a drop-in replacement for standard depthwise convolution, it could be readily adopted in existing architectures, offering a pathway to more efficient deployments on mobile and edge devices.

Given the nature of MixConv, it may also inspire further explorations into hybrid designs that blend neural operations to mitigate the trade-offs commonly encountered between accuracy and computational demands. Future developments could include exploring adaptive methods to dynamically select kernel sizes based on input characteristics or employing advanced NAS methodologies to exploit MixConv's potential fully.

Overall, MixConv represents a significant step in the ongoing efforts to optimize neural networks for both performance and efficiency, ensuring they remain viable for an ever-expanding array of applications, particularly those involving mobile and real-time processing scenarios. The results and methodologies described could influence subsequent research directions focused on achieving high-performance, low-complexity neural network designs.
