- The paper introduces a re-parameterization method that merges convolutional layers with fully-connected layers to leverage both local and global features.
- During training, the method pairs the FC layer with convolution and batch normalization branches; after training, these are merged into a single FC layer for faster, more efficient inference.
- Empirical tests on ImageNet, MegaFace, and Cityscapes demonstrate improved accuracy and reduced computational overhead.
Overview of RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition
The paper introduces RepMLP, a building block that addresses the lack of a local prior in fully-connected (FC) layers for image recognition by re-parameterizing convolutions into them. The work offers a novel perspective on neural network architecture design, leveraging the strengths of both convolutional and fully-connected layers for improved image recognition.
RepMLP capitalizes on the representational capacity of FC layers while incorporating the local prior inherent in convolutional neural networks (ConvNets). ConvNets excel at capturing local patterns and have been the preferred choice for image recognition thanks to inductive biases such as locality and weight sharing. FC layers, in contrast, offer greater global capacity and the ability to encode positional information, but lack this local inductive bias.
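To make the capacity trade-off concrete, a quick back-of-the-envelope comparison for a layer mapping a C x H x W feature map to an output of the same size (the sizes below are illustrative, not taken from the paper):

```python
# Parameter counts for one layer on a C x H x W feature map.
# Hypothetical sizes chosen for illustration only.
C, H, W = 8, 14, 14   # channels and spatial resolution
K = 3                 # 3x3 convolution kernel

conv_params = C * C * K * K     # kernel shared across all positions
fc_params = (C * H * W) ** 2    # dense map over the flattened tensor

print(conv_params)  # 576
print(fc_params)    # 2458624
```

The FC layer sees every position (hence its global capacity and positional awareness) but pays for it with a parameter count that grows quadratically in the flattened feature size, which is why RepMLP applies FC layers to partitions rather than whole feature maps.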
Methodology
RepMLP utilizes a novel re-parameterization technique that injects local priors into fully-connected layers. Convolutional and batch normalization (BN) branches are constructed alongside the FC layer during the training phase and are subsequently merged into a single FC layer for more efficient inference. This allows the model to harness the efficiency and global modeling capability of FC layers while retaining the local sensitivity offered by convolutions.
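The merge rests on two linear-algebra facts: BN at inference is an affine map that can be absorbed into the preceding conv, and a convolution is a linear operator, so its equivalent FC matrix can be recovered by pushing an identity basis through it. A minimal single-channel NumPy sketch (the paper works with multi-channel PyTorch tensors; shapes and names here are illustrative):

```python
import numpy as np

def conv2d_same(x, kernel):
    """Naive single-channel 'same'-padded convolution, stride 1."""
    k = kernel.shape[0]
    xp = np.pad(x, k // 2)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * kernel)
    return out

def fold_bn(kernel, bias, gamma, beta, mean, var, eps=1e-5):
    """Absorb inference-time BN statistics into the conv kernel and bias."""
    scale = gamma / np.sqrt(var + eps)
    return kernel * scale, (bias - mean) * scale + beta

def conv_to_fc(kernel, h, w):
    """Build the FC matrix equivalent to the conv by feeding the
    identity basis through it (convolution is linear in its input)."""
    M = np.zeros((h * w, h * w))
    for idx in range(h * w):
        basis = np.zeros(h * w)
        basis[idx] = 1.0
        M[:, idx] = conv2d_same(basis.reshape(h, w), kernel).ravel()
    return M

h = w = 5
rng = np.random.default_rng(0)
x = rng.standard_normal((h, w))
kernel = rng.standard_normal((3, 3))
gamma, beta, mean, var = 1.5, 0.2, 0.1, 0.9  # made-up BN statistics

# Reference path: conv followed by BN, as used during training.
y_ref = gamma * (conv2d_same(x, kernel) - mean) / np.sqrt(var + 1e-5) + beta

# Merged path: fold BN into the kernel, then re-express the conv as one FC.
k_fold, b_fold = fold_bn(kernel, 0.0, gamma, beta, mean, var)
y_fc = (conv_to_fc(k_fold, h, w) @ x.ravel() + b_fold).reshape(h, w)

assert np.allclose(y_ref, y_fc)  # the single FC layer matches conv + BN
```

Because both steps are exact linear-algebra identities, the merged FC layer produces bit-for-bit (up to floating-point rounding) the same outputs as the training-time conv+BN branch, so the conversion costs no accuracy.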
The RepMLP architecture consists of several components:
- Global Perceptron: Provides inter-partition dependencies, enhancing correlations among partitioned feature maps.
- Partition Perceptron: Utilizes group-wise FC layers to model dependencies within partitions.
- Local Perceptron: Comprising multiple convolutional layers, it reinforces local pattern recognition within the partitions.
By combining these components, RepMLP effectively unites the locality-aware processing of traditional ConvNets with the broader, global perspective of FC layers.
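A deliberately simplified, single-channel sketch of how the three components could compose; the partition sizes, pooling, and exact wiring below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
P = 4                                    # hypothetical partition size
x = rng.standard_normal((2, 2, P, P))    # feature map split into 2x2 partitions

# Global Perceptron: pool each partition to one value, mix partitions with
# a small FC, and add the result back so partitions can interact.
pooled = x.mean(axis=(2, 3)).ravel()                 # one value per partition
W_global = rng.standard_normal((4, 4)) * 0.1
x = x + (W_global @ pooled).reshape(2, 2, 1, 1)

# Partition Perceptron: one shared FC applied within every partition.
W_part = rng.standard_normal((P * P, P * P)) * 0.1
fc_out = np.einsum('oi,abi->abo', W_part,
                   x.reshape(2, 2, P * P)).reshape(x.shape)

# Local Perceptron: a small 3x3 'same'-padded conv on each partition,
# added to the FC output to restore the local prior.
kernel = rng.standard_normal((3, 3)) * 0.1
conv_out = np.zeros_like(x)
for a in range(2):
    for b in range(2):
        xp = np.pad(x[a, b], 1)
        for i in range(P):
            for j in range(P):
                conv_out[a, b, i, j] = np.sum(xp[i:i + 3, j:j + 3] * kernel)

y = fc_out + conv_out   # at inference, the conv branch folds into W_part
print(y.shape)          # (2, 2, 4, 4)
```

The key structural point is the last line: because the conv branch is linear, its contribution can be merged into `W_part` after training, leaving only FC operations at inference.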
Results and Implications
The empirical results demonstrate RepMLP's prowess across various computer vision tasks:
- ImageNet Classification: RepMLP, when implemented within ResNet architectures, showed significant accuracy improvements with reduced computational overheads. For instance, incorporating RepMLP in ResNet-50 resulted in a 1.36% higher top-1 accuracy on ImageNet with lower FLOPs, compared to traditional ConvNets.
- Face Recognition: Leveraging positional priors, the RepMLP-optimized architecture achieved over 95% accuracy on MegaFace, outperforming both the standard ResNet-based and MobileFaceNet baselines not only in accuracy but also in runtime efficiency.
- Semantic Segmentation: RepMLP adapts even to tasks with innate translation invariance, such as semantic segmentation on Cityscapes, where it improved mean intersection-over-union (mIoU) by over 2% with faster inference.
Theoretical and Practical Implications
The methodological approach underpinning RepMLP opens avenues for new architectures that can judiciously balance local specificity with global coverage—an approach that holds promise for directly addressing the computational challenges faced by large-scale vision models. The inherent flexibility of RepMLP suggests potential applications in diverse domains, including remote sensing, video analysis, and complex multi-modal learning systems. It also provides insights into architecture design that can steer further explorations in enhancing neural network generalization and efficiency.
Future Developments
Future research may focus on exploring RepMLP in conjunction with emerging all-MLP models and on further optimizations that yield even greater efficiency gains. Additionally, extending the re-parameterization principles to other domains, such as audio or language modeling, could pave the way for broad cross-modal applications.
In conclusion, the integration of RepMLP stands as a testament to the evolving landscape of neural network design, harnessing the symbiotic strengths of convolution and fully-connected layers. This work not only advances the domain of image recognition but also fosters a nuanced understanding of how spatial dependencies and global positional information can be deftly intertwined to create more robust and proficient neural networks.