- The paper introduces a reversible transformation design that links columnar subnetworks to progressively disentangle features while preserving complete information.
- It demonstrates robust performance on standard benchmarks, with models reaching up to 90.0% top-1 accuracy on ImageNet-1K and leading results on COCO detection and ADE20K segmentation.
- The architecture is flexible, enabling integration with transformers and other models to enhance performance in computer vision and NLP tasks while reducing memory usage.
Overview of Reversible Column Networks
The paper introduces a novel network paradigm termed Reversible Column Networks (RevCol), an architectural design centered on reversible and disentangled feature learning. A RevCol model consists of multiple subnetworks, referred to as columns, linked by reversible connections. This structure gradually disentangles features as information flows through the columns while preserving all of it, in contrast to conventional networks, which progressively compress or discard information.
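The reversible coupling underlying this design can be sketched in a few lines. The sketch below is a minimal illustration, not the paper's implementation: the block `F`, the scaling factor `gamma`, and the 4-dimensional state are all placeholder choices. The key property is that the output combines a transformed input with a scaled copy of an earlier state, so the earlier state can be recovered exactly and nothing is lost.

```python
import numpy as np

# Minimal sketch of a reversible coupling of the form
#     x_t = F(x_{t-1}) + gamma * x_{t-m},
# invertible as  x_{t-m} = (x_t - F(x_{t-1})) / gamma.
# F, gamma, and the state size are illustrative placeholders.

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))

def F(x):
    # Stand-in for a learned block; any function works here,
    # since it is subtracted out exactly during inversion.
    return np.tanh(W @ x)

gamma = 0.5  # reversible scaling factor; must be nonzero

def forward(x_prev, x_skip):
    # x_prev: input to the block, x_skip: earlier state x_{t-m}
    return F(x_prev) + gamma * x_skip

def inverse(x_t, x_prev):
    # Recover the earlier state from the output alone.
    return (x_t - F(x_prev)) / gamma

x_prev = rng.standard_normal(4)
x_skip = rng.standard_normal(4)
x_t = forward(x_prev, x_skip)
recovered = inverse(x_t, x_prev)
assert np.allclose(recovered, x_skip)  # earlier state recovered exactly
```

Because `F` is subtracted out exactly, no constraint is placed on `F` itself, which is what lets arbitrary learned blocks sit inside the reversible pathway.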
Core Contributions
RevCol's architecture is significant in several respects:
- Reversible Transformation Design: The paper extends the concept of reversible networks, previously seen in architectures such as RevNets, to a multi-level fusion setting suited to column-based networks, relaxing the dimensional constraints and information-preservation limitations of existing reversible structures.
- Feature Disentanglement: Through its reversible pathways, the RevCol model gradually disentangles feature representations across levels. This disentanglement lets the network retain both high-level semantic information and low-level details, which is essential for many computer vision tasks.
- Flexible Integration with Other Architectures: The RevCol design is adaptable and can be incorporated into transformers or other neural network models, enhancing their performance across diverse tasks in both computer vision and natural language processing.
Experimental Validation and Results
The efficacy of RevCol is highlighted through experiments across multiple computer vision benchmarks, including image classification on ImageNet, object detection on COCO, and semantic segmentation on ADE20K. Notable results include:
- RevCol-XL achieves 88.2% accuracy on ImageNet-1K after ImageNet-22K pre-training.
- RevCol-H, with extensive pre-training data, achieves 90.0% ImageNet-1K accuracy, along with leading results on COCO detection and ADE20K segmentation benchmarks among static CNN models.
These numerical results underscore the competitive performance of RevCol architectures, affirming their potential for scalable and robust performance in large-scale computer vision applications.
Implications and Future Prospects
The introduction of RevCol offers significant implications for the architecture of neural networks:
- Memory Efficiency: The reversible nature of the columns contributes to substantial memory savings during training, a crucial factor when dealing with large-scale models.
- Scalability: By introducing the notion of multiple columns as a scaling dimension, RevCol models can achieve scalable depth and width, providing a new approach to model scaling.
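The memory saving follows directly from reversibility: instead of caching every column's activations for backpropagation, training can keep only the final states and reconstruct earlier ones on the fly during the backward pass. The sketch below illustrates this with the same assumed coupling form (`F` and `gamma` are placeholders, with a skip distance of 2); it is a simplification, not the paper's training procedure.

```python
import numpy as np

# Sketch of memory-efficient reconstruction: run a chain of reversible
# steps keeping only the last two states, then recover every earlier
# state exactly by walking backwards. F and gamma are placeholders.

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 4))
gamma = 0.8

def F(x):
    return np.tanh(W @ x)

def forward_chain(x0, x1, num_steps):
    """Run the chain, keeping ONLY the last two states (O(1) memory)."""
    a, b = x0, x1
    for _ in range(num_steps):
        a, b = b, F(b) + gamma * a  # x_t = F(x_{t-1}) + gamma * x_{t-2}
    return a, b

def reconstruct_chain(a, b, num_steps):
    """Walk backwards, recomputing each earlier state exactly."""
    for _ in range(num_steps):
        a, b = (b - F(a)) / gamma, a  # invert one step
    return a, b

x0 = rng.standard_normal(4)
x1 = rng.standard_normal(4)
a, b = forward_chain(x0, x1, num_steps=6)
r0, r1 = reconstruct_chain(a, b, num_steps=6)
assert np.allclose(r0, x0) and np.allclose(r1, x1)  # inputs recovered
```

In an actual training loop, each reconstructed state would be used to recompute local gradients, trading a modest amount of extra computation for activation memory that no longer grows with the number of columns.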
Moving forward, the RevCol architecture could inspire further exploration into designing neural networks that balance feature richness and computational efficiency. Potential future developments could include more sophisticated reversible mechanisms, broader applications in NLP, and further integration with self-supervised learning frameworks to enhance the adaptability of learned representations. The RevCol design marks a promising step towards more robust and adaptable AI systems, emphasizing the importance of maintaining comprehensive feature representations throughout network layers.