
Deep Isometric Learning for Visual Recognition (2006.16992v2)

Published 30 Jun 2020 in cs.CV

Abstract: Initialization, normalization, and skip connections are believed to be three indispensable techniques for training very deep convolutional neural networks and obtaining state-of-the-art performance. This paper shows that deep vanilla ConvNets without normalization or skip connections can also be trained to achieve surprisingly good performance on standard image recognition benchmarks. This is achieved by enforcing the convolution kernels to be near isometric during initialization and training, as well as by using a variant of ReLU that is shifted towards being isometric. Further experiments show that if combined with skip connections, such near isometric networks can achieve performances on par with (for ImageNet) and better than (for COCO) the standard ResNet, even without normalization at all. Our code is available at https://github.com/HaozhiQi/ISONet.

Authors (5)
  1. Haozhi Qi (22 papers)
  2. Chong You (35 papers)
  3. Xiaolong Wang (243 papers)
  4. Yi Ma (189 papers)
  5. Jitendra Malik (211 papers)
Citations (49)

Summary

  • The paper introduces ISONets that leverage delta initialization and orthogonal regularization to preserve isometry in deep convolutional layers.
  • The modified SReLU nonlinearity uses a learnable shift to balance nonlinearity against isometric signal flow throughout the network.
  • Experimental results show that these networks achieve competitive classification performance without relying on normalization or skip connections.

Deep Isometric Learning for Visual Recognition

The paper "Deep Isometric Learning for Visual Recognition," authored by Haozhi Qi et al., presents an exploration into the design of deep convolutional neural networks (ConvNets) that do not rely on normalization techniques or skip connections, two cornerstone features in contemporary neural network architectures. The paper postulates that the isometric property, which ensures that a network layer preserves the inner product during both forward and backward propagation, can serve as a principal design framework for training deep ConvNets effectively.

Core Contributions

The authors introduce Isometric Networks (ISONets), which initialize convolutional layers to be near isometric through delta initialization and maintain this isometry during training with orthogonal regularization. The ReLU non-linearity is replaced by a Shifted ReLU (SReLU), which moves the activation closer to an isometry via a learnable shift parameter in each layer (sketched below). The results suggest that enforcing near-isometry across layers allows a vanilla network, devoid of typical architectural components such as normalization layers or residual connections, to achieve competitive performance on standard image classification benchmarks such as ImageNet.
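
The shifted ReLU admits a very small implementation. The following PyTorch sketch assumes one learnable shift per channel initialized at -1; the exact parameterization (scalar versus per-channel shift, initial value) is an assumption here and should be checked against the released code.

```python
import torch
import torch.nn as nn

class SReLU(nn.Module):
    """Shifted ReLU: SReLU(x) = max(x, b) with a learnable shift b."""

    def __init__(self, num_channels: int, init_shift: float = -1.0):
        super().__init__()
        # One learnable shift per channel; starting below zero keeps the
        # activation close to the identity map (hence near isometric)
        # early in training. The per-channel shape is an assumption.
        self.shift = nn.Parameter(torch.full((num_channels,), init_shift))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Broadcast the per-channel shift over (N, C, H, W) activations.
        return torch.maximum(x, self.shift.view(1, -1, 1, 1))
```

With the shift at -1, inputs above -1 pass through unchanged, so the activation is much closer to the identity than a standard ReLU while remaining nonlinear.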

Key Insights

  1. Isometry in Convolutional Layers: The approach initializes convolution kernels with the delta method, a simple orthogonal setup that lets training start from an exactly isometric point. Orthogonal regularization applied during training then helps keep the kernels close to isometric (see the sketch after this list).
  2. Nonlinearity through SReLU: The proposed SReLU adds a learnable offset to the conventional ReLU, trading off nonlinearity against isometric signal flow, and adapts this trade-off per layer during training.
  3. Residual Connection Adaptation: Adding residual connections further strengthens the isometric properties, yielding R-ISONet, which achieves performance on par with standard ResNet architectures without any normalization.
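
As a concrete illustration of item 1, here is a hedged PyTorch sketch: delta initialization via nn.init.dirac_, plus a common matrix-level orthogonality penalty on the flattened kernel. Note that the paper regularizes orthogonality of the convolution operator itself, so the flattened-kernel penalty below is a simplification rather than the authors' exact regularizer.

```python
import torch
import torch.nn as nn

def delta_init_(conv: nn.Conv2d) -> None:
    # Dirac (delta) initialization: each kernel is zero except for an
    # identity tap at the spatial center, so the layer starts as an
    # identity map (an exact isometry) when in_channels == out_channels.
    nn.init.dirac_(conv.weight)
    if conv.bias is not None:
        nn.init.zeros_(conv.bias)

def orth_penalty(conv: nn.Conv2d) -> torch.Tensor:
    # Flatten the kernel to a matrix W of shape (out, in * k * k) and
    # penalize ||W W^T - I||_F^2, pushing the rows toward orthonormality.
    w = conv.weight.reshape(conv.weight.shape[0], -1)
    eye = torch.eye(w.shape[0], device=w.device, dtype=w.dtype)
    return ((w @ w.t() - eye) ** 2).sum()
```

In training, the penalty would be added to the task loss with a small weight, e.g. loss = task_loss + 1e-4 * sum(orth_penalty(m) for m in model.modules() if isinstance(m, nn.Conv2d)); the weight 1e-4 is illustrative, not a value from the paper.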

Comparative Evaluation

Experiments demonstrate that ISONets achieve competitive performance without relying on normalization or skip connections. Even at depths beyond one hundred layers, these networks deliver robust image classification results, making them a promising alternative where the computational and memory overhead of batch normalization is undesirable. For object detection on the COCO dataset, R-ISONet even surpasses ResNet when used as a feature extractor, suggesting strong transfer learning capabilities.

Implications and Future Developments

This research has several implications. Primarily, it supports the prospect of designing simplified deep networks that forgo complex components by adhering to an isometric learning principle. The work provides a foundational approach for settings where batch normalization is infeasible and offers an avenue for reducing inference costs in resource-constrained applications.

For future developments, the concept of isometric learning could be extended to other architectural variants and to domains beyond vision, such as language modeling and reinforcement learning. Further exploration into automatic differentiation and advanced optimization techniques could refine the isometric constraints for broader applicability across diverse neural network structures.

In summary, this paper charts a course toward minimalistic yet effective network designs built on the core principle of isometry, expanding the toolkit for neural network architecture design.
