- The paper introduces a Dirac weight parameterization that enables training very deep neural networks without explicit skip-connections.
- Experimental results show that a 28-layer DiracNet matches the accuracy of a 1001-layer ResNet on CIFAR-10, demonstrating that comparable accuracy can be reached at a fraction of the depth.
- The approach simplifies network architectures by removing the need for careful initialization, offering a computationally efficient alternative with competitive ImageNet performance.
DiracNets: Training Very Deep Neural Networks Without Skip-Connections
This paper, authored by Sergey Zagoruyko and Nikos Komodakis, addresses the prevalent paradigm of employing skip-connections in deep neural networks such as ResNet and introduces an alternative: DiracNets. The central motivation comes from the observation that the purported advantages of skip-connections, chiefly easing the training of deeper networks, derive more from increased capacity than from sheer depth. The authors therefore propose a Dirac weight parameterization that enables training of very deep networks without explicit skip-connections while maintaining near parity in performance with their skip-connected counterparts.
Dirac Parameterization and Network Architecture
DiracNets hinge on the Dirac weight parameterization, which exploits the fact that convolving with a Dirac delta kernel implements the identity mapping. The convolutional weights are parameterized as a learned residual added to this identity kernel, so each layer implicitly propagates its input much as an explicit ResNet skip would. The formulation adds negligible computational overhead during training and none at inference, since the parameterization can be folded into ordinary convolution filters (together with batch normalization). The network thus reduces to a minimalistic convolution-ReLU chain.
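As a concrete illustration, the following PyTorch sketch shows one way such a parameterization could look: the effective filter is a per-channel-scaled Dirac delta (identity) kernel plus a per-channel-scaled, l2-normalized free weight. The module name `DiracConv2d`, the scaling initialization, and the normalization details are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiracConv2d(nn.Module):
    """Convolution whose effective weight is a scaled identity (Dirac delta)
    kernel plus a scaled, normalized free weight (illustrative sketch)."""

    def __init__(self, in_channels, out_channels, kernel_size=3, padding=1):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(out_channels, in_channels, kernel_size, kernel_size) * 0.01
        )
        # Dirac delta tensor: convolving with it passes min(in, out) channels
        # through unchanged, i.e. it acts as an identity mapping.
        delta = torch.zeros(out_channels, in_channels, kernel_size, kernel_size)
        nn.init.dirac_(delta)
        self.register_buffer("delta", delta)
        # Per-output-channel scaling coefficients for the identity and the free weight.
        self.a = nn.Parameter(torch.ones(out_channels, 1, 1, 1))
        self.b = nn.Parameter(torch.full((out_channels, 1, 1, 1), 0.1))
        self.padding = padding

    def effective_weight(self):
        # l2-normalize each output filter of the free weight, then combine.
        w = self.weight
        w_norm = F.normalize(w.view(w.size(0), -1), dim=1).view_as(w)
        return self.a * self.delta + self.b * w_norm

    def forward(self, x):
        # At inference the effective weight can be precomputed once and used
        # as an ordinary convolution filter, so there is no extra cost.
        return F.conv2d(x, self.effective_weight(), padding=self.padding)
```

Because the effective weight depends only on learned parameters, it can be computed once after training and substituted into a plain convolution, which is why the parameterization is free at inference time.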
The DiracNet architecture borrows structural elements from both VGG and ResNet, comprising groups of convolutional layers without skip-connections. This design removes the traditionally required careful initialization and supports training networks with hundreds of layers under straightforward training regimens.
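Under the assumptions of the sketch above, such a plain network is simply a chain of these convolutions and ReLUs arranged in stages. The stage depths and widths below are chosen arbitrarily for illustration, not taken from the paper.

```python
class DiracNetStack(nn.Module):
    """Plain VGG-style chain of DiracConv2d + ReLU blocks, no skip-connections.
    Builds on the DiracConv2d sketch and imports shown earlier. `groups` lists
    (num_blocks, width) per stage; max-pooling halves resolution between stages."""

    def __init__(self, groups=((4, 16), (4, 32), (4, 64)), in_channels=3, num_classes=10):
        super().__init__()
        layers = [DiracConv2d(in_channels, groups[0][1]), nn.ReLU(inplace=True)]
        channels = groups[0][1]
        for i, (num_blocks, width) in enumerate(groups):
            if i > 0:
                layers.append(nn.MaxPool2d(2))
            for _ in range(num_blocks):
                layers += [DiracConv2d(channels, width), nn.ReLU(inplace=True)]
                channels = width
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)
        return self.classifier(x)


# Example usage: model = DiracNetStack(); logits = model(torch.randn(2, 3, 32, 32))
```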
Experimental Evaluation
The experimental results are compelling, showing that DiracNets achieve competitive performance. On CIFAR-10, a 28-layer DiracNet matches the accuracy of a 1001-layer ResNet, demonstrating that comparable accuracy can be reached at a fraction of the depth. On CIFAR, DiracNet falls slightly short of Wide ResNet (WRN) but comes close to its accuracy at a comparable parameter count. On ImageNet, DiracNet's performance parallels ResNet's, with only marginal differences in top-1 and top-5 error rates.
Analysis and Implications
The analysis of scaling coefficients in DiracNets provides insight into how the relative importance of layers evolves during training, with distinct patterns emerging across network depths. Notably, the Dirac parameterization simplifies architectures and can also benefit residual networks by removing the need for exacting weight initialization, showing robustness across a range of initialization scales.
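If the parameterization is implemented with explicit per-channel scaling vectors `a` (identity) and `b` (free weight), as in the `DiracConv2d` sketch above, their learned magnitudes can be inspected per layer to gauge how much each layer behaves as a pass-through. This diagnostic is an illustrative sketch under those assumptions, not the paper's analysis code.

```python
def scaling_profile(model):
    """Report, per DiracConv2d layer, the mean magnitude of the identity
    scaling `a` versus the free-weight scaling `b`. A large |a|/|b| ratio
    suggests the layer acts mostly as a pass-through; a small ratio suggests
    it relies on its learned filter. (Diagnostic sketch, assumes DiracConv2d above.)"""
    rows = []
    for name, module in model.named_modules():
        if isinstance(module, DiracConv2d):
            a_mean = module.a.detach().abs().mean().item()
            b_mean = module.b.detach().abs().mean().item()
            rows.append((name, a_mean, b_mean, a_mean / (b_mean + 1e-8)))
    for name, a_mean, b_mean, ratio in rows:
        print(f"{name:30s}  |a|={a_mean:.3f}  |b|={b_mean:.3f}  ratio={ratio:.2f}")
    return rows
```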
The implications of DiracNets are substantial for both the practical and theoretical aspects of neural network design. Practically, this approach offers a pathway to more computationally efficient networks without compromising accuracy. Theoretically, it challenges the perceived indispensability of skip-connections, suggesting that depth alone does not translate into higher performance, a nuance worthy of further exploration.
Future Directions
The introduction of DiracNets opens avenues for deeper inquiry, especially into the interplay between network depth and width under different parameterizations. Further research could focus on regularization strategies to improve performance on smaller datasets and on identifying the stabilizing factors that prevent overfitting despite an abundance of parameters.
In conclusion, DiracNets present a viable alternative to skip-connection-laden architectures, emphasizing simplicity and efficiency while maintaining performance integrity. This exploration into alternative network parameterizations expands the toolkit available to neural network researchers and practitioners, prompting reconsideration of architectural norms in deep learning.