- The paper demonstrates that widening residual blocks, rather than increasing depth, significantly improves training speed and accuracy.
- It systematically evaluates various ResNet configurations, showing that a 16-layer WRN can outperform a 1000-layer ResNet on key datasets.
- The study highlights the effective integration of dropout in WRNs, reducing overfitting and achieving state-of-the-art results on benchmarks such as CIFAR and SVHN.
Wide Residual Networks
The paper, authored by Sergey Zagoruyko and Nikos Komodakis, presents an in-depth investigation into the architecture of deep residual networks (ResNets), resulting in the proposal of Wide Residual Networks (WRNs). While traditional residual networks achieved state-of-the-art results by increasing depth, this paper explores enhancing performance by increasing the width instead.
Motivation and Background
Deep convolutional neural networks (CNNs) have evolved significantly, with architectures like AlexNet, VGG, Inception, and ResNet progressively increasing in complexity to solve image recognition tasks. Despite their efficacy, very deep networks suffer from issues such as diminishing feature reuse and long training times. The introduction of residual connections mitigated challenges like vanishing gradients, enabling the training of networks more than a thousand layers deep. Nevertheless, the incremental performance gains achieved at such extreme depths come at prohibitive computational cost.
Key Contributions
- Experimental Study on ResNet Architectures: The authors systematically examined different residual block configurations to understand their impact on performance, varying the depth and width of the networks as well as the type and number of convolutions inside each block.
- Proposal of Wide Residual Networks (WRNs): By widening the ResNet blocks—increasing the number of feature maps per convolutional layer by a widening factor k rather than adding more layers—the authors demonstrated significant improvements in both training speed and test accuracy. For instance, a 16-layer WRN was shown to outperform a 1000-layer thin ResNet in accuracy and efficiency (a block-level sketch follows this list).
- Dropout Implementation in WRNs: The paper further explored inserting dropout between the convolutional layers of each residual block to prevent overfitting. This adjustment was shown to offer consistent accuracy gains across various datasets.
- State-of-the-Art Results: The authors’ experiments yielded new state-of-the-art results on multiple datasets such as CIFAR, SVHN, and COCO, along with substantial improvements in ImageNet benchmarks.
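To make the widened block and the dropout placement concrete, here is a minimal PyTorch sketch of a pre-activation (BN-ReLU-conv) residual block in the spirit of the paper's basic 3x3 design, with optional dropout between the two convolutions. PyTorch, the class name WideBasicBlock, and the default hyperparameters are assumptions for illustration; this is not the authors' reference code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WideBasicBlock(nn.Module):
    """Pre-activation residual block with optional dropout (illustrative sketch)."""

    def __init__(self, in_planes: int, out_planes: int, stride: int = 1,
                 dropout_rate: float = 0.0):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, out_planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.dropout = nn.Dropout(p=dropout_rate)   # dropout sits between the convs
        self.bn2 = nn.BatchNorm2d(out_planes)
        self.conv2 = nn.Conv2d(out_planes, out_planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        # 1x1 projection on the shortcut whenever the spatial size or width changes
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != out_planes:
            self.shortcut = nn.Conv2d(in_planes, out_planes, kernel_size=1,
                                      stride=stride, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv1(F.relu(self.bn1(x)))    # BN -> ReLU -> conv (pre-activation)
        out = self.dropout(out)
        out = self.conv2(F.relu(self.bn2(out)))
        return out + self.shortcut(x)            # residual addition
```

Widening happens outside the block: the surrounding network multiplies the baseline channel counts (16, 32, 64) by the widening factor k when it instantiates these blocks, so a wider network reuses exactly the same block code.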
Numerical Results
The effectiveness of WRNs was demonstrated through a series of benchmark tests:
- CIFAR-10 and CIFAR-100: WRNs achieved significant reductions in error rates compared to their thin counterparts. For example, WRN-28-10 achieved a 4.00% error rate on CIFAR-10 and 19.25% on CIFAR-100, outperforming much deeper models such as ResNet-1001 (the WRN-d-k naming is unpacked after this list).
- SVHN: The inclusion of dropout led to an error rate of 1.54% with WRN-16-8, improving on previously published results.
- ImageNet: WRN-50-2-bottleneck outperformed ResNet-152 with a top-1 error rate of 21.9%, demonstrating that even with fewer layers, wider networks can achieve superior performance.
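As a note on naming, WRN-d-k denotes a network with d convolutional layers and widening factor k; the depth satisfies d = 6n + 4, where n is the number of blocks per residual group. The small helper below (wrn_config is a hypothetical name, not from the paper) shows how the names used above translate into per-group block counts and channel widths.

```python
def wrn_config(depth: int, k: int):
    """Map a WRN-depth-k name to (blocks per group, channel widths per group)."""
    assert (depth - 4) % 6 == 0, "WRN depth is expected to be of the form 6n + 4"
    n = (depth - 4) // 6                  # residual blocks in each of the 3 groups
    widths = [16 * k, 32 * k, 64 * k]     # feature maps per group, scaled by k
    return n, widths

print(wrn_config(28, 10))  # WRN-28-10 -> (4, [160, 320, 640])
print(wrn_config(16, 8))   # WRN-16-8  -> (2, [128, 256, 512])
```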
Practical and Theoretical Implications
The paper underscores the importance of balancing width and depth in residual networks. It challenges the notion that extreme depth is necessary for high performance, showing that width can deliver similar or better accuracy while training several times faster, since wide layers parallelize far better on GPUs than very deep stacks of thin layers. This insight has far-reaching implications for the design of efficient and powerful neural networks, especially for applications constrained by computational resources.
Additionally, the success of integrating dropout within WRNs emphasizes the merit of regularization techniques in enhancing model robustness. These findings suggest avenues for further research in optimizing residual architectures, potentially influencing ongoing advancements in neural network design.
Future Developments
Future research may focus on:
- Extending the application of WRNs to more complex and diverse datasets to validate their versatility.
- Exploring automated methods for determining the optimal width and depth configurations to streamline the architecture design process.
- Investigating other regularization techniques alongside dropout to further mitigate overfitting in wide network architectures.
Conclusion
Sergey Zagoruyko and Nikos Komodakis's paper on Wide Residual Networks represents a significant step forward in residual network research. By addressing the limitations associated with extreme network depth and highlighting the benefits of increased width, the paper offers valuable insights and methodologies for developing more efficient and powerful neural networks. These contributions lay the groundwork for future innovations in the field, potentially leading to the widespread adoption of WRNs in both academic and industrial applications.