
FractalNet: Ultra-Deep Neural Networks without Residuals (1605.07648v4)

Published 24 May 2016 in cs.CV

Abstract: We introduce a design strategy for neural network macro-architecture based on self-similarity. Repeated application of a simple expansion rule generates deep networks whose structural layouts are precisely truncated fractals. These networks contain interacting subpaths of different lengths, but do not include any pass-through or residual connections; every internal signal is transformed by a filter and nonlinearity before being seen by subsequent layers. In experiments, fractal networks match the excellent performance of standard residual networks on both CIFAR and ImageNet classification tasks, thereby demonstrating that residual representations may not be fundamental to the success of extremely deep convolutional neural networks. Rather, the key may be the ability to transition, during training, from effectively shallow to deep. We note similarities with student-teacher behavior and develop drop-path, a natural extension of dropout, to regularize co-adaptation of subpaths in fractal architectures. Such regularization allows extraction of high-performance fixed-depth subnetworks. Additionally, fractal networks exhibit an anytime property: shallow subnetworks provide a quick answer, while deeper subnetworks, with higher latency, provide a more accurate answer.

Citations (914)

Summary

  • The paper introduces FractalNet, a recursive architecture that forgoes residual connections while maintaining efficient gradient propagation.
  • It demonstrates accuracy competitive with ResNets on CIFAR-10, CIFAR-100, and ImageNet, aided by the proposed drop-path regularization.
  • The findings challenge the conventional reliance on residual connections, offering a simpler, yet effective design for ultra-deep neural networks.

FractalNet: Ultra-Deep Neural Networks without Residuals

In "FractalNet: Ultra-Deep Neural Networks without Residuals," Gustav Larsson, Michael Maire, and Gregory Shakhnarovich introduce a novel approach to designing deep neural network architectures based on self-similarity. Repeated application of a simple expansion rule yields networks with fractal structures, termed FractalNets, which achieve competitive performance without the residual connections typically considered essential for ultra-deep networks.

Key Contributions

The paper's primary contributions are:

  1. Fractal Network Architecture: The introduction of FractalNet, a network design that recursively applies a simple expansion rule to generate deeper configurations that manifest as truncated fractals. Notably, unlike Residual Networks (ResNets), FractalNets eschew any form of pass-through or residual connections. Each signal internal to the network is fully transformed before reaching subsequent layers, disallowing identity mappings.
  2. Performance Analysis: Empirical evaluation shows that FractalNet matches the performance of ResNets on standard image classification benchmarks, including CIFAR-10, CIFAR-100, and ImageNet. This result is significant because it suggests that residual learning may not be a necessary component for effective training of extremely deep convolutional neural networks.
  3. Regularization via Drop-Path: The authors introduce a new regularization technique dubbed "drop-path," which complements dropout by randomly disabling macro-scale components of a FractalNet during training. This not only regularizes the model by preventing co-adaptation of parallel paths but also enables the extraction of high-performance fixed-depth subnetworks (a minimal sketch of the local variant appears after this list).
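Drop-path comes in two variants in the paper: a local one, which independently drops inputs at each join while keeping at least one, and a global one, which selects a single column for the entire network. The sketch below illustrates only the local variant under our own assumptions; the function name, drop probability, and PyTorch framing are illustrative, not the authors' implementation.

```python
import random
import torch

def local_drop_path_join(inputs, drop_prob=0.15, training=True):
    """Join layer with local drop-path (illustrative sketch).

    inputs: list of same-shaped tensors, one per incoming path.
    During training each path is dropped independently with probability
    drop_prob, but at least one path always survives; the survivors are
    averaged, mirroring FractalNet's element-wise mean join.
    """
    if not training:
        return torch.stack(inputs).mean(dim=0)
    kept = [x for x in inputs if random.random() >= drop_prob]
    if not kept:                      # never drop every incoming path
        kept = [random.choice(inputs)]
    return torch.stack(kept).mean(dim=0)

# Example: a join over three parallel paths during training.
paths = [torch.randn(2, 32, 16, 16) for _ in range(3)]
merged = local_drop_path_join(paths, drop_prob=0.15, training=True)
```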

FractalNet Design and Implementation

A FractalNet is constructed by repeatedly applying a simple expansion rule: each application adds one intertwined column and doubles the depth (in convolutional layers) of the longest path. The design does not use identity mappings of any kind; instead, it employs join layers that compute the element-wise mean of their inputs, fundamentally differentiating it from ResNets, which rely on identity shortcut connections.
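In equation form, the rule is f_1(z) = conv(z) and f_{C+1}(z) = [ (f_C ∘ f_C)(z) ] joined with conv(z), where the join averages its inputs. The PyTorch sketch below is a minimal, hypothetical rendering of this recursion (class and parameter names are ours, not the authors'); for simplicity each join here averages exactly two inputs, whereas the paper merges adjacent joins into a single mean over all incoming columns.

```python
import torch
import torch.nn as nn

class ConvUnit(nn.Module):
    """Base case f_1: conv -> batch norm -> ReLU, with no pass-through of the input."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.op = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.op(x)

class FractalBlock(nn.Module):
    """f_{C+1}(z) = join( conv(z), (f_C . f_C)(z) ), applied recursively."""
    def __init__(self, in_ch, out_ch, columns):
        super().__init__()
        self.shallow = ConvUnit(in_ch, out_ch)        # the newly added column
        if columns > 1:
            # the deep path stacks two copies of the previous fractal
            self.deep = nn.Sequential(
                FractalBlock(in_ch, out_ch, columns - 1),
                FractalBlock(out_ch, out_ch, columns - 1),
            )
        else:
            self.deep = None

    def forward(self, x):
        if self.deep is None:
            return self.shallow(x)
        # join layer: element-wise mean of the subpaths, not an identity shortcut
        return 0.5 * (self.shallow(x) + self.deep(x))

# A block with columns=3 spans column depths of 1, 2, and 4 convolutions.
block = FractalBlock(16, 32, columns=3)
out = block(torch.randn(2, 16, 32, 32))   # shape: (2, 32, 32, 32)
```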

Experimental Results

CIFAR-100 and CIFAR-10:

  • Without data augmentation, FractalNet exhibits a substantial reduction in error rates compared to unregularized and stochastically regularized ResNets, showing that drop-path provides strong regularization when augmentation is unavailable.
  • With standard data augmentation, FractalNet’s performance is on par with or slightly better than various ResNet configurations and other contemporary architectures.

SVHN:

FractalNet and its associated regularization strategies also perform strongly on the SVHN dataset, reinforcing the architecture's versatility.

ImageNet:

FractalNet scales effectively to ImageNet, achieving accuracy comparable to ResNet-34 and further validating the fractal architecture for large-scale image recognition.

Insights and Implications

The paper suggests that effective training of very deep networks may depend less on residual connections than on the availability of short paths for gradient propagation during training. This is evidenced by FractalNet's performance and by the successful extraction of efficient subnetworks via drop-path regularization.

Theoretical and Practical Impact

The fractal architecture offers a straightforward design paradigm that encapsulates, implicitly rather than explicitly, many practices that have proven successful in neural network construction. The simplicity of forming FractalNet structures stands in contrast to the intricately engineered modules of Inception and the residual blocks of ResNets. The demonstrated ability to train ultra-deep networks without residuals advances understanding of how deep networks function and opens avenues for new design strategies across AI domains, from computer vision to natural language processing.

Future Directions

Further research could explore hybrid architectures combining fractal principles with other architectural innovations, investigate the interplay between fractal depth and other regularization techniques, and extend the application of fractal architectures to a wider range of machine learning tasks beyond the scope of image classification.

This paper challenges established notions about the necessity of residual connections in deep network training and introduces a compelling alternative that merits attention from the broader research community.
