- The paper demonstrates that shallow, parallel subnetworks can rival deep architectures by achieving over 80% top-1 accuracy on ImageNet.
- It introduces ParNet, a 12-layer design that attains 96% top-1 accuracy on CIFAR10 and 81% on CIFAR100, challenging traditional depth requirements.
- It highlights practical benefits by reducing computational costs and latency, paving the way for efficient real-time recognition systems.
Overview of Non-deep Networks
The paper "Non-deep Networks" by Goyal et al. explores a fundamental question in neural network architecture: can high-performing neural networks exist without large depth? This paper challenges the conventional wisdom that depth is a prerequisite for achieving state-of-the-art results. By employing parallel subnetworks, the authors propose an architecture named ParNet#1 that operates at significantly reduced depths while maintaining competitive performance across major benchmarks like ImageNet, CIFAR10, and CIFAR100.
Key Contributions
- Architectural Design: The authors introduce ParNet, which uses parallel subnetworks rather than conventional sequential stacking. This structure lets the network reach strong performance with a depth of just 12 layers, challenging existing assumptions about how much depth neural networks require.
- Empirical Results: ParNet achieves top-1 accuracies of over 80% on ImageNet, 96% on CIFAR10, and 81% on CIFAR100. These results are significant because they demonstrate that high performance is achievable with shallow architectures, a development that could redefine efficiency standards in neural network design.
- Practical Implementation: The paper provides a proof of concept for employing non-deep networks in low-latency recognition systems. This is particularly relevant for applications requiring real-time processing where latency is a critical concern.
- Scaling Rules: ParNet scales effectively by increasing width, resolution, and the number of parallel branches while keeping depth constant, suggesting that further gains are possible simply by adding computational resources rather than depth (see the sketch below).
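The following is a hypothetical sketch of that scaling knob, assuming a toy fixed-depth model: capacity grows only by widening the streams or adding more of them, so the parameter count rises while the longest sequential path stays the same. (The paper's actual scaling also varies input resolution, which this snippet omits.)

```python
import torch
import torch.nn as nn

class FixedDepthNet(nn.Module):
    """Fixed-depth classifier whose capacity is scaled only by widening streams
    or adding more of them; the sequential depth never changes."""
    def __init__(self, width: int, num_streams: int, num_classes: int = 10):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, width, 3, stride=2, padding=1), nn.SiLU())
        self.streams = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(width, width, 3, padding=1), nn.SiLU(),
                nn.Conv2d(width, width, 3, padding=1), nn.SiLU(),
            )
            for _ in range(num_streams)
        )
        self.head = nn.Linear(num_streams * width, num_classes)

    def forward(self, x):
        x = self.stem(x)
        feats = [s(x).mean(dim=(2, 3)) for s in self.streams]  # streams are independent
        return self.head(torch.cat(feats, dim=1))

# Capacity grows with width and branch count; every configuration keeps the same
# three conv layers on its longest path.
for w, b in [(64, 2), (128, 2), (128, 4)]:
    model = FixedDepthNet(width=w, num_streams=b)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"width={w:4d} streams={b}  params={n_params / 1e6:.2f}M")
```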
Strong Numerical Results
- ImageNet Performance: ParNet achieves a notable top-1 accuracy of 80.7% with just 12 layers, comparable to architectures with significantly more layers.
- CIFAR Performance: On CIFAR10 and CIFAR100, ParNet scores 96% and 81%, respectively, close to results traditionally achieved only by much deeper networks.
- MS-COCO Detection: With a 12-layer backbone, the network achieves an Average Precision (AP) of 48% on MS-COCO, underscoring its utility in tasks beyond image classification.
Theoretical Implications
The paper challenges the prevailing notion that large depth is essential for neural networks to learn complex representations effectively. By demonstrating that shallow networks with parallel structures can achieve comparable performance, it opens avenues for network designs that optimize width and parallelism instead of depth.
Practical Implications
The deployment of ParNet-like architectures could reduce the computational footprint without sacrificing accuracy, which is especially beneficial in resource-constrained environments. The potential for faster inference due to reduced sequential computation is significant, particularly for tasks requiring rapid processing and response.
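As a rough illustration of how such a latency comparison could be measured (not the paper's benchmark setup), the sketch below times a toy 12-block sequential stack against the same number of blocks rearranged into four 3-block branches. Note that in single-device eager PyTorch the branches still execute one after another, so the full benefit of the shorter sequential path only materializes when branches actually run concurrently, e.g. across multiple processing units, as the paper discusses.

```python
import time
import torch
import torch.nn as nn

def conv_block(ch):
    # Plain conv + activation; stands in for whatever block a real backbone uses.
    return nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.SiLU())

# Toy "deep" model: 12 blocks stacked sequentially.
deep = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    *[conv_block(64) for _ in range(12)],
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10),
)

# Toy "shallow" model: the same 12 blocks split across 4 independent branches,
# so the longest sequential path is only 3 blocks deep.
class ShallowParallel(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(3, 64, 3, padding=1)
        self.branches = nn.ModuleList(
            nn.Sequential(*[conv_block(64) for _ in range(3)]) for _ in range(4)
        )
        self.head = nn.Linear(4 * 64, 10)

    def forward(self, x):
        x = self.stem(x)
        feats = [b(x).mean(dim=(2, 3)) for b in self.branches]
        return self.head(torch.cat(feats, dim=1))

@torch.no_grad()
def latency_ms(model, reps=20):
    model.eval()
    x = torch.randn(1, 3, 224, 224)
    model(x)  # warm-up
    start = time.perf_counter()
    for _ in range(reps):
        model(x)
    return (time.perf_counter() - start) / reps * 1e3

print(f"deep sequential : {latency_ms(deep):.1f} ms/image")
print(f"shallow parallel: {latency_ms(ShallowParallel()):.1f} ms/image")
```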
Speculation on Future Developments
The architecture proposed in this paper may inspire future research into specialized hardware that capitalizes on parallel subnetworks, overcoming current limitations in communication latency between computing units. As such, developments in both software and hardware could lead to a new class of ultra-fast, efficient recognition systems.
Conclusion
This paper advances the understanding of neural network design by demonstrating that efficient, high-performing models are feasible outside the paradigm of depth-heavy architectures. ParNet exemplifies how strategic use of parallel structures can redefine network efficiency, offering new directions for both theoretical exploration and practical application in AI systems.