Convolutional Neural Networks at Constrained Time Cost
Kaiming He and Jian Sun's paper, "Convolutional Neural Networks at Constrained Time Cost," presents an insightful exploration of CNN efficiency for industrial and commercial applications where time constraints are paramount. The authors focus on balanced trade-offs among architectural factors, including network depth, number of filters, and filter sizes, with the objective of maintaining high accuracy under a fixed computational-time budget.
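The paper's accounting of time cost rests on the complexity of a stack of convolutional layers, roughly the sum over layers of (input channels) × (filter size)² × (filters) × (output map size)². As a minimal sketch, assuming a hypothetical helper name and illustrative layer shapes (none of these values come from the paper), this trade-off can be computed directly:

```python
def conv_time_complexity(layers):
    """Relative time cost of a stack of conv layers: the sum over layers of
    (input channels) * (filter size)^2 * (filters) * (output map size)^2.
    `layers` holds (in_channels, filter_size, out_channels, out_map) tuples.
    """
    return sum(n_in * s * s * n_out * m * m
               for (n_in, s, n_out, m) in layers)

# One 5x5 layer vs. two 3x3 layers at the same width and map size:
one_5x5 = conv_time_complexity([(64, 5, 64, 32)])
two_3x3 = conv_time_complexity([(64, 3, 64, 32), (64, 3, 64, 32)])
print(two_3x3 / one_5x5)  # 0.72: the deeper stack is cheaper (18 vs. 25 per position)
```

Under this model, depth, width, filter size, and feature map size can all be traded against one another while the total stays fixed, which is exactly the design space the paper explores.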
Main Contributions and Findings
The core contribution of the paper is a systematic investigation of CNN architectures under constrained time budgets at both the training and testing stages. The paper emphasizes understanding the trade-offs among architectural factors in order to optimize both speed and accuracy. Major contributions and observations include:
- Depth vs. Filter Sizes: Through controlled experiments, the paper demonstrates that network depth is critical for accuracy, even when filter sizes are reduced to preserve computational time. For instance, when a 5×5 filter is replaced with two 3×3 filters or four 2×2 filters while the time complexity is held roughly constant, the authors show progressive improvements in accuracy (see the first sketch after this list).
- Depth vs. Width: The paper explores how increasing depth while reducing the number of filters per layer (width) affects performance. It finds that added depth typically yields substantial accuracy gains despite the reduced width. For example, replacing three 3×3 layers with six or nine 3×3 layers, with the filter counts reduced to keep the time cost unchanged, yielded lower top-5 errors.
- Optimal Depth: The authors caution against indiscriminately increasing network depth. They observe diminishing returns and even degradation in accuracy when the network depth is overly increased without modifying other factors.
- Feature Map Size: By introducing additional pooling layers and adjusting the subsequent layer structures, the authors further economized on computation while improving accuracy. Because halving the feature map size roughly quarters the cost of the layers that follow, adding a pooling layer after stage 3 freed enough budget to increase the number of filters in the subsequent stage, which proved beneficial.
- Delayed Subsampling: A max pooling layer performs two operations: lateral filtering (taking the max over a local window) and subsampling (the stride). Separating these roles, by setting the pooling stride to 1 and delaying the stride-2 subsampling to the convolutional layer that follows, led to consistent accuracy improvements across several models at essentially no extra time cost (see the second sketch after this list).
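To make the filter-size trade-off concrete, here is a minimal PyTorch sketch (my own illustration, not the authors' exact models; the channel count and input size are arbitrary) contrasting a single 5×5 convolution with two stacked 3×3 convolutions, which cover the same 5×5 receptive field at lower cost and add an extra nonlinearity:

```python
import torch
import torch.nn as nn

channels = 64  # illustrative width, not a value from the paper

# A single 5x5 convolution: 5x5 receptive field, ~25 * c^2 multiply-adds per position.
single_5x5 = nn.Conv2d(channels, channels, kernel_size=5, padding=2)

# Two stacked 3x3 convolutions: the same 5x5 receptive field,
# ~2 * 9 = 18 * c^2 multiply-adds per position, plus one extra ReLU.
stacked_3x3 = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
)

x = torch.randn(1, channels, 56, 56)
assert single_5x5(x).shape == stacked_3x3(x).shape  # identical output shape
```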
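Similarly, a sketch of the delayed-subsampling idea under the same assumptions: the pooling layer keeps its window but its stride drops to 1, and the stride-2 subsampling is delayed to the convolution that follows, so the output size (and hence the time cost of later stages) is unchanged:

```python
import torch
import torch.nn as nn

c = 64  # illustrative width

# Standard form: the pooling layer both filters (3x3 max) and subsamples (stride 2).
standard = nn.Sequential(
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
    nn.Conv2d(c, c, kernel_size=3, stride=1, padding=1),
)

# Delayed subsampling: pooling only filters (stride 1); the stride-2
# subsampling moves into the subsequent convolution.
delayed = nn.Sequential(
    nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
    nn.Conv2d(c, c, kernel_size=3, stride=2, padding=1),
)

x = torch.randn(1, c, 56, 56)
print(standard(x).shape, delayed(x).shape)  # both map 56x56 -> 28x28
```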
Numerical Results
The paper presents a model achieving a top-5 error of 11.8% and a top-1 error of 31.8% on the ImageNet dataset, with 40% less computational complexity than the widely known AlexNet and a 20% faster measured GPU speed. This model, referred to as J', demonstrates that careful architectural adjustments can substantially reduce the computational burden without sacrificing accuracy.
Implications and Future Directions
The findings have broad implications for both theoretical research and practical applications. The practical applicability of these efficient CNN models is particularly pertinent for real-time systems, such as online search engines and mobile devices, where computational power is limited. The paper also provides a structured approach for future architectural designs under similar constraints.
Future research could extend these methods to optimizing CNNs under constrained memory budgets, which remains a critical issue for both training and deployment on devices with limited memory capacities. Combining time and memory efficiency could lead to even more robust and versatile CNN architectures suitable for a wider range of practical applications.
In conclusion, He and Sun's meticulous exploration of CNN architectures under constrained time costs offers vital insights and practical solutions for developing efficient neural networks without compromising accuracy. This work stands as a significant contribution to the field of deep learning, providing a valuable framework and methodology for future studies and applications in computationally constrained environments.