- The paper demonstrates that increasing layer depth enhances model performance even when the number of parameters remains constant.
- It reveals that expanding feature map dimensions yields minimal gains under fixed parameter conditions.
- The recursive CNN with tied weights provides actionable insights for efficient parameter allocation in deep network design.
Understanding Deep Architectures Using a Recursive Convolutional Network
This paper, authored by Eigen, Rolfe, Fergus, and LeCun, addresses a critical question in the design of convolutional network architectures: how to appropriately size models in terms of the number of layers, feature maps, and parameters. The research leverages a recursive convolutional network with shared weights across layers to investigate the individual contributions of these factors to model performance.
Key Findings
The authors present several key findings based on experiments conducted on standard datasets, namely CIFAR-10 and SVHN. They empirically establish that:
- Layers vs. Feature Maps: An increase in the number of layers generally results in enhanced computational power and model performance, even without additional parameters. This is observed through an analysis using the tied-weight model, which allows variation in layer depth independent of the number of parameters.
- Parameters over Feature Maps: The dimensionality of feature maps plays a surprisingly limited role when the parameter count is fixed. The paper indicates that increasing feature maps, while keeping the number of parameters constant, yields minimal performance gains. This suggests the critical factor is the number of parameters, not the representational dimensionality.
- Architectural Implications: Based on these findings, convolutional networks benefit more from parameter allocation across multiple layers rather than from extensive dimensional feature maps. This insight challenges conventional wisdom about the necessity of high-dimensional representations in convolutional layers.
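The first finding hinges on the tied-weight construction: applying the same convolution repeatedly adds depth without adding parameters. The sketch below illustrates this idea in miniature with a 1-D convolution; it is not the authors' exact architecture, and the function and variable names are illustrative assumptions.

```python
import numpy as np

def tied_recursive_conv(x, kernel, depth):
    """Apply the SAME kernel `depth` times (weights tied across layers).

    Illustrative sketch: the parameter count (kernel.size) stays fixed
    no matter how large `depth` is, so depth can be varied independently
    of the number of parameters -- the key control in the paper's setup.
    """
    for _ in range(depth):
        x = np.maximum(np.convolve(x, kernel, mode="same"), 0.0)  # conv + ReLU
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal(32)
kernel = rng.standard_normal(5) * 0.1

shallow = tied_recursive_conv(x, kernel, depth=2)
deep = tied_recursive_conv(x, kernel, depth=8)
# Both networks have exactly kernel.size == 5 weights, despite 4x more layers.
```

Because the kernel is reused, the deeper model gains computational steps but no capacity in the parameter-count sense, which is what lets the authors attribute performance changes to depth alone.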
Experimental Methodology
The research employs a recursive convolutional network model where weights are tied between layers. This setup provides a controlled environment to independently manipulate each factor: the number of layers, feature maps, and parameters. Untying the weights introduces additional parameters in proportion to depth, allowing the authors to study parameter effects separately.
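The tied/untied contrast can be made concrete with a simple parameter count. The helper below, a hypothetical sketch (the names and layer shapes are assumptions, not taken from the paper), counts weights for a stack of square-kernel convolutional layers in the tied and untied regimes:

```python
def conv_layer_params(in_maps, out_maps, k):
    # One 2-D conv layer with k x k kernels: weights plus one bias per output map.
    return in_maps * out_maps * k * k + out_maps

def network_params(feature_maps, k, depth, tied):
    # With tied weights, every layer shares one weight set; untied weights
    # multiply the parameter count by the number of layers.
    per_layer = conv_layer_params(feature_maps, feature_maps, k)
    return per_layer if tied else per_layer * depth

tied_total = network_params(feature_maps=32, k=3, depth=6, tied=True)
untied_total = network_params(feature_maps=32, k=3, depth=6, tied=False)
# untied_total == 6 * tied_total: untying scales parameters with depth,
# while the tied model's count is independent of depth.
```

This is exactly the knob the methodology turns: holding either the tied count or the untied count fixed isolates depth, feature-map width, or parameter count as the varying factor.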
Practical and Theoretical Implications
Practitioners can use these insights to construct more efficient networks by prioritizing parameter count and layer depth over wider feature maps. Theoretically, clarifying the interdependencies among these components contributes to a deeper understanding of convolutional architectures. The findings encourage a shift in focus toward optimizing parameter distribution and layer depth rather than simply enlarging feature maps.
Future Directions
Future exploration might extend these principles to architectures involving more complex components like multiple pooling stages or hybrid networks combining convolutional layers with alternate architectures. Additionally, probing whether these observations hold in varying contexts, such as natural language processing, or within graph-based deep learning models, may yield further understanding of parameter and layer dynamics.
In conclusion, the paper provides significant empirical evidence on the relative importance of layers, feature maps, and parameters within convolutional networks. These insights can streamline the process of model architecture design, influencing both theoretical perspectives and practical approaches to model development.