- The paper presents effective dimensionality computed from Hessian eigenvalues as a superior metric to parameter counting for evaluating model complexity.
- The paper demonstrates through experiments on ResNet and CNN architectures that effective dimensionality aligns more closely with generalization performance, especially in double descent and width-depth tradeoffs.
- The paper links effective dimensionality with posterior contraction in Bayesian frameworks, offering new insights for model selection and deep network design.
Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited
The paper "Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited" by Wesley J. Maddox, Gregory Benton, and Andrew Gordon Wilson provides an incisive examination of deep learning models through the concept of effective dimensionality. The authors challenge the traditional reliance on parameter counting as a proxy for model complexity and generalization performance, presenting effective dimensionality as a more informative metric.
Key Findings and Contributions
- Inefficacy of Parameter Counting: The paper argues that parameter counting is an inadequate proxy for model complexity, particularly in overparameterized models, where the number of parameters exceeds the number of data points. Counting parameters alone ignores how the architecture and functional form determine which parameters actually shape the learned function.
- Effective Dimensionality as a Measure: Effective dimensionality, computed from the eigenvalues of the Hessian of the training loss, captures how many parameter directions are actually determined by the data, giving a more nuanced picture of how parameters interact with the learned function. The measure is shown to track generalization behavior, especially in the context of the double descent phenomenon and width-depth tradeoffs (a minimal computation is sketched after this list).
- Bayesian Perspective on Posterior Contraction: The theoretical underpinning of effective dimensionality is explored through its relationship to posterior contraction in Bayesian settings. The paper shows that growth in effective dimensionality corresponds to contraction of the posterior variance of the parameters, i.e. increased certainty about the parameter combinations the data actually determine (a numerical check in the Bayesian linear model is sketched after this list).
- Function-Space Homogeneity: The authors prove that the parameter spaces of linear and generalized linear models contain degenerate subspaces in which parameter perturbations leave the model's predictions essentially unchanged, offering an explanation for flat minima in loss landscapes. They argue the same picture extends to neural networks: the function-space representation barely moves even under large parameter changes along these degenerate directions (a small linear-model illustration follows this list).
- Experiments on Deep Models: Experiments with ResNet architectures and convolutional neural networks show empirically that effective dimensionality tracks generalization performance more closely than parameter count, across regimes that include double descent and varying width-depth combinations.
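
As referenced above, here is a minimal sketch of the effective-dimensionality computation, assuming the definition N_eff(H, z) = sum_i lambda_i / (lambda_i + z) over the Hessian eigenvalues lambda_i with a regularization constant z > 0. The toy network, data, and choice of z are illustrative placeholders rather than the paper's experimental setup, and the exact Hessian is only tractable because the model is tiny; large models would need an iterative spectrum approximation such as Lanczos.

```python
import torch
from torch.autograd.functional import hessian

def effective_dimensionality(eigenvalues, z=1.0):
    """N_eff(H, z) = sum_i lambda_i / (lambda_i + z); negative Hessian
    eigenvalues are clamped to zero before the sum."""
    lam = eigenvalues.clamp(min=0.0)
    return (lam / (lam + z)).sum().item()

# Toy setup: a tiny one-hidden-layer network and a squared-error training loss.
torch.manual_seed(0)
X, y = torch.randn(64, 5), torch.randn(64, 1)
model = torch.nn.Sequential(torch.nn.Linear(5, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1))
params = torch.nn.utils.parameters_to_vector(model.parameters()).detach()

def loss_fn(flat_params):
    """Training loss as an explicit function of a flat parameter vector,
    so autograd can form the full Hessian with respect to it."""
    pointer, out = 0, X
    for module in model:
        if isinstance(module, torch.nn.Linear):
            w = flat_params[pointer:pointer + module.weight.numel()].view_as(module.weight)
            pointer += module.weight.numel()
            b = flat_params[pointer:pointer + module.bias.numel()]
            pointer += module.bias.numel()
            out = out @ w.T + b
        else:
            out = module(out)
    return torch.mean((out - y) ** 2)

H = hessian(loss_fn, params)        # full (P x P) Hessian; fine for ~57 parameters
eigs = torch.linalg.eigvalsh(H)     # exact eigenvalues of the small Hessian
print(f"parameters: {params.numel()}, "
      f"effective dimensionality (z=1): {effective_dimensionality(eigs, z=1.0):.2f}")
```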
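
The connection to posterior contraction can be checked numerically in the simplest Bayesian setting. The sketch below assumes a Bayesian linear model y = Phi beta + noise, with prior beta ~ N(0, alpha^2 I) and noise variance sigma^2; under those assumptions the total reduction in posterior variance relative to the prior equals alpha^2 * N_eff(Phi^T Phi, sigma^2 / alpha^2), mirroring the flavor of the paper's linear-model result. The data, alpha, and sigma are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 10                 # observations, parameters
alpha2, sigma2 = 4.0, 0.5      # prior variance alpha^2, noise variance sigma^2

Phi = rng.normal(size=(n, p))                                   # design matrix
y = Phi @ rng.normal(size=p) + rng.normal(scale=np.sqrt(sigma2), size=n)

# Posterior covariance for beta ~ N(0, alpha^2 I), y | beta ~ N(Phi beta, sigma^2 I).
post_cov = np.linalg.inv(Phi.T @ Phi / sigma2 + np.eye(p) / alpha2)

# Posterior contraction: total prior variance minus total posterior variance.
contraction = p * alpha2 - np.trace(post_cov)

# Effective dimensionality of Phi^T Phi with regularization z = sigma^2 / alpha^2.
lam = np.linalg.eigvalsh(Phi.T @ Phi)
n_eff = np.sum(lam / (lam + sigma2 / alpha2))

print(f"posterior contraction : {contraction:.4f}")
print(f"alpha^2 * N_eff       : {alpha2 * n_eff:.4f}")   # matches the contraction
```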
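
Finally, the degenerate-direction claim is easy to illustrate in the linear case, where the zero-eigenvalue eigenvectors of the Hessian (2/n) X^T X are exactly the null space of the design matrix, so moving the parameters along them cannot change predictions at all. The sketch below contrasts a degenerate direction with the highest-curvature direction for an overparameterized linear model; the dimensions and perturbation scale are arbitrary, and in a neural network the analogous low-curvature directions are only approximately degenerate.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 100                      # overparameterized: more parameters than data
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)           # any parameter setting

# Hessian of the squared-error loss for a linear model is (2/n) X^T X;
# its zero-eigenvalue eigenvectors span the null space of X.
H = 2.0 / n * X.T @ X
eigvals, eigvecs = np.linalg.eigh(H)        # eigenvalues in ascending order

flat_dir = eigvecs[:, 0]     # zero-curvature ("degenerate") direction
sharp_dir = eigvecs[:, -1]   # highest-curvature direction

base = X @ beta
for name, d in [("degenerate", flat_dir), ("sharp", sharp_dir)]:
    change = np.linalg.norm(X @ (beta + 5.0 * d) - base)
    print(f"{name:>10} direction: ||change in predictions|| = {change:.3e}")
```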
Implications and Future Directions
The implications of this research are manifold:
- Model Selection and Optimization: The insights into effective dimensionality can inform better model selection and optimization practices by highlighting architectures that achieve compression in parameter space while maintaining expressive power.
- Understanding Generalization: This work advances the understanding of generalization in deep learning by illustrating how model parameters relate to functional capacity, moving beyond simplistic metrics such as parameter count.
- Design of Deep Networks: The findings encourage the design of architectures that exploit depth effectively rather than width alone, suggesting that deeper models offer better parameter utilization as indicated by lower effective dimensionality.
Looking forward, this paper invites research on adapting effective dimensionality for real-time applications, exploring its implications for transfer learning, and further empirical validation across diverse architectures. It also sparks inquiry into whether effective dimensionality can guide the design of more robust and interpretable models. As the understanding of loss landscapes matures, incorporating effective dimensionality could inform the next generation of optimizers and regularization techniques that inherently balance the bias-variance tradeoff.