Fantastic Generalization Measures and Where to Find Them (1912.02178v1)

Published 4 Dec 2019 in cs.LG and stat.ML

Abstract: Generalization of deep networks has been of great interest in recent years, resulting in a number of theoretically and empirically motivated complexity measures. However, most papers proposing such measures study only a small set of models, leaving open the question of whether the conclusion drawn from those experiments would remain valid in other settings. We present the first large scale study of generalization in deep networks. We investigate more than 40 complexity measures taken from both theoretical bounds and empirical studies. We train over 10,000 convolutional networks by systematically varying commonly used hyperparameters. Hoping to uncover potentially causal relationships between each measure and generalization, we analyze carefully controlled experiments and show surprising failures of some measures as well as promising measures for further research.

Citations (564)

Summary

  • The paper finds that traditional norm-based measures, such as the product of spectral norms, often show negative correlations with generalization performance.
  • It demonstrates that sharpness-based measures and gradient variance during training are effective predictors of a model's generalization capacity.
  • The study introduces granulated Kendall’s rank-correlation analysis to uncover nuanced relationships between hyperparameter tuning and generalization gaps.

Analysis of Generalization Measures in Deep Networks

The paper presents an extensive study of complexity measures aimed at understanding generalization in deep neural networks. By evaluating over 40 measures across more than 10,000 convolutional networks, the authors highlight the efficacy, limitations, and potential of these measures. This large-scale approach provides comprehensive insight into how well different complexity measures predict the generalization performance of deep networks.

Key Findings

  1. Spurious Correlations and Norm-Based Measures: The paper finds that many norm-based measures traditionally used to predict generalization perform poorly. Notably, some of them, such as the product of spectral norms, exhibit strong negative correlations with generalization (see the sketches after this list). This suggests a complex relationship between model complexity as captured by norms and a model's generalization behavior.
  2. Sharpness-Based Measures: Measures based on the sharpness of minima, such as PAC-Bayesian bounds and variants focusing on parameter perturbation, show promising predictive capabilities. The paper underscores the value of sharpness as a critical factor in differentiating models that generalize well.
  3. Effect of Training Dynamics: The variance of gradients during training emerges as a significant predictor of generalization, more so than some traditional complexity measures. This observation supports the notion that optimization dynamics play a crucial role in understanding generalization.
  4. Magnitude-Aware Perturbation Bounds: Scaling the perturbation with each parameter's magnitude strengthens the relationship between sharpness and generalization, further demonstrating sharpness's potential as a robust metric.
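
To ground these findings, here is a minimal PyTorch sketch of the first two families of measures: the product of layer-wise spectral norms, and a sharpness proxy that measures the average loss increase under random parameter perturbations, with an optional magnitude-aware scaling as in finding 4. This is an illustration under simplified assumptions (random rather than worst-case perturbations; function names are our own), not the paper's reference implementation.

```python
# Illustrative sketches (not the paper's code): a norm-based measure and a
# sharpness proxy. Names and defaults here are our own assumptions.
import torch
import torch.nn as nn

def spectral_norm_product(model: nn.Module) -> float:
    """Product of the largest singular values of all weight matrices;
    conv kernels are flattened into (out_channels, -1) matrices."""
    prod = 1.0
    for p in model.parameters():
        if p.dim() >= 2:
            prod *= torch.linalg.matrix_norm(p.flatten(1), ord=2).item()
    return prod

@torch.no_grad()
def sharpness_proxy(model, loss_fn, x, y, sigma=0.01, n_samples=5,
                    magnitude_aware=False):
    """Mean loss increase under random Gaussian parameter perturbations.
    With magnitude_aware=True, noise scales with each weight's magnitude."""
    base = loss_fn(model(x), y).item()
    increases = []
    for _ in range(n_samples):
        noise = []
        for p in model.parameters():
            scale = sigma * (p.abs() if magnitude_aware else 1.0)
            eps = torch.randn_like(p) * scale
            p.add_(eps)
            noise.append(eps)
        increases.append(loss_fn(model(x), y).item() - base)
        for p, eps in zip(model.parameters(), noise):
            p.sub_(eps)  # restore the original weights
    return sum(increases) / n_samples
```

Random Gaussian noise corresponds to the PAC-Bayesian flavor of sharpness; the worst-case variant instead maximizes the loss increase over a perturbation ball.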

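The gradient-variance predictor from finding 3 admits a similarly compact sketch: accumulate per-batch gradients at a training checkpoint and measure their dispersion around the mean. The estimator below is a plausible approximation, not the paper's exact definition.

```python
# Illustrative sketch of a gradient-variance measure: per-batch gradient
# dispersion around the mean gradient, summed over all coordinates.
import torch

def gradient_variance(model, loss_fn, batches):
    """batches: an iterable of (x, y) pairs; needs at least two batches.
    Assumes every parameter receives a gradient on every batch."""
    grads = []
    for x, y in batches:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        grads.append(torch.cat([p.grad.flatten() for p in model.parameters()]))
    stacked = torch.stack(grads)
    return stacked.var(dim=0, unbiased=False).sum().item()
```
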
Methodological Approach

The paper introduces enhanced evaluation methods built around Kendall's rank-correlation coefficient and conditional independence tests. These aim to probe potentially causal relationships between hyperparameter choices, complexity measures, and the generalization gap. The granulated Kendall's coefficient, in particular, offers a more nuanced view by evaluating the correlation separately along each hyperparameter dimension.
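
As a rough illustration of the granulated idea, the sketch below computes Kendall's tau only within groups of models that differ in a single hyperparameter, averages those scores per axis, and then averages across axes. The record format and helper name are hypothetical, and the paper's coefficient involves its own normalization details.

```python
# Hypothetical sketch of a granulated rank correlation: Kendall's tau is
# computed within groups that vary only one hyperparameter, then averaged
# per axis and across axes.
from itertools import groupby
from scipy.stats import kendalltau

def granulated_kendall(records, hyperparams, measure_key, gap_key):
    """records: dicts of hyperparameter values, a complexity measure, and a
    generalization gap, e.g. {"lr": 0.1, "depth": 4, "norm": 3.2, "gap": 0.08}."""
    axis_scores = []
    for axis in hyperparams:
        others = [h for h in hyperparams if h != axis]
        keyfn = lambda r: tuple(r[h] for h in others)
        taus = []
        for _, group in groupby(sorted(records, key=keyfn), key=keyfn):
            group = list(group)
            if len(group) > 1:
                tau, _ = kendalltau([r[measure_key] for r in group],
                                    [r[gap_key] for r in group])
                if tau == tau:  # skip NaN from constant-valued groups
                    taus.append(tau)
        if taus:
            axis_scores.append(sum(taus) / len(taus))
    return sum(axis_scores) / len(axis_scores)
```

Averaging per axis prevents a measure that merely tracks one hyperparameter from dominating the overall score, which is the motivation for granulating the coefficient in the first place.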

Technical Implications

The results suggest a reevaluation of commonly used complexity measures in assessing model generalization. Specifically:

  • Norm-Based Measures: Their negative correlation with generalization calls for reconsidering how they are used in network design and regularization.
  • Optimization and Training Dynamics: The results encourage further exploration of how training-process metrics can serve as complementary indicators of model performance.
  • Sharpness and Stability: The findings reaffirm the potential of sharpness-based measures, advocating for more research into their theoretical justification and empirical validation.

Future Directions

The findings open several avenues for future exploration:

  • Broader Dataset Usage: Extending the analysis to diverse datasets and more complex models could further validate the findings and increase their applicability.
  • Integration with Theoretical Frameworks: Developing tighter theoretical bounds that capture the complexity of deep networks more accurately, particularly bounds whose predictions correlate with empirical generalization.
  • Optimization Process Insights: Investigating how other optimization-related metrics might contribute to or enhance current understanding.

Conclusion

This publication contributes significant empirical evidence to the ongoing discourse on generalization in deep learning. By systematically stress-testing a wide range of complexity measures, it offers a refined lens through which researchers can assess and compare the generalization capacities of deep models. The work not only questions long-held assumptions about norm-based generalization measures but also opens new avenues for leveraging optimization dynamics as predictive tools. Through their exhaustive experimental design, the authors provide a foundation on which future theoretical and empirical investigations can build.