
Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization (2107.04649v2)

Published 9 Jul 2021 in cs.LG and stat.ML

Abstract: For machine learning systems to be reliable, we must understand their performance in unseen, out-of-distribution environments. In this paper, we empirically show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts. Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet, a synthetic pose estimation task derived from YCB objects, satellite imagery classification in FMoW-WILDS, and wildlife classification in iWildCam-WILDS. The strong correlations hold across model architectures, hyperparameters, training set size, and training duration, and are more precise than what is expected from existing domain adaptation theory. To complete the picture, we also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS. Finally, we provide a candidate theory based on a Gaussian data model that shows how changes in the data covariance arising from distribution shift can affect the observed correlations.

Citations (245)

Summary

  • The paper demonstrates a strong, approximately linear correlation between in-distribution and out-of-distribution performance across diverse models and datasets.
  • It outlines how pretraining and targeted data augmentation can improve both ID and OOD accuracy, supporting reliable model evaluation.
  • The study introduces a Gaussian data model to explain the observed trends and when they should be expected to hold.

Overview of "Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization"

The paper investigates the correlation between in-distribution (ID) and out-of-distribution (OOD) performance of machine learning models. This correlation is critical for the reliability of models when deployed in environments different from those they were trained on. The research provides empirical evidence supporting strong linear correlations across various datasets and model architectures, including CIFAR-10, ImageNet, and several real-world datasets from the WILDS benchmark.
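To make the correlation concrete, here is a minimal sketch of the kind of analysis the paper performs: fitting a line to (ID, OOD) accuracy pairs after a probit transform, the axis scaling under which the paper observes linear trends. The accuracy values below are hypothetical placeholders, not numbers from the paper.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical (ID, OOD) accuracy pairs for a family of models;
# in practice these come from evaluating each model on both test sets.
id_acc = np.array([0.72, 0.80, 0.85, 0.90, 0.94])
ood_acc = np.array([0.45, 0.55, 0.62, 0.70, 0.78])

# Probit transform (inverse Gaussian CDF): the scaling under which
# the paper's ID-vs-OOD trends become approximately linear.
id_probit = norm.ppf(id_acc)
ood_probit = norm.ppf(ood_acc)

# Ordinary least-squares fit on the probit scale.
slope, intercept = np.polyfit(id_probit, ood_probit, 1)

# R^2 quantifies how tightly the models sit "on the line".
residuals = ood_probit - (slope * id_probit + intercept)
r2 = 1 - residuals.var() / ood_probit.var()
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.3f}")
```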

Key Findings

  • Strong Linear Correlations: The central observation is a near-linear relationship between ID and OOD accuracies (most apparent on a probit scale) across datasets and model families. The correlation holds across variations in architecture, hyperparameters, training set size, and training duration.
  • Predictability of Linear Trends: Such a linear trend makes OOD performance predictable from ID accuracy, challenging the notion that OOD generalization is inherently erratic.
  • Pretraining and Data Augmentation: Pretraining on larger, more diverse datasets can improve both ID and OOD accuracy, in some cases while preserving the linear trend; targeted data augmentation strategies can likewise improve OOD performance.
  • Settings with Weaker Correlations: Notably, weaker correlations were observed in settings such as the Camelyon17-WILDS, some synthetic shifts in CIFAR-10-C, and a particular variant of iWildCam-WILDS.
  • Theoretical Insights: The paper introduces a theoretical model based on Gaussian distributions to explain when linear trends should be expected. The model highlights how changes in the data covariance under distribution shift shape the observed correlations; a minimal simulation in this spirit appears after this list.
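
The following simulation captures the flavor of that theoretical argument under simplifying assumptions of my own (isotropic Gaussian class-conditional features, a shift implemented as a uniform rescaling of the noise covariance, and classifiers of varying quality obtained by perturbing the optimal direction). It is an illustration of the covariance-shift mechanism, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20                         # feature dimension
mu = np.ones(d) / np.sqrt(d)   # class mean direction; labels are +/- mu

def accuracy(w, cov_scale, n=20000):
    """Accuracy of the linear classifier sign(w @ x) on Gaussian data."""
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * mu + cov_scale * rng.standard_normal((n, d))
    return np.mean(np.sign(x @ w) == y)

# Classifiers of varying quality: the true direction plus noise of
# increasing magnitude, mimicking models trained to different accuracies.
id_acc, ood_acc = [], []
for noise in np.linspace(0.0, 3.0, 25):
    w = mu + noise * rng.standard_normal(d)
    id_acc.append(accuracy(w, cov_scale=1.0))   # in-distribution
    ood_acc.append(accuracy(w, cov_scale=2.0))  # shifted covariance

# Under this kind of covariance shift, ID and OOD accuracy are strongly
# correlated across the whole family of classifiers.
print(np.corrcoef(id_acc, ood_acc)[0, 1])
```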

Implications

  1. Model Selection and Evaluation: The findings imply that models ranking highest on ID accuracy will generally also rank highest under OOD conditions, offering a straightforward approach to model selection based on in-distribution performance (a prediction sketch follows this list).
  2. Benchmarking Robustness: In scenarios where the linear correlation holds, predicted OOD performance offers a baseline for benchmarking models that aim to go beyond empirical risk minimization toward robust classifier design.
  3. Transfer Learning: For practitioners employing pre-trained models, this research underscores the importance of understanding how pretraining influences ID and OOD trends.
  4. Uniformity in Algorithm Performance: The consistent ID-OOD correlation suggests that improvements in out-of-distribution robustness might require strategies that explicitly target deviations from these trends, rather than focusing solely on in-distribution accuracy.
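
As a usage note on point 1, a probit-scale fit (like the slope and intercept estimated in the earlier sketch) can be inverted to predict a new model's OOD accuracy from its ID accuracy alone. The fit parameters below are placeholders, not values reported in the paper.

```python
from scipy.stats import norm

def predict_ood(id_accuracy, slope, intercept):
    """Predict OOD accuracy from ID accuracy via a probit-scale linear fit."""
    return norm.cdf(slope * norm.ppf(id_accuracy) + intercept)

# Hypothetical fit parameters; in practice, use the slope/intercept
# estimated from models already evaluated on both distributions.
print(predict_ood(0.88, slope=1.10, intercept=-0.60))
```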

Future Directions

  • Enhanced Robustness Through Theory: Further theoretical work could refine the Gaussian data model and expand it to accommodate real-world distribution complexities.
  • Exploring Non-Linear Correlations: Investigating cases where the correlation deviates from linearity might provide insights into robustness interventions that can be applied uniformly across various data distributions.
  • Data Diversity Assessments: Research should further explore how the diversity of the training set impacts the correlation between ID and OOD performance, especially considering the transfer learning context.

Overall, the paper contributes significantly to the understanding of OOD generalization in machine learning, setting a foundation for further empirical investigation and theoretical development in the field of AI model robustness.
