Minimum-Norm Interpolation Under Covariate Shift (2404.00522v2)
Abstract: Transfer learning is a critical part of real-world machine learning deployments and has been extensively studied in experimental works with overparameterized neural networks. However, even in the simplest setting of linear regression, a notable gap remains in the theoretical understanding of transfer learning. In-distribution research on high-dimensional linear regression has identified a phenomenon known as *benign overfitting*, in which linear interpolators overfit noisy training labels yet still generalize well. This behavior occurs under specific conditions on the source covariance matrix and the input data dimension. It is therefore natural to ask how such high-dimensional linear models behave under transfer learning. We prove the first non-asymptotic excess risk bounds for benignly overfitting linear interpolators in the transfer learning setting. From our analysis, we propose a taxonomy of *beneficial* and *malignant* covariate shifts based on the degree of overparameterization. We complement our analysis with empirical studies that exhibit these beneficial and malignant covariate shifts for linear interpolators on real image data, and for fully-connected neural networks in settings where the input data dimension exceeds the training sample size.
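To make the setting concrete, the following is a minimal NumPy sketch (not the paper's code) of the object under study: the minimum-norm linear interpolator fit on overparameterized source data, with its excess risk evaluated under both the source covariance and a shifted target covariance. The spectra, noise level, and sample sizes below are illustrative assumptions, not values from the paper.

```python
# Minimal sketch, assuming Gaussian data with diagonal source/target covariances.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 500                                    # overparameterized: d >> n
beta_star = rng.normal(size=d) / np.sqrt(d)       # ground-truth parameter (assumed)

# Illustrative spectra: fast-decaying source eigenvalues, reversed for the target.
src_eigs = 1.0 / (1 + np.arange(d)) ** 2
tgt_eigs = src_eigs[::-1]

X = rng.normal(size=(n, d)) * np.sqrt(src_eigs)   # training inputs ~ N(0, Sigma_src)
y = X @ beta_star + 0.1 * rng.normal(size=n)      # noisy training labels

# Minimum-norm interpolator: beta_hat = X^T (X X^T)^{-1} y.
beta_hat = X.T @ np.linalg.solve(X @ X.T, y)
assert np.allclose(X @ beta_hat, y, atol=1e-6)    # it fits the training data exactly

# Excess risk under a Gaussian test distribution with diagonal covariance Sigma:
# E[(x^T (beta_hat - beta_star))^2] = sum_j Sigma_jj * (beta_hat - beta_star)_j^2.
delta = beta_hat - beta_star
risk_src = np.sum(src_eigs * delta ** 2)          # in-distribution excess risk
risk_tgt = np.sum(tgt_eigs * delta ** 2)          # excess risk after covariate shift
print(f"source excess risk: {risk_src:.4f}, target excess risk: {risk_tgt:.4f}")
```

Comparing `risk_src` and `risk_tgt` for different target spectra gives a rough sense of when a covariate shift is beneficial (lower risk on the target) or malignant (higher risk), which is the distinction the paper's taxonomy formalizes.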