Minimum-Norm Interpolation Under Covariate Shift (2404.00522v2)

Published 31 Mar 2024 in cs.LG and stat.ML

Abstract: Transfer learning is a critical part of real-world machine learning deployments and has been extensively studied in experimental works with overparameterized neural networks. However, even in the simplest setting of linear regression, a notable gap still exists in the theoretical understanding of transfer learning. In-distribution research on high-dimensional linear regression has led to the identification of a phenomenon known as benign overfitting, in which linear interpolators overfit to noisy training labels and yet still generalize well. This behavior occurs under specific conditions on the source covariance matrix and input data dimension. Therefore, it is natural to wonder how such high-dimensional linear models behave under transfer learning. We prove the first non-asymptotic excess risk bounds for benignly overfit linear interpolators in the transfer learning setting. From our analysis, we propose a taxonomy of beneficial and malignant covariate shifts based on the degree of overparameterization. We follow our analysis with empirical studies that show these beneficial and malignant covariate shifts for linear interpolators on real image data, and for fully-connected neural networks in settings where the input data dimension is larger than the training sample size.
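
As a rough illustration of the object studied in the abstract, below is a minimal numpy sketch of a minimum-norm (ridgeless least-squares) interpolator fit to noisy labels in an overparameterized regime, with its excess risk evaluated under both the source covariance and a shifted target covariance. The synthetic Gaussian data, diagonal covariances, and the particular spectra are illustrative assumptions for this sketch, not the paper's construction or bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparameterized setting: dimension d exceeds sample size n.
n, d = 50, 500

# Illustrative spectra: a decaying source spectrum, and a hypothetical
# target spectrum that puts more mass on the tail directions (a "shift").
source_eigs = 1.0 / (1.0 + np.arange(d))
target_eigs = source_eigs.copy()
target_eigs[10:] *= 5.0

# Ground-truth parameter and noisy training data from the source distribution.
beta = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d)) * np.sqrt(source_eigs)   # columns scaled to source variances
y = X @ beta + 0.1 * rng.normal(size=n)

# Minimum-norm interpolator: beta_hat = X^+ y (ridgeless least squares).
beta_hat = np.linalg.pinv(X) @ y
assert np.allclose(X @ beta_hat, y)                   # it interpolates the noisy labels

def excess_risk(eigs: np.ndarray) -> float:
    """Excess risk (beta_hat - beta)^T Sigma (beta_hat - beta) for diagonal Sigma."""
    diff = beta_hat - beta
    return float(np.sum(eigs * diff**2))

print("in-distribution excess risk :", excess_risk(source_eigs))
print("covariate-shift excess risk :", excess_risk(target_eigs))
```

Comparing the two printed risks shows how the same interpolator can look benign under the source covariance yet incur larger (or, for other spectra, smaller) risk under a shifted target covariance, which is the kind of beneficial/malignant distinction the paper formalizes.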
