Blessings and Curses of Covariate Shifts: Adversarial Learning Dynamics, Directional Convergence, and Equilibria (2212.02457v3)
Abstract: Covariate distribution shifts and adversarial perturbations present robustness challenges to the conventional statistical learning framework: mild shifts in the test covariate distribution can significantly degrade the performance of a statistical model learned from the training distribution. Performance typically deteriorates when extrapolation occurs, that is, when covariates shift to a region where the training distribution is scarce and the learned model therefore has little information. Adversarial perturbation techniques have been proposed as a remedy for robustness and regularization; however, given a learned model, it requires careful study to determine which extrapolation region an adversarial covariate shift will focus on. This paper precisely characterizes that extrapolation region, examining both regression and classification in an infinite-dimensional setting. We study the implications of adversarial covariate shifts for subsequent learning of the equilibrium (the Bayes optimal model) in a sequential game framework. We exploit the dynamics of the adversarial learning game and reveal the curious effects of covariate shift on equilibrium learning and experimental design. In particular, we establish two directional convergence results that exhibit distinctive phenomena: (1) a blessing in regression, where the adversarial covariate shift converges at an exponential rate to an optimal experimental design for rapid subsequent learning; (2) a curse in classification, where the adversarial covariate shift converges at a subquadratic rate to the hardest experimental design, trapping subsequent learning.
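To make the sequential-game dynamics concrete, here is a minimal sketch in which an adversary repeatedly shifts covariate mass to the extrapolation point where the current fitted model errs most, and the learner then refits on the augmented covariates. This is a hypothetical one-dimensional least-squares toy, not the paper's infinite-dimensional construction; the ground-truth function `bayes_regression`, the polynomial model class, and all constants are illustrative assumptions.

```python
# Toy sketch of the adversarial covariate-shift game (illustrative only;
# the paper's analysis is infinite-dimensional, this is a 1-D stand-in).
import numpy as np

rng = np.random.default_rng(0)

def bayes_regression(x):
    """Ground-truth regression function (an arbitrary smooth choice)."""
    return np.sin(2 * x)

def fit_poly(x, y, degree=2):
    """Learner: least-squares polynomial fit on the current covariates."""
    return np.polynomial.Polynomial.fit(x, y, degree)

def adversarial_shift(model, grid):
    """Adversary: move covariate mass to where the fitted model errs most,
    i.e., the extrapolation region the shift focuses on."""
    errors = (model(grid) - bayes_regression(grid)) ** 2
    return grid[np.argmax(errors)]

# Initial training covariates concentrated near the origin, so the tails
# are extrapolation regions for the learner.
x = rng.normal(0.0, 0.3, size=30)
grid = np.linspace(-3.0, 3.0, 601)

for t in range(10):
    y = bayes_regression(x) + 0.05 * rng.standard_normal(x.shape)
    model = fit_poly(x, y)
    x_new = adversarial_shift(model, grid)
    x = np.append(x, x_new)  # shifted covariate enters the next round
    print(f"round {t}: adversary shifts covariate mass to x = {x_new:+.2f}")
```

In this regression toy, each round places a new covariate at an informative extrapolation point, loosely mirroring the "blessing" direction in which the adversarial shift approaches a useful experimental design; a classification analogue, where the adversary pushes mass toward the decision boundary, would instead illustrate the "curse" of the hardest design.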