Mind the GAP: Improving Robustness to Subpopulation Shifts with Group-Aware Priors (2403.09869v1)
Abstract: Machine learning models often perform poorly under subpopulation shifts in the data distribution. Developing methods that allow machine learning models to better generalize to such shifts is crucial for safe deployment in real-world settings. In this paper, we develop a family of group-aware prior (GAP) distributions over neural network parameters that explicitly favor models that generalize well under subpopulation shifts. We design a simple group-aware prior that only requires access to a small set of data with group information and demonstrate that training with this prior yields state-of-the-art performance, even when only retraining the final layer of a previously trained non-robust model. Group-aware priors are conceptually simple, complementary to existing approaches such as attribute pseudo-labeling and data reweighting, and open up promising new avenues for harnessing Bayesian inference to enable robustness to subpopulation shifts.
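To make the idea of a prior that "favors models that generalize well under subpopulation shifts" concrete, the following is a minimal sketch under stated assumptions, not the paper's actual construction. It assumes a linear final layer for binary classification and a hypothetical unnormalized prior whose negative log density is an isotropic Gaussian term plus the worst-group loss on a small group-labeled context set. The names `neg_log_group_prior`, `prior_var`, and `temperature` are illustrative and do not come from the abstract.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_example_nll(w, X, y):
    """Binary cross-entropy of each example under a linear last layer."""
    p = np.clip(sigmoid(X @ w), 1e-7, 1.0 - 1e-7)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def neg_log_group_prior(w, X_ctx, y_ctx, g_ctx, prior_var=10.0, temperature=1.0):
    """Hypothetical unnormalized negative log prior (an assumption, not the
    paper's definition): a Gaussian term on the weights plus the worst
    per-group mean loss on the group-labeled context set."""
    losses = per_example_nll(w, X_ctx, y_ctx)
    group_losses = np.array([losses[g_ctx == g].mean() for g in np.unique(g_ctx)])
    return 0.5 * np.dot(w, w) / prior_var + group_losses.max() / temperature

def map_objective(w, X, y, X_ctx, y_ctx, g_ctx):
    """Negative log posterior up to a constant: data NLL plus negative log prior."""
    return per_example_nll(w, X, y).sum() + neg_log_group_prior(w, X_ctx, y_ctx, g_ctx)

# Toy context set: feature 0 is the core signal, feature 1 is spurious.
# Each of the four (label, spurious attribute) combinations is its own group.
X_ctx = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
y_ctx = np.array([1.0, 1.0, 0.0, 0.0])
g_ctx = np.array([0, 1, 2, 3])

w_core = np.array([2.0, 0.0])  # relies only on the core feature
w_spur = np.array([0.0, 2.0])  # relies only on the spurious feature

prior_core = neg_log_group_prior(w_core, X_ctx, y_ctx, g_ctx)
prior_spur = neg_log_group_prior(w_spur, X_ctx, y_ctx, g_ctx)
print(prior_core < prior_spur)  # the prior prefers the core-feature weights
```

Both weight vectors have the same norm, so the Gaussian term is identical and the difference comes entirely from the worst-group term. Minimizing `map_objective` over `w` while keeping the feature extractor frozen would correspond to the last-layer retraining setting mentioned in the abstract, again under the simplifying assumptions above.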