Empirically Validating Conformal Prediction on Modern Vision Architectures Under Distribution Shift and Long-tailed Data
Abstract: Conformal prediction has emerged as a rigorous means of providing deep learning models with reliable uncertainty estimates and safety guarantees. Yet its performance is known to degrade under distribution shift and long-tailed class distributions, both of which are common in real-world applications. Here, we characterize the performance of several post-hoc and training-based conformal prediction methods under these settings, providing the first empirical evaluation on large-scale datasets and models. We show that, across numerous conformal methods and neural network families, performance degrades greatly under distribution shift, violating safety guarantees. Similarly, we show that in long-tailed settings the guarantees are frequently violated on many classes. Understanding the limitations of these methods is necessary for deployment in real-world and safety-critical applications.
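To make the notion of a violated guarantee concrete, the following is a minimal sketch of split conformal prediction with a simple threshold score (one minus the softmax probability of a class), together with marginal and class-wise coverage checks. It is an illustration under stated assumptions, not the paper's evaluation code: the function names, the choice of score, and the toy numbers are all hypothetical.

```python
import math
from collections import defaultdict


def conformal_quantile(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest
    calibration score, or +inf when the calibration set is too small."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, qhat):
    """Keep every class whose score (assumed here to be 1 - probability)
    does not exceed the calibrated threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= qhat]


def marginal_coverage(prob_rows, labels, qhat):
    """Fraction of examples whose true label lies in the prediction set."""
    hits = sum(y in prediction_set(p, qhat) for p, y in zip(prob_rows, labels))
    return hits / len(labels)


def classwise_coverage(prob_rows, labels, qhat):
    """Coverage computed separately for each true class."""
    hits, counts = defaultdict(int), defaultdict(int)
    for p, y in zip(prob_rows, labels):
        counts[y] += 1
        hits[y] += y in prediction_set(p, qhat)
    return {c: hits[c] / counts[c] for c in counts}


# Hypothetical toy data: calibration scores and three test examples.
qhat = conformal_quantile([0.4, 0.1, 0.3, 0.2], alpha=0.5)
test_probs = [[0.75, 0.25], [0.25, 0.75], [0.75, 0.25]]
test_labels = [0, 1, 1]
print(marginal_coverage(test_probs, test_labels, qhat))
print(classwise_coverage(test_probs, test_labels, qhat))
```

In this toy case the marginal coverage can meet a target level while one class falls well below it, which is the kind of per-class violation the abstract describes for long-tailed data. The guarantee itself relies on calibration and test data being exchangeable, so a shifted test set can push even the marginal figure below the target.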