Conformal inference is (almost) free for neural networks trained with early stopping (2301.11556v2)
Abstract: Early stopping based on hold-out data is a popular regularization technique designed to mitigate overfitting and increase the predictive accuracy of neural networks. Models trained with early stopping often provide relatively accurate predictions, but they generally still lack precise statistical guarantees unless they are further calibrated using independent hold-out data. This paper addresses the above limitation with conformalized early stopping: a novel method that combines early stopping with conformal calibration while efficiently recycling the same hold-out data. This leads to models that are both accurate and able to provide exact predictive inferences without multiple data splits or overly conservative adjustments. Practical implementations are developed for different learning tasks -- outlier detection, multi-class classification, regression -- and their competitive performance is demonstrated on real data.
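To make the calibration step concrete, the sketch below illustrates standard split-conformal prediction for regression: hold-out residuals are turned into a quantile that yields intervals with finite-sample marginal coverage under exchangeability. This is the baseline procedure the paper refines (it is *not* the paper's conformalized early stopping algorithm, which additionally justifies reusing the same hold-out set for early stopping); the synthetic data and trivial predictor are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = x + Gaussian noise (illustrative assumption).
n_train, n_cal, n_test = 500, 500, 200
x = rng.uniform(0, 1, n_train + n_cal + n_test)
y = x + rng.normal(0, 0.1, x.size)

# Stand-in for a model fitted on the training split (here, the true mean).
predict = lambda x: x

# Split-conformal calibration: residual scores on the hold-out set.
alpha = 0.1
x_cal, y_cal = x[n_train:n_train + n_cal], y[n_train:n_train + n_cal]
scores = np.abs(y_cal - predict(x_cal))

# Finite-sample-corrected empirical quantile: ceil((n+1)(1-alpha))/n level.
q = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal,
                method="higher")

# Prediction intervals [f(x) - q, f(x) + q]: marginal coverage >= 1 - alpha
# for exchangeable data, with no assumptions on the fitted model.
x_test = x[n_train + n_cal:]
lower, upper = predict(x_test) - q, predict(x_test) + q
```

The finite-sample correction (taking the ceiling of $(n+1)(1-\alpha)$ rather than the plain $1-\alpha$ quantile) is what makes the coverage guarantee exact rather than approximate; without fresh calibration data, naively reusing the early-stopping hold-out set in this way is what the paper shows can be done validly.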