Conformal inference is (almost) free for neural networks trained with early stopping (2301.11556v2)

Published 27 Jan 2023 in stat.ML, cs.LG, math.ST, and stat.TH

Abstract: Early stopping based on hold-out data is a popular regularization technique designed to mitigate overfitting and increase the predictive accuracy of neural networks. Models trained with early stopping often provide relatively accurate predictions, but they generally still lack precise statistical guarantees unless they are further calibrated using independent hold-out data. This paper addresses this limitation with conformalized early stopping: a novel method that combines early stopping with conformal calibration while efficiently recycling the same hold-out data. This leads to models that are both accurate and able to provide exact predictive inferences without requiring multiple data splits or overly conservative adjustments. Practical implementations are developed for different learning tasks -- outlier detection, multi-class classification, regression -- and their competitive performance is demonstrated on real data.
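To make the setting concrete, the sketch below shows the naive pipeline the paper improves upon: a model is trained with early stopping monitored on a hold-out set, and the *same* hold-out residuals are then reused for split-conformal calibration. This is not the paper's conformalized early stopping algorithm (which constructs the reuse so that exact finite-sample validity is preserved); it is a minimal illustration, using a linear model fit by gradient descent as a stand-in for a neural network, of the two steps the method combines. All names and data here are synthetic assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data, split into train / hold-out / test.
n, d = 500, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)
X_tr, y_tr = X[:300], y[:300]
X_ho, y_ho = X[300:400], y[300:400]  # hold-out: used twice (the point of the paper)
X_te, y_te = X[400:], y[400:]

# Step 1: gradient descent with early stopping on hold-out loss.
w = np.zeros(d)
best_w, best_loss, patience, bad = w.copy(), np.inf, 10, 0
for epoch in range(1000):
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= 0.05 * grad
    loss = np.mean((X_ho @ w - y_ho) ** 2)
    if loss < best_loss - 1e-8:
        best_w, best_loss, bad = w.copy(), loss, 0
    else:
        bad += 1
        if bad >= patience:
            break

# Step 2: split-conformal calibration, naively reusing the same hold-out set.
alpha = 0.1                                    # target miscoverage 10%
scores = np.abs(y_ho - X_ho @ best_w)          # absolute-residual scores
k = int(np.ceil((1 - alpha) * (len(scores) + 1)))
qhat = np.sort(scores)[k - 1]                  # conformal quantile

# Prediction intervals [pred - qhat, pred + qhat] on fresh test data.
pred = X_te @ best_w
covered = np.mean((y_te >= pred - qhat) & (y_te <= pred + qhat))
print(f"empirical coverage: {covered:.2f}")
```

Because the early-stopped model here was selected using the calibration data, the data are no longer exchangeable with a fresh test point, so the usual split-conformal coverage proof does not apply; the paper's contribution is a construction under which this recycling still yields exact guarantees, without an extra data split or conservative corrections.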
