Testing Calibration in Nearly-Linear Time (2402.13187v2)

Published 20 Feb 2024 in cs.LG, cs.DS, stat.CO, and stat.ML

Abstract: In the recent literature on machine learning and decision making, calibration has emerged as a desirable and widely-studied statistical property of the outputs of binary prediction models. However, the algorithmic aspects of measuring model calibration have remained relatively less well-explored. Motivated by [BGHN23], which proposed a rigorous framework for measuring distances to calibration, we initiate the algorithmic study of calibration through the lens of property testing. We define the problem of calibration testing from samples where given $n$ draws from a distribution $\mathcal{D}$ on (predictions, binary outcomes), our goal is to distinguish between the case where $\mathcal{D}$ is perfectly calibrated, and the case where $\mathcal{D}$ is $\varepsilon$-far from calibration. We make the simple observation that the empirical smooth calibration linear program can be reformulated as an instance of minimum-cost flow on a highly-structured graph, and design an exact dynamic programming-based solver for it which runs in time $O(n \log^2 n)$, and solves the calibration testing problem information-theoretically optimally in the same time. This improves upon state-of-the-art black-box linear program solvers requiring $\Omega(n^{\omega})$ time, where $\omega > 2$ is the exponent of matrix multiplication. We also develop algorithms for tolerant variants of our testing problem improving upon black-box linear program solvers, and give sample complexity lower bounds for alternative calibration measures to the one considered in this work. Finally, we present experiments showing the testing problem we define faithfully captures standard notions of calibration, and that our algorithms scale efficiently to accommodate large sample sizes.
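
The object behind the algorithmic result is the empirical smooth calibration linear program: over samples $(v_i, y_i)$ with predictions sorted, it maximizes $\frac{1}{n}\sum_i x_i(y_i - v_i)$ subject to $x_i = f(v_i)$ for some 1-Lipschitz $f$ bounded in $[-1, 1]$. The following is a minimal sketch of that LP solved with a generic black-box solver (SciPy's HiGHS backend); it is not the paper's $O(n \log^2 n)$ dynamic-programming solver, and the function name and sampling setup are illustrative assumptions rather than code from the paper.

```python
# Hedged sketch: empirical smooth calibration via a black-box LP solver.
# This is NOT the paper's O(n log^2 n) dynamic-programming algorithm; it
# solves the same LP with a generic solver, for illustration only.
import numpy as np
from scipy.optimize import linprog

def empirical_smooth_calibration(v, y):
    """Max of (1/n) * sum_i x_i (y_i - v_i) over x_i = f(v_i), where f is
    1-Lipschitz and bounded in [-1, 1] (only f's values at the v_i matter)."""
    order = np.argsort(v)
    v = np.asarray(v, dtype=float)[order]
    y = np.asarray(y, dtype=float)[order]
    n = len(v)
    c = -(y - v) / n                 # linprog minimizes, so negate objective
    d = np.diff(v)                   # Lipschitz budget between consecutive v's
    # Encode |x_{i+1} - x_i| <= v_{i+1} - v_i as two rows of A_ub @ x <= b_ub.
    A = np.zeros((2 * (n - 1), n))
    for i in range(n - 1):
        A[2 * i, [i, i + 1]] = [-1.0, 1.0]      # x_{i+1} - x_i <= d_i
        A[2 * i + 1, [i, i + 1]] = [1.0, -1.0]  # x_i - x_{i+1} <= d_i
    b = np.repeat(d, 2)
    res = linprog(c, A_ub=A, b_ub=b, bounds=[(-1.0, 1.0)] * n, method="highs")
    assert res.success
    return -res.fun

# Toy check: y ~ Bernoulli(v) is perfectly calibrated, so the LP value
# should be small; flipping the outcomes pushes it away from zero.
rng = np.random.default_rng(0)
v = rng.uniform(size=500)
y = (rng.uniform(size=500) < v).astype(float)
print(empirical_smooth_calibration(v, y))        # near 0
print(empirical_smooth_calibration(v, 1.0 - y))  # bounded away from 0
```

A tester along the lines described in the abstract thresholds this LP value: under perfect calibration it concentrates near zero, while under an $\varepsilon$-far distribution it stays bounded away from zero. The paper's contribution is computing the same optimum exactly in $O(n \log^2 n)$ time by exploiting the LP's minimum-cost flow structure, rather than paying the $\Omega(n^{\omega})$ cost of a generic solver.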

References (45)
  1. Deep Speech 2: End-to-end speech recognition in English and Mandarin. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, volume 48 of JMLR Workshop and Conference Proceedings, pages 173–182. JMLR.org, 2016.
  2. A rewriting system for convex optimization problems. Journal of Control and Decision, 5(1):42–60, 2018.
  3. A unifying theory of distance from calibration. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 1727–1740, 2023.
  4. When does optimizing a proper loss yield calibration? arXiv preprint arXiv:2305.18764, 2023.
  5. Smooth ECE: Principled reliability diagrams via kernel smoothing. arXiv preprint arXiv:2309.12236, 2023.
  6. Clément L. Canonne. Topics and techniques in distribution testing: A biased but representative sample. Foundations and Trends® in Communications and Information Theory, 19(6):1032–1198, 2022.
  7. Solving linear programs in the current matrix multiplication time. J. ACM, 68(1):3:1–3:39, 2021.
  8. Relative lipschitzness in extragradient methods and a direct recipe for acceleration. In 12th Innovations in Theoretical Computer Science Conference, ITCS 2021, volume 185 of LIPIcs, pages 62:1–62:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.
  9. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(83):1–5, 2016.
  10. Outcome indistinguishability. In STOC ’21: 53rd Annual ACM SIGACT Symposium on Theory of Computing, 2021, pages 1095–1108. ACM, 2021.
  11. Kunio Doi. Computer-aided diagnosis in medical imaging: historical review, current status and future potential. Computerized medical imaging and graphics, 31(4–5):198–211, 2007.
  12. Omnipredictors. In 13th Innovations in Theoretical Computer Science Conference, ITCS 2022, volume 215 of LIPIcs, pages 79:1–79:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022.
  13. Oded Goldreich. Introduction to Property Testing. Cambridge University Press, 2017.
  14. On calibration of modern neural networks. In International conference on machine learning, pages 1321–1330. PMLR, 2017.
  15. Multicalibration: Calibration for the (computationally-identifiable) masses. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, volume 80 of Proceedings of Machine Learning Research, pages 1944–1953. PMLR, 2018.
  16. Densely connected convolutional networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, pages 2261–2269. IEEE Computer Society, 2017.
  17. Nonparametric Goodness-of-Fit Testing under Gaussian Models, volume 169 of Lecture Notes in Statistics. Springer-Verlag, New York, 2003.
  18. A direct $\widetilde{O}(1/\varepsilon)$ iteration parallel algorithm for optimal transport. Advances in Neural Information Processing Systems, 32, 2019.
  19. Revisiting area convexity: Faster box-simplex games and spectrahedral generalizations. arXiv preprint arXiv:2303.15627, 2023.
  20. Deterministic calibration and Nash equilibrium. In John Shawe-Taylor and Yoram Singer, editors, Learning Theory, pages 33–48, Berlin, Heidelberg, 2004. Springer Berlin Heidelberg.
  21. Deterministic calibration and Nash equilibrium. J. Comput. Syst. Sci., 74(1):115–130, 2008.
  22. Verified uncertainty calibration. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, pages 3787–3798, 2019.
  23. Alex Krizhevsky. Learning multiple layers of features from tiny images. https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf, 2009. Accessed: 2024-01-31.
  24. Trainable calibration measures for neural networks from kernel mean embeddings. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 2805–2814. PMLR, 10–15 Jul 2018.
  25. L. LeCam. Convergence of Estimates Under Dimensionality Restrictions. The Annals of Statistics, 1(1):38–53, 1973.
  26. Path finding methods for linear programming: Solving linear programs in $\widetilde{O}(\sqrt{\mathrm{rank}})$ iterations and faster algorithms for maximum flow. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, pages 424–433. IEEE Computer Society, 2014.
  27. Revisiting the calibration of modern neural networks. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, pages 15682–15694, 2021.
  28. Revisiting the calibration of modern neural networks. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 15682–15694. Curran Associates, Inc., 2021.
  29. Allan H. Murphy. The early history of probability forecasts: Some extensions and clarifications. Weather and forecasting, 13(1):5–15, 1998.
  30. Probability forecasting in meteorology. Journal of the American Statistical Association, 79(387):489–500, 1984.
  31. Obtaining well calibrated probabilities using Bayesian binning. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, January 25-30, 2015, pages 2901–2907. AAAI Press, 2015.
  32. Measuring calibration in deep learning. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2019, pages 38–41. Computer Vision Foundation / IEEE, 2019.
  33. Zipei Nie. Matrix anti-concentration inequalities with applications. In STOC ’22: 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 568–581. ACM, 2022.
  34. Solving sparse linear systems faster than matrix multiplication. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, pages 504–521. SIAM, 2021.
  35. Dana Ron. Property testing: A learning theory perspective. Found. Trends Mach. Learn., 1(3):307–402, 2008.
  36. Dana Ron. Algorithmic and analysis techniques in property testing. Found. Trends Theor. Comput. Sci., 5(2):73–205, 2009.
  37. Recommender Systems Handbook. Springer New York, 2011.
  38. Uncertainty quantification and deep ensembles. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 20063–20075. Curran Associates, Inc., 2021.
  39. Uncertainty quantification and deep ensembles. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, pages 20063–20075, 2021.
  40. Jonah Sherman. Nearly maximum flows in nearly linear time. In 54th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2013, pages 263–269. IEEE Computer Society, 2013.
  41. Jonah Sherman. Area-convexity, $\ell_\infty$ regularization, and undirected multicommodity flow. In Hamed Hatami, Pierre McKenzie, and Valerie King, editors, Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, pages 452–460. ACM, 2017.
  42. Minimum cost flows, MDPs, and $\ell_1$-regression in nearly linear time for dense instances. In STOC '21: 53rd Annual ACM SIGACT Symposium on Theory of Computing, 2021, pages 859–869. ACM, 2021.
  43. Solving tall dense linear programs in nearly linear time. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, pages 775–788. ACM, 2020.
  44. Deep learning for computer vision: A brief review. Computational Intelligence and Neuroscience, 2018, 2017.
  45. New bounds for matrix multiplication: from alpha to omega. CoRR, abs/2307.07970, 2023.
