
Calibration Error for Decision Making (2404.13503v5)

Published 21 Apr 2024 in cs.LG, cs.DS, and stat.ML

Abstract: Calibration allows predictions to be reliably interpreted as probabilities by decision makers. We propose a decision-theoretic calibration error, the Calibration Decision Loss (CDL), defined as the maximum improvement in decision payoff obtained by calibrating the predictions, where the maximum is over all payoff-bounded decision tasks. Vanishing CDL guarantees the payoff loss from miscalibration vanishes simultaneously for all downstream decision tasks. We show separations between CDL and existing calibration error metrics, including the most well-studied metric Expected Calibration Error (ECE). Our main technical contribution is a new efficient algorithm for online calibration that achieves near-optimal $O(\frac{\log T}{\sqrt{T}})$ expected CDL, bypassing the $\Omega(T^{-0.472})$ lower bound for ECE by Qiao and Valiant (2021).
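
To make the definition in the abstract concrete, the following is a minimal Python sketch rather than the paper's algorithm. It assumes binary outcomes, a finite set of prediction values, and payoffs in $[0, 1]$ (the abstract only says "payoff-bounded"). The function names and the example payoff table are hypothetical. The sketch calibrates predictions by replacing each value with the empirical outcome frequency at that value, then measures how much a best-responding decision maker gains on one fixed task. Because CDL is a maximum over tasks, the gain on any single bounded task can only be a lower bound on it.

```python
from collections import defaultdict


def calibrate(preds, outcomes):
    """Map each distinct prediction value to the empirical frequency of y = 1."""
    sums, counts = defaultdict(float), defaultdict(int)
    for p, y in zip(preds, outcomes):
        sums[p] += y
        counts[p] += 1
    return {p: sums[p] / counts[p] for p in counts}


def ece(preds, outcomes):
    """Average gap |prediction - empirical frequency at that prediction|."""
    table = calibrate(preds, outcomes)
    return sum(abs(p - table[p]) for p in preds) / len(preds)


def best_action(q, payoff):
    """Action with the highest expected payoff under belief q = Pr[y = 1].

    payoff[a] is a pair (payoff if y = 0, payoff if y = 1); ties go to the
    lowest action index.
    """
    return max(range(len(payoff)),
               key=lambda a: (1 - q) * payoff[a][0] + q * payoff[a][1])


def avg_payoff(beliefs, outcomes, payoff):
    """Realized average payoff when the agent best-responds to each belief."""
    total = sum(payoff[best_action(q, payoff)][y]
                for q, y in zip(beliefs, outcomes))
    return total / len(outcomes)


def payoff_gain(preds, outcomes, payoff):
    """Payoff improvement from calibrating, for one fixed decision task."""
    table = calibrate(preds, outcomes)
    calibrated = [table[p] for p in preds]
    return (avg_payoff(calibrated, outcomes, payoff)
            - avg_payoff(preds, outcomes, payoff))


# Hypothetical data: the forecaster always says 0.4, but y = 1 occurs 60% of the time.
preds = [0.4] * 10
outcomes = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical task: action 0 pays 1 when y = 0, action 1 pays 1 when y = 1.
payoff = [(1.0, 0.0), (0.0, 1.0)]

print("ECE:", ece(preds, outcomes))
print("payoff gain on this task:", payoff_gain(preds, outcomes, payoff))
```

On this toy data both quantities come out to about 0.2: the miscalibrated prediction 0.4 pushes the agent toward action 0, while the calibrated value 0.6 flips the choice to action 1. Computing CDL itself would require maximizing `payoff_gain` over all bounded payoff tables, which this sketch does not attempt.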

References (41)
  1. Anagnostides, Ioannis, Constantinos Daskalakis, Gabriele Farina, Maxwell Fishelson, Noah Golowich, and Tuomas Sandholm (2022) “Near-optimal no-regret learning for correlated equilibria in multi-player general-sum games,” in Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, 736–749.
  2. Arora, Sanjeev, Elad Hazan, and Satyen Kale (2012) “The Multiplicative Weights Update Method: a Meta-Algorithm and Applications,” Theory of Computing, 8 (6), 121–164, 10.4086/toc.2012.v008a006.
  3. Arunachaleswaran, Eshwar Ram, Natalie Collina, Aaron Roth, and Mirah Shi (2024) “An Elementary Predictor Obtaining $2\sqrt{T}$ Distance to Calibration,” arXiv preprint arXiv:2402.11410.
  4. Blackwell, David (1951) “Comparison of experiments,” in Proceedings of the second Berkeley symposium on mathematical statistics and probability, 2, 93–103, University of California Press.
  5. Błasiok, Jarosław, Parikshit Gopalan, Lunjia Hu, Adam Tauman Kalai, and Preetum Nakkiran (2024) “Loss Minimization Yields Multicalibration for Large Neural Networks,” in Guruswami, Venkatesan ed. 15th Innovations in Theoretical Computer Science Conference (ITCS 2024), 287 of Leibniz International Proceedings in Informatics (LIPIcs), 17:1–17:21, Dagstuhl, Germany: Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 10.4230/LIPIcs.ITCS.2024.17.
  6. Błasiok, Jarosław, Parikshit Gopalan, Lunjia Hu, and Preetum Nakkiran (2023a) “A unifying theory of distance from calibration,” in Proceedings of the 55th Annual ACM Symposium on Theory of Computing, 1727–1740.
  7. Błasiok, Jarosław, Parikshit Gopalan, Lunjia Hu, and Preetum Nakkiran (2023b) “When Does Optimizing a Proper Loss Yield Calibration?” in Oh, A., T. Neumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine eds. Advances in Neural Information Processing Systems, 36, 72071–72095: Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2023/file/e4165c96702bac5f4962b70f3cf2f136-Paper-Conference.pdf.
  8. Blum, Avrim and Yishay Mansour (2007) “From External to Internal Regret,” Journal of Machine Learning Research, 8 (47), 1307–1324, http://jmlr.org/papers/v8/blum07a.html.
  9. Chen, Liyu, Haipeng Luo, and Chen-Yu Wei (2021) “Impossible tuning made possible: A new expert algorithm and its applications,” in Conference on Learning Theory, 1216–1259, PMLR.
  10. Dagan, Yuval, Constantinos Daskalakis, Maxwell Fishelson, and Noah Golowich (2023) “From external to swap regret 2.0: An efficient reduction and oblivious adversary for large action spaces,” arXiv preprint arXiv:2310.19786.
  11. Dawid, A. P. (1982) “The Well-Calibrated Bayesian,” Journal of the American Statistical Association, 77 (379), 605–610, 10.1080/01621459.1982.10477856.
  12. Foster, Dean P and Sergiu Hart (2021) “Forecast hedging and calibration,” Journal of Political Economy, 129 (12), 3447–3490.
  13. Foster, Dean P and Rakesh Vohra (1999) “Regret in the on-line decision problem,” Games and Economic Behavior, 29 (1-2), 7–35.
  14. Foster, Dean P and Rakesh V Vohra (1997) “Calibrated learning and correlated equilibrium,” Games and Economic Behavior, 21 (1-2), 40–55.
  15. Foster, Dean P and Rakesh V Vohra (1998) “Asymptotic calibration,” Biometrika, 85 (2), 379–390.
  16. Frongillo, Rafael and Ian Kash (2014) “General truthfulness characterizations via convex analysis,” in Web and Internet Economics: 10th International Conference, WINE 2014, Beijing, China, December 14-17, 2014. Proceedings 10, 354–370, Springer.
  17. Garg, Sumegha, Christopher Jung, Omer Reingold, and Aaron Roth (2024) “Oracle Efficient Online Multicalibration and Omniprediction,” in Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2725–2792, 10.1137/1.9781611977912.98.
  18. Gopalan, Parikshit, Lunjia Hu, Michael P. Kim, Omer Reingold, and Udi Wieder (2023a) “Loss Minimization Through the Lens Of Outcome Indistinguishability,” in Tauman Kalai, Yael ed. 14th Innovations in Theoretical Computer Science Conference (ITCS 2023), 251 of Leibniz International Proceedings in Informatics (LIPIcs), 60:1–60:20, Dagstuhl, Germany: Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 10.4230/LIPIcs.ITCS.2023.60.
  19. Gopalan, Parikshit, Adam Tauman Kalai, Omer Reingold, Vatsal Sharan, and Udi Wieder (2022) “Omnipredictors,” in Braverman, Mark ed. 13th Innovations in Theoretical Computer Science Conference (ITCS 2022), 215 of Leibniz International Proceedings in Informatics (LIPIcs), 79:1–79:21, Dagstuhl, Germany: Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 10.4230/LIPIcs.ITCS.2022.79.
  20. Gopalan, Parikshit, Michael Kim, and Omer Reingold (2023b) “Swap Agnostic Learning, or Characterizing Omniprediction via Multicalibration,” in Oh, A., T. Neumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine eds. Advances in Neural Information Processing Systems, 36, 39936–39956: Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2023/file/7d693203215325902ff9dbdd067a50ac-Paper-Conference.pdf.
  21. Gopalan, Parikshit, Princewill Okoroafor, Prasad Raghavendra, Abhishek Shetty, and Mihir Singhal (2024) “Omnipredictors for Regression and the Approximate Rank of Convex Functions,” arXiv preprint arXiv:2401.14645.
  22. Hart, Sergiu (2022) “Calibrated Forecasts: The Minimax Proof,” arXiv preprint arXiv:2209.05863.
  23. Hart, Sergiu and Andreu Mas-Colell (2000) “A simple adaptive procedure leading to correlated equilibrium,” Econometrica, 68 (5), 1127–1150.
  24. Hart, Sergiu and Andreu Mas-Colell (2001) “A reinforcement procedure leading to correlated equilibrium,” in Economics Essays: A Festschrift for Werner Hildenbrand, 181–200: Springer.
  25. Hartline, Jason D ⓡ Liren Shan ⓡ Yingkai Li ⓡ Yifan Wu (2023) “Optimal scoring rules for multi-dimensional effort,” in The Thirty Sixth Annual Conference on Learning Theory, 2624–2650, PMLR.
  26. Hebert-Johnson, Ursula, Michael Kim, Omer Reingold, and Guy Rothblum (2018) “Multicalibration: Calibration for the (Computationally-Identifiable) Masses,” in Dy, Jennifer and Andreas Krause eds. Proceedings of the 35th International Conference on Machine Learning, 80 of Proceedings of Machine Learning Research, 1939–1948: PMLR, 10–15 Jul, https://proceedings.mlr.press/v80/hebert-johnson18a.html.
  27. Hu, Lunjia, Inbal Rachel Livni Navon, Omer Reingold, and Chutong Yang (2023) “Omnipredictors for Constrained Optimization,” in Krause, Andreas, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett eds. Proceedings of the 40th International Conference on Machine Learning, 202 of Proceedings of Machine Learning Research, 13497–13527: PMLR, 23–29 Jul, https://proceedings.mlr.press/v202/hu23b.html.
  28. Kakade, Sham M. and Dean P. Foster (2008) “Deterministic calibration and Nash equilibrium,” Journal of Computer and System Sciences, 74 (1), 115–130, https://doi.org/10.1016/j.jcss.2007.04.017, Learning Theory 2004.
  29. Kim, Michael P., Amirata Ghorbani, and James Zou (2019) “Multiaccuracy: Black-Box Post-Processing for Fairness in Classification,” in Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, AIES ’19, 247–254, New York, NY, USA: Association for Computing Machinery, 10.1145/3306618.3314287.
  30. Kim, Michael P. and Juan C. Perdomo (2023) “Making Decisions Under Outcome Performativity,” in Tauman Kalai, Yael ed. 14th Innovations in Theoretical Computer Science Conference (ITCS 2023), 251 of Leibniz International Proceedings in Informatics (LIPIcs), 79:1–79:15, Dagstuhl, Germany: Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 10.4230/LIPIcs.ITCS.2023.79.
  31. Kleinberg, Bobby, Renato Paes Leme, Jon Schneider, and Yifeng Teng (2023) “U-calibration: Forecasting for an unknown agent,” in The Thirty Sixth Annual Conference on Learning Theory, 5143–5145, PMLR.
  32. Li, Yingkai ⓡ Jason D Hartline ⓡ Liren Shan ⓡ Yifan Wu (2022) “Optimization of scoring rules,” in Proceedings of the 23rd ACM Conference on Economics and Computation, 988–989.
  33. McCarthy, John (1956) “Measures of the value of information,” Proceedings of the National Academy of Sciences of the United States of America, 42 (9), 654.
  34. Neyman, Eric, Georgy Noarov, and S Matthew Weinberg (2021) “Binary scoring rules that incentivize precision,” in Proceedings of the 22nd ACM Conference on Economics and Computation, 718–733.
  35. Noarov, Georgy, Ramya Ramalingam, Aaron Roth, and Stephan Xie (2023) “High-Dimensional Unbiased Prediction for Sequential Decision Making,” in OPT 2023: Optimization for Machine Learning, https://openreview.net/forum?id=P4j4l45NUq.
  36. Peng, Binghui and Aviad Rubinstein (2023) “Fast swap regret minimization and applications to approximate correlated equilibria,” arXiv preprint arXiv:2310.19647.
  37. Qiao, Mingda and Gregory Valiant (2021) “Stronger calibration lower bounds via sidestepping,” in Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2021, 456–466, New York, NY, USA: Association for Computing Machinery, 10.1145/3406325.3451050.
  38. Qiao, Mingda and Letian Zheng (2024) “On the Distance from Calibration in Sequential Prediction,” arXiv preprint arXiv:2402.07458.
  39. Roth, Aaron (2022) “Uncertain: Modern topics in uncertainty estimation.”
  40. Roth, Aaron and Mirah Shi (2024) “Forecasting for Swap Regret for All Downstream Agents,” arXiv preprint arXiv:2402.08753.
  41. Savage, Leonard J (1971) “Elicitation of personal probabilities and expectations,” Journal of the American Statistical Association, 66 (336), 783–801.
