Scoring Rules and Calibration for Imprecise Probabilities
Abstract: What does it mean to say that, for example, the probability of rain tomorrow is between 20% and 30%? The theory for evaluating precise probabilistic forecasts is well developed and is grounded in the key concepts of proper scoring rules and calibration. For imprecise probabilistic forecasts (sets of probabilities), such a theory is still lacking. In this work, we therefore generalize proper scoring rules and calibration to the imprecise case. We develop these concepts relative to data models and decision problems. As a consequence, the imprecision is embedded in a clear context. We establish a close link to the paradigm of (group) distributional robustness and, in doing so, provide new insights for it. We argue that proper scoring rules and calibration serve two distinct goals, which are aligned in the precise case but, intriguingly, are not necessarily aligned in the imprecise case. The concept of decision-theoretic entropy plays a key role for both goals. Finally, we demonstrate the theoretical insights in machine learning practice; in particular, we illustrate subtle pitfalls relating to the choice of loss function in distributional robustness.
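To make the rain example concrete, the following is a minimal sketch, not the construction used in the paper. It assumes the Brier (quadratic) score as the proper scoring rule, a binary rain/no-rain outcome, and a worst-case (distributionally robust) reading of the interval forecast [0.2, 0.3]. All function and variable names are hypothetical and chosen for illustration only.

```python
def brier_loss(q, y):
    """Brier (quadratic) loss of forecast q for a binary outcome y in {0, 1}."""
    return (q - y) ** 2


def expected_loss(q, p):
    """Expected Brier loss of forecast q when rain occurs with probability p."""
    return p * brier_loss(q, 1) + (1 - p) * brier_loss(q, 0)


# Candidate point forecasts on a grid of width 0.01 (an illustrative choice).
forecasts = [i / 100 for i in range(101)]

# Precise case: propriety means the true probability minimizes expected loss.
p_true = 0.25
best_precise = min(forecasts, key=lambda q: expected_loss(q, p_true))

# Imprecise case (assumption): the forecast is the set of probabilities
# between 20% and 30%, and a forecast is judged by its worst-case expected loss.
credal_set = [(20 + i) / 100 for i in range(11)]


def worst_case_loss(q):
    """Largest expected Brier loss of forecast q over the credal set."""
    return max(expected_loss(q, p) for p in credal_set)


best_robust = min(forecasts, key=worst_case_loss)

print(best_precise, best_robust)
```

On this grid, the precise minimizer is 0.25, as propriety requires, while the worst-case minimizer is 0.3, the element of the interval with the largest Brier entropy p(1 - p). This is one simple way to see why entropy enters when forecasts are sets of probabilities.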