Truthfulness of Decision-Theoretic Calibration Measures (2503.02384v1)
Abstract: Calibration measures quantify how much a forecaster's predictions violate calibration, which requires that forecasts are unbiased conditional on the forecasted probabilities. Two important desiderata for a calibration measure are its decision-theoretic implications (i.e., downstream decision-makers that best-respond to the forecasts are always no-regret) and its truthfulness (i.e., a forecaster approximately minimizes error by always reporting the true probabilities). Existing measures satisfy at most one of these two properties. We introduce a new calibration measure termed subsampled step calibration, $\mathsf{StepCE}_{\mathsf{sub}}$, that is both decision-theoretic and truthful. In particular, on any product distribution, $\mathsf{StepCE}_{\mathsf{sub}}$ is truthful up to an $O(1)$ factor, whereas prior decision-theoretic calibration measures suffer from an $e^{-\Omega(T)}$-$\Omega(\sqrt{T})$ truthfulness gap. Moreover, in any smoothed setting where the conditional probability of each event is perturbed by noise of magnitude $c > 0$, $\mathsf{StepCE}_{\mathsf{sub}}$ is truthful up to an $O(\sqrt{\log(1/c)})$ factor, while prior decision-theoretic measures have an $e^{-\Omega(T)}$-$\Omega(T^{1/3})$ truthfulness gap. We also prove a general impossibility result for truthful decision-theoretic forecasting: any complete and decision-theoretic calibration measure must be discontinuous and non-truthful in the non-smoothed setting.
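For context, here is a minimal sketch of the sequential binary calibration setting this literature works in; the notation below ($p_t$, $y_t$) is ours, and the paper's formal definition of $\mathsf{StepCE}_{\mathsf{sub}}$ may differ in its details. Over $T$ rounds, a forecaster reports a probability $p_t \in [0,1]$ and nature reveals an outcome $y_t \in \{0,1\}$. Perfect calibration requires the forecasts to be unbiased on every level set:
$$\sum_{t=1}^{T} (y_t - p)\,\mathbf{1}[p_t = p] = 0 \quad \text{for all } p \in [0,1],$$
and a calibration measure aggregates the violations of this condition. For instance, the classical (cumulative) expected calibration error is
$$\mathsf{ECE}(p_{1:T}, y_{1:T}) = \sum_{p \in \{p_1,\dots,p_T\}} \Bigl|\, \sum_{t=1}^{T} (y_t - p)\,\mathbf{1}[p_t = p] \,\Bigr|,$$
a standard example of a measure with decision-theoretic guarantees that prior work shows falls far short of truthfulness.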