
Assessment of the quality of a prediction (2404.15764v5)

Published 24 Apr 2024 in math.ST, stat.ME, and stat.TH

Abstract: Shannon defined the mutual information between two variables. We illustrate why the true mutual information between a variable and the predictions made by a prediction algorithm is not a suitable measure of prediction quality, but the apparent Shannon mutual information (ASI) is; indeed, it is the unique prediction quality measure with either of two very different lists of desirable properties, as previously shown by de Finetti and other authors. However, estimating the uncertainty of the ASI is a difficult problem, because the distribution of the individual values of $j(x,y)=\log\frac{Q_y(x)}{P(x)}$ has long, asymmetric heavy tails. We propose a Bayesian modelling method for the distribution of $j(x,y)$, from the posterior distribution of which the uncertainty in the ASI can be inferred. The method uses Dirichlet-based mixtures of skew-Student distributions. We illustrate its use on data from a Bayesian model for predicting the recurrence time of prostate cancer. We believe this approach is generally appropriate for most problems in which it is infeasible to derive the explicit distribution of the samples of $j(x,y)$, though the precise modelling parameters may need adjustment to suit particular cases.
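
The central quantity in the abstract is the per-case information score $j(x,y)=\log\frac{Q_y(x)}{P(x)}$, whose sample mean is the apparent Shannon information. The sketch below is not the authors' code; the function and variable names are hypothetical and the prior is assumed known. It simply shows how an ASI point estimate and the individual $j$ values could be computed from predictive and prior probabilities.

```python
import numpy as np

# Illustrative sketch only (not the authors' implementation).
# The apparent Shannon information (ASI) is estimated as the sample mean of
# j(x, y) = log(Q_y(x) / P(x)), where Q_y(x) is the probability the predictor
# assigned to the observed outcome x given the inputs y, and P(x) is the
# prior (reference) probability of that outcome.
def apparent_shannon_information(q_pred, p_prior):
    """Return (mean j in nats, individual j samples)."""
    q_pred = np.asarray(q_pred, dtype=float)
    p_prior = np.asarray(p_prior, dtype=float)
    j = np.log(q_pred) - np.log(p_prior)  # per-case information scores j(x, y)
    return j.mean(), j

# Hypothetical usage with made-up probabilities for four test cases:
q = [0.40, 0.05, 0.70, 0.22]   # predictor's probability of each observed outcome
p = [0.25, 0.25, 0.25, 0.25]   # uniform prior over four possible outcomes
asi, j_samples = apparent_shannon_information(q, p)
print(f"ASI estimate: {asi:.3f} nats")
print("j samples:", np.round(j_samples, 3))
```

The naive sample mean above gives only a point estimate; the paper's contribution is to model the heavy-tailed, skewed distribution of these $j$ samples with Dirichlet-based mixtures of skew-Student distributions, so that the uncertainty of the ASI can be read off the posterior.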

References (17)
  1. D. Lindley, “Scoring rules and the inevitability of probability,” International Statistical Review, vol. 50, pp. 1–26, 1982.
  2. Cambridge University Press, 2003.
  3. J. Hilden, J. Habbema, and B. Bjerregaard, “The measurement of performance in probabilistic diagnosis: III: Methods based on continuous functions of the diagnostic probabilities,” Methods of Information in Medicine, vol. 17, pp. 238–246, 1978.
  4. C. E. Shannon, “A Mathematical Theory of Communication,” Bell System Technical Journal, vol. 27, pp. 379–423, 1948.
  5. R. Winkler, “Scoring rules and the evaluation of probability assessors,” Journal of the American Statistical Association, vol. 64, pp. 1073–1078, 1969.
  6. A. Shapiro, “The evaluation of clinical predictions,” New England Journal of Medicine, vol. 296, no. 26, pp. 1509–1514, 1977.
  7. David J.C. MacKay. Personal communication, 2010.
  8. F. Harrell, K. Lee, and D. Mark, “Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors,” Statistics in Medicine, vol. 15, pp. 361–387, 1996.
  9. R. F. Sewell, E. J. Crowe, and S. F. Shariat, “Biomarkers can predict time of recurrence of prostate cancer with strictly positive apparent Shannon information against an exponential attrition prior,” 2010. Paper to be uploaded to arXiv shortly; v2 of the present paper will contain the definitive reference.
  10. T. Cover and J. Thomas, Elements of Information Theory. Wiley (New York), 1991.
  11. C. E. Shannon and W. Weaver, The mathematical theory of communication. University of Illinois Press, 1949.
  12. E. Shuford Jr., A. Albert, and H. Massengill, “Admissible probability measurement procedures,” Psychometrika, vol. 31, pp. 125–145, 1966.
  13. B. de Finetti and L. Savage.
  14. C. E. Rasmussen et al., “DELVE: Data for evaluating learning in valid experiments.” http://www.ph.tn.tudelft.nl/PRInfo/data/msg00028.html. Downloaded 15th April 2010.
  15. C. E. Rasmussen and R. M. Neal, “The DELVE manual.” Technical Report, University of Toronto, Department of Computer Science, 1996.
  16. W. Gilks and P. Wild, “Adaptive rejection sampling for Gibbs sampling,” Applied Statistics, vol. 41, pp. 337–348, 1992.
  17. R. M. Neal, “Probabilistic inference using Markov chain Monte Carlo methods.” Technical Report CRG-TR-93-1, Department of Computer Science, University of Toronto, 1993.
Citations (1)
