Evidential Calibration of Confidence Intervals (2206.12290v3)
Abstract: We present a novel and easy-to-use method for calibrating error-rate based confidence intervals to evidence-based support intervals. Support intervals are obtained from inverting Bayes factors based on a parameter estimate and its standard error. A $k$ support interval can be interpreted as "the observed data are at least $k$ times more likely under the included parameter values than under a specified alternative". Support intervals depend on the specification of prior distributions for the parameter under the alternative, and we present several types that allow different forms of external knowledge to be encoded. We also show how prior specification can to some extent be avoided by considering a class of prior distributions and then computing so-called minimum support intervals which, for a given class of priors, have a one-to-one mapping with confidence intervals. We also illustrate how the sample size of a future study can be determined based on the concept of support. Finally, we show how the bound for the type I error rate of Bayes factors leads to a bound for the coverage of support intervals. An application to data from a clinical trial illustrates how support intervals can lead to inferences that are both intuitive and informative.
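The inversion described in the abstract can be sketched for one simple case. This is a minimal illustration, not the authors' implementation: it assumes the estimate is approximately normally distributed around the true parameter with known standard error, and it assumes a normal prior for the parameter under the alternative (only one of the several prior types the abstract mentions). The function names `bf01` and `support_interval`, their arguments, and the numbers in the usage lines are hypothetical.

```python
import math


def bf01(theta0, estimate, se, prior_mean, prior_sd):
    """Bayes factor for H0: theta = theta0 against H1: theta ~ N(prior_mean, prior_sd^2).

    Assumes estimate | theta ~ N(theta, se^2), so the marginal density of the
    estimate under H1 is N(prior_mean, se^2 + prior_sd^2).
    """
    var0 = se ** 2
    var1 = se ** 2 + prior_sd ** 2
    log_f0 = -0.5 * math.log(2 * math.pi * var0) - (estimate - theta0) ** 2 / (2 * var0)
    log_f1 = -0.5 * math.log(2 * math.pi * var1) - (estimate - prior_mean) ** 2 / (2 * var1)
    return math.exp(log_f0 - log_f1)


def support_interval(estimate, se, prior_mean, prior_sd, k):
    """Set of theta0 values with bf01(theta0) >= k, under the assumptions above.

    Solving bf01(theta0) >= k for theta0 gives estimate +/- se * sqrt(radicand).
    Returns None when no parameter value reaches support level k.
    """
    radicand = (
        math.log(1 + prior_sd ** 2 / se ** 2)
        + (estimate - prior_mean) ** 2 / (se ** 2 + prior_sd ** 2)
        - 2 * math.log(k)
    )
    if radicand < 0:
        return None  # the interval is empty at this support level
    half_width = se * math.sqrt(radicand)
    return (estimate - half_width, estimate + half_width)


# Hypothetical inputs: an estimate of -0.2 with standard error 0.1,
# a N(0, 1) prior under the alternative, and support level k = 3.
interval = support_interval(estimate=-0.2, se=0.1, prior_mean=0.0, prior_sd=1.0, k=3.0)
print(interval)
```

By construction, the Bayes factor evaluated at either endpoint equals `k`, which matches the reading of a $k$ support interval as the set of parameter values under which the data are at least $k$ times more likely than under the alternative. Raising `k` narrows the interval until it becomes empty.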