The false positive risk: a proposal concerning what to do about p values (1802.04888v6)

Published 13 Feb 2018 in stat.AP

Abstract: It is widely acknowledged that the biomedical literature suffers from a surfeit of false positive results. Part of the reason for this is the persistence of the myth that observation of a p value less than 0.05 is sufficient justification to claim that you've made a discovery. It is hopeless to expect users to change their reliance on p values unless they are offered an alternative way of judging the reliability of their conclusions. If the alternative method is to have a chance of being adopted widely, it will have to be easy to understand and to calculate. One such proposal is based on calculation of false positive risk. It is suggested that p values and confidence intervals should continue to be given, but that they should be supplemented by a single additional number that conveys the strength of the evidence better than the p value. This number could be the minimum false positive risk (that calculated on the assumption of a prior probability of 0.5, the largest value that can be assumed in the absence of hard prior data). Alternatively one could specify the prior probability that it would be necessary to believe in order to achieve a false positive risk of, say, 0.05.

Citations (134)

Summary

  • The paper introduces the False Positive Risk as a Bayesian alternative to conventional p-values to more accurately estimate the likelihood of false discoveries.
  • It details a method using neutral priors and likelihood ratios to recalibrate p-values that often understate the true risk of false positives.
  • The proposal aims to enhance reproducibility and reliability in research by providing a practical measure that aligns statistical interpretation with real-world data.

Analyzing the Proposal to Reframe p-Values through False Positive Risk

David Colquhoun's paper, "The False Positive Risk: a proposal concerning what to do about p-values," rigorously addresses the ongoing debate in statistical analysis, particularly in biomedical research, regarding the reliance on p-values as evidence of scientific validity. Colquhoun eloquently argues that the traditional use of p-values—specifically the threshold of p<0.05—as significant evidence for discoveries is fundamentally flawed and often leads to a high incidence of false positives.

The False Positive Risk Approach

Colquhoun extends his previous work by introducing the concept of False Positive Risk (FPR) as a more intuitive and reliable alternative to conventional p-value reporting. He suggests that while p-values should still be reported, they must be accompanied by an additional measure that accurately reflects the likelihood of the observed result being a false positive. This measure, calculated using Bayesian principles, supplies the quantity that many researchers mistakenly believe the p-value already provides: the probability that an apparently significant result is in fact a false positive.

A key proposal is to base this additional measure—the FPR—either on a neutral prior probability of 0.5 (yielding the minimum FPR), or on specifying the prior probability that would be needed to achieve an FPR of a given threshold, say 0.05. Either supplement would correct the widespread misinterpretation that a p-value alone gives the probability of a hypothesis being true given the data.
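Under the simpler "p < α" interpretation, the FPR follows directly from Bayes' rule. A minimal sketch (the function name and the illustrative power values are my own, not from the paper):

```python
def false_positive_risk(alpha, power, prior):
    """False positive risk: P(H0 true | test came out significant).

    alpha: significance threshold (e.g. 0.05)
    power: probability of detecting a real effect (1 - beta)
    prior: prior probability that a real effect exists
    """
    false_positives = alpha * (1 - prior)   # significant, but H0 true
    true_positives = power * prior          # significant, and H1 true
    return false_positives / (false_positives + true_positives)

# With the neutral prior of 0.5 and 80% power, p < 0.05 gives an
# FPR of about 6%; with a sceptical prior of 0.1 it rises sharply.
print(false_positive_risk(0.05, 0.8, 0.5))   # ~0.059
print(false_positive_risk(0.05, 0.8, 0.1))   # ~0.36
```

Note that Colquhoun's preferred "p-equals" calculation conditions on the exact p-value observed rather than on p falling below a threshold, and gives larger (more pessimistic) risks than this simple version.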

Statistical Interpretation and Real-World Application

Colquhoun critiques current statistical teaching and practice for failing to correct common misinterpretations, advocating for a recalibration of p-values to the false positive risk. This recalibration within a Bayesian framework provides an accessible way to quantify the strength of evidence against null hypotheses. By using a simple (point) alternative hypothesis and likelihood ratios, the false positive risk directly communicates how likely an apparently significant effect is to be spurious.
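Colquhoun derives his likelihood ratios from the exact sampling distribution of the t statistic. A simpler, widely used closed-form surrogate is the Sellke–Bayarri–Berger bound on the evidence, −e·p·ln(p), which yields a very similar calibration and is easy to sketch (this bound, not Colquhoun's exact simulation, is what the code below implements):

```python
import math

def max_likelihood_ratio(p):
    """Upper bound on the likelihood ratio H1:H0 implied by a
    p value (Sellke-Bayarri-Berger bound, valid for p < 1/e)."""
    return 1.0 / (-math.e * p * math.log(p))

def min_fpr(p, prior=0.5):
    """Minimum false positive risk for a given p value and prior."""
    lr = max_likelihood_ratio(p)
    prior_odds = prior / (1 - prior)
    return 1.0 / (1.0 + prior_odds * lr)

# Even at p = 0.05, the minimum FPR with a neutral 0.5 prior is
# roughly 0.29 -- nearly six times the nominal 5% error rate.
print(min_fpr(0.05))   # ~0.29
print(min_fpr(0.01))   # ~0.11
```

The key point survives the change of method: "just significant" p-values correspond to far weaker evidence than their face value suggests.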

The practical implication of this correction is significant; under typical testing scenarios, the false positive risk substantially exceeds the stated p-value, revealing that the p-value overstates the evidence against the null. Real-world examples from biomedical research underscore the utility of this recalibration. For instance, using a published study of transcranial electromagnetic stimulation, Colquhoun demonstrates how the conventional interpretation of p=0.043 is misleadingly optimistic, while the FPR approach yields a far more conservative interpretation.
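The paper's second, reverse-Bayes proposal can be run on this same example: instead of reporting the FPR, report the prior belief one would need in order to claim a discovery at a 5% false positive risk. A self-contained sketch (again using the −e·p·ln(p) likelihood-ratio bound as an approximation, so the exact figure differs slightly from Colquhoun's t-test calculation):

```python
import math

def prior_needed(p, target_fpr=0.05):
    """Prior probability of a real effect required to bring the
    false positive risk down to target_fpr, using the
    -e*p*ln(p) bound on the likelihood ratio (approximate)."""
    lr = 1.0 / (-math.e * p * math.log(p))
    # posterior odds of H1:H0 must reach (1 - target)/target
    odds = (1.0 - target_fpr) / (target_fpr * lr)
    return odds / (1.0 + odds)

# To claim p = 0.043 as a discovery with only a 5% false positive
# risk, one would need to have been roughly 87% sure of the effect
# before running the experiment.
print(round(prior_needed(0.043), 2))   # ~0.87
```

Framed this way, the implausibility of the required prior makes the weakness of a marginal p-value immediately concrete.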

Alternative Statistical Approaches

Colquhoun's work does not stand in isolation. It is juxtaposed with several alternative propositions for handling p-values, such as using full Bayesian analyses and adopting more stringent p-value thresholds (e.g., Benjamin et al.'s recommendation to shift the significance threshold to p<0.005). However, Colquhoun contends that while more ambitious Bayesian frameworks are instructive, they are often impractical for routine laboratory use, given complexities in defining informative priors and computational burdens.

Implications for Future Research and Statistical Practice

While the paper does not claim to solve all issues related to statistical inference, it offers a practical compromise geared towards improving the reproducibility and reliability of scientific findings. The FPR offers a viable path forward that balances statistical rigor with usability in applied research. As scientific communities continue to grapple with the reproducibility crisis, measures such as these that provide clearer interpretations of statistical significance may gain traction.

Going forward, further empirical validation and refinement of these methods could solidify their place in experimental methodologies. The adoption of false positive risk, or similar recalibrative strategies, may serve not only as a stopgap against false discoveries but also as a catalyst for more rigorous data analysis frameworks.


Authors (1)

