- The paper critically evaluates the common practice of null hypothesis significance testing (NHST), arguing that relying on arbitrary p-value thresholds like 0.05 contributes to poor replication rates and misinterpretation of evidence.
- While advocating for abandoning NHST, the authors also critique proposed alternatives like lowering the p-value threshold to 0.005 as insufficient to address the fundamental issues of evidence evaluation.
- The authors recommend a shift towards a framework where evidence is evaluated holistically, considering p-values continuously alongside factors like study design, prior evidence, and plausibility, rather than using binary statistical significance.
Abandoning Statistical Significance: A Critical Perspective on NHST and its Alternatives
The paper "Abandon Statistical Significance" by McShane, Gal, Gelman, Robert, and Tackett offers a critical evaluation of the reliance on null hypothesis significance testing (NHST) in scientific research, particularly within the biomedical and social sciences. The authors argue for a paradigmatic shift away from the entrenched use of statistical significance, particularly the ubiquitous p<0.05 rule, as a determinant of scientific validity and discovery.
Critique of the Current Statistical Paradigm
The authors begin by addressing a critical issue in scientific practice: the low rate of replication of research findings. They link this to the conventional use of statistical significance, which typically operates as a binary threshold for validating or invalidating hypotheses. This approach, they argue, privileges results that meet arbitrary p-value cutoffs over nuanced and comprehensive evaluations of evidence. They further critique this paradigm for fostering errors in scientific reasoning and incentivizing studies that may yield significant results purely from noise.
Key to their argument is the assertion that p-values are often misinterpreted and can lead to misleading conclusions about the strength of evidence. This stems from the p-value's reliance on a sharp null hypothesis that assumes exactly zero effect and zero systematic error, a condition seldom met in real-world data.
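The point that "significant" findings can arise from noise alone can be illustrated with a minimal simulation, a sketch not taken from the paper: when the null hypothesis is exactly true, p-values are uniformly distributed, so roughly 5% of pure-noise studies cross the p < 0.05 line. The study count, sample size, and use of a z-test with known variance are all illustrative assumptions.

```python
import math
import random

def z_test_p_value(sample, null_mean=0.0, sigma=1.0):
    """Two-sided p-value for a z-test of the sample mean against null_mean,
    assuming a known standard deviation sigma (illustrative simplification)."""
    n = len(sample)
    z = (sum(sample) / n - null_mean) / (sigma / math.sqrt(n))
    # Two-sided p-value from the standard normal CDF via math.erf.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(0)
n_studies, n_obs = 10_000, 30
# Every simulated "study" draws from N(0, 1): the null is exactly true,
# so any significant result is a false positive by construction.
p_values = [
    z_test_p_value([random.gauss(0, 1) for _ in range(n_obs)])
    for _ in range(n_studies)
]
false_positives = sum(p < 0.05 for p in p_values) / n_studies
print(f"fraction 'significant' under a true null: {false_positives:.3f}")
```

The fraction comes out near 0.05, which is the point: a threshold rule guarantees a steady supply of noise-driven "discoveries" across a literature of null effects.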
Discussion of Alternatives
While proposing to abandon NHST as the default statistical approach, the paper critiques alternative proposals, such as lowering the threshold for p-values from 0.05 to 0.005, as insufficient. The authors suggest that such measures fail to address the foundational issues of evidence evaluation and can introduce new challenges, such as incoherent thresholds for what constitutes discovery and differential treatment of evidence based on arbitrary cutoffs.
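The "differential treatment of evidence based on arbitrary cutoffs" can be made concrete with a tiny sketch (the numbers and function name are hypothetical, not from the paper): two studies with nearly identical evidence land on opposite sides of whichever threshold is chosen, so moving the line from 0.05 to 0.005 relocates the discontinuity without removing it.

```python
def dichotomize(p, alpha):
    """Binary 'discovery' rule: the practice the authors criticize."""
    return "discovery" if p < alpha else "no discovery"

# Two studies with essentially the same strength of evidence...
for p in (0.0049, 0.0051):
    # ...receive categorically different verdicts under alpha = 0.005,
    # exactly as p = 0.049 vs 0.051 would under alpha = 0.05.
    print(p, "->", dichotomize(p, alpha=0.005))
```

Under a continuous reading, 0.0049 and 0.0051 convey nearly the same information; the dichotomy, not the particular alpha, is what manufactures the difference.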
The authors also critique the Uniformly Most Powerful Bayesian Tests (UMPBTs) underlying such proposals, arguing that they fail to adequately represent the complexities and variabilities inherent in scientific research contexts.
Proposed Recommendations
The paper proposes a framework for scientific inquiry that emphasizes a holistic evaluation of evidence rather than reliance on statistical thresholds. Under this framework, p-values should be treated continuously and weighed alongside other factors, such as study design, plausibility of mechanisms, prior evidence, and cost-benefit analyses.
For authors, it recommends transparent reporting of all data and a comprehensive discussion of the factors that influence their findings. For editors and reviewers, it suggests that publication decisions consider a broad array of evidential factors, moving beyond a binary interpretation of significance.
Implications and Future Directions
The implications of this proposal extend beyond academic publishing to encompass all areas where statistical decision-making is practiced, such as medicine and policy evaluation. The authors advocate for a shift toward statistical methods that embrace uncertainty and account for variation, such as multilevel modeling and more sophisticated hypothesis evaluation approaches.
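The multilevel idea of accounting for variation can be sketched with the simplest case, normal-normal partial pooling: each study's estimate is pulled toward the across-study mean in proportion to its uncertainty. This is a minimal illustration with made-up numbers and an assumed between-study standard deviation, not the authors' analysis.

```python
# Hypothetical effect estimates and standard errors for five studies
# (all numbers are illustrative).
estimates  = [0.28, 0.05, -0.10, 0.40, 0.12]
std_errors = [0.15, 0.10, 0.20, 0.25, 0.08]

grand_mean = sum(estimates) / len(estimates)
tau = 0.1  # assumed between-study standard deviation

def partial_pool(y, se, mu, tau):
    """Precision-weighted compromise between a study's own estimate (y)
    and the across-study mean (mu): the normal-normal shrinkage estimator."""
    w_study, w_prior = 1 / se**2, 1 / tau**2
    return (w_study * y + w_prior * mu) / (w_study + w_prior)

pooled = [partial_pool(y, se, grand_mean, tau)
          for y, se in zip(estimates, std_errors)]
for y, p in zip(estimates, pooled):
    print(f"raw: {y:+.2f}  partially pooled: {p:+.2f}")
```

Noisier studies (larger standard errors) are shrunk more strongly toward the mean, so no single estimate is read in isolation, which is the contrast with treating each study's significance verdict as a standalone fact.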
In conclusion, "Abandon Statistical Significance" raises important questions about the appropriateness of current statistical practices in scientific research. By advocating for a more integrated and contextually aware approach to evaluating evidence, the paper opens the door for more robust and replicable scientific inquiry. Future developments may focus on refining methodological tools that support this broader evidential framework, fostering a research culture that prioritizes understanding over statistical shortcuts.