- The paper critically evaluates the common practice of null hypothesis significance testing (NHST), arguing that relying on arbitrary p-value thresholds like 0.05 contributes to poor replication rates and misinterpretation of evidence.
- While advocating for abandoning NHST, the authors also critique proposed alternatives like lowering the p-value threshold to 0.005 as insufficient to address the fundamental issues of evidence evaluation.
- The authors recommend a shift towards a framework where evidence is evaluated holistically, considering p-values continuously alongside factors like study design, prior evidence, and plausibility, rather than using binary statistical significance.
Abandoning Statistical Significance: A Critical Perspective on NHST and its Alternatives
The paper "Abandon Statistical Significance" by McShane, Gal, Gelman, Robert, and Tackett offers a critical evaluation of the reliance on null hypothesis significance testing (NHST) in scientific research, particularly within the biomedical and social sciences. The authors argue for a paradigmatic shift away from the entrenched use of statistical significance, particularly the ubiquitous p<0.05 rule, as a determinant of scientific validity and discovery.
Critique of the Current Statistical Paradigm
The authors begin by addressing a critical issue in scientific practice: the low rate of replication of research findings. They link this to the conventional use of statistical significance, which typically operates as a binary threshold for validating or invalidating hypotheses. This approach, they argue, privileges results that meet arbitrary p-value cutoffs over nuanced and comprehensive evaluations of evidence. They further critique this paradigm for fostering errors in scientific reasoning and incentivizing studies that may yield significant results purely from noise.
Key to their argument is the assertion that p-values are often misinterpreted and can lead to misleading conclusions about the strength of evidence. This stems from the p-value's reliance on a sharp null hypothesis that assumes exactly zero effect and zero systematic error, a condition seldom met in real-world data.
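The point that "significant" findings can arise from noise alone can be illustrated with a minimal simulation, a sketch not taken from the paper: when the null hypothesis is exactly true, p-values are uniformly distributed, so roughly 5% of pure-noise studies cross the p < 0.05 line. The study count, sample size, and use of a z-test with known variance are all illustrative assumptions.

```python
import math
import random

def z_test_p_value(sample, null_mean=0.0, sigma=1.0):
    """Two-sided p-value for a z-test of the sample mean against null_mean,
    assuming a known standard deviation sigma (illustrative simplification)."""
    n = len(sample)
    z = (sum(sample) / n - null_mean) / (sigma / math.sqrt(n))
    # Two-sided p-value from the standard normal CDF via math.erf.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(0)
n_studies, n_obs = 10_000, 30
# Every simulated "study" draws from N(0, 1): the null is exactly true,
# so any significant result is a false positive by construction.
p_values = [
    z_test_p_value([random.gauss(0, 1) for _ in range(n_obs)])
    for _ in range(n_studies)
]
false_positives = sum(p < 0.05 for p in p_values) / n_studies
print(f"fraction 'significant' under a true null: {false_positives:.3f}")
```

The fraction comes out near 0.05, which is the point: a threshold rule guarantees a steady supply of noise-driven "discoveries" across a literature of null effects.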
Discussion of Alternatives
While proposing to abandon NHST as the default statistical approach, the paper critiques alternative proposals, such as lowering the threshold for p-values from 0.05 to 0.005, as insufficient. The authors suggest that such measures fail to address the foundational issues of evidence evaluation and can introduce new challenges, such as incoherent thresholds for what constitutes discovery and differential treatment of evidence based on arbitrary cutoffs.
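The "differential treatment of evidence based on arbitrary cutoffs" can be made concrete with a tiny sketch (the numbers and function name are hypothetical, not from the paper): two studies with nearly identical evidence land on opposite sides of whichever threshold is chosen, so moving the line from 0.05 to 0.005 relocates the discontinuity without removing it.

```python
def dichotomize(p, alpha):
    """Binary 'discovery' rule: the practice the authors criticize."""
    return "discovery" if p < alpha else "no discovery"

# Two studies with essentially the same strength of evidence...
for p in (0.0049, 0.0051):
    # ...receive categorically different verdicts under alpha = 0.005,
    # exactly as p = 0.049 vs 0.051 would under alpha = 0.05.
    print(p, "->", dichotomize(p, alpha=0.005))
```

Under a continuous reading, 0.0049 and 0.0051 convey nearly the same information; the dichotomy, not the particular alpha, is what manufactures the difference.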
The authors also critique the Uniformly Most Powerful Bayesian Tests (UMPBTs) underlying such proposals, arguing that they fail to adequately represent the complexities and variabilities inherent in scientific research contexts.
Proposed Recommendations
The paper proposes a framework for scientific inquiry that emphasizes a holistic evaluation of evidence rather than reliance on statistical thresholds. Under this framework, p-values should be treated continuously and weighed alongside other factors, such as study design, plausibility of mechanisms, prior evidence, and cost-benefit analyses.
For authors, it recommends transparent reporting of all data and a comprehensive discussion of the factors that influence their findings. For editors and reviewers, it suggests that publication decisions consider a broad array of evidential factors, moving beyond a binary interpretation of significance.
Implications and Future Directions
The implications of this proposal extend beyond academic publishing to encompass all areas where statistical decision-making is practiced, such as medicine and policy evaluation. The authors advocate for a shift toward statistical methods that embrace uncertainty and account for variation, such as multilevel modeling and more sophisticated hypothesis evaluation approaches.
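The multilevel idea of accounting for variation can be sketched with the simplest case, normal-normal partial pooling: each study's estimate is pulled toward the across-study mean in proportion to its uncertainty. This is a minimal illustration with made-up numbers and an assumed between-study standard deviation, not the authors' analysis.

```python
# Hypothetical effect estimates and standard errors for five studies
# (all numbers are illustrative).
estimates  = [0.28, 0.05, -0.10, 0.40, 0.12]
std_errors = [0.15, 0.10, 0.20, 0.25, 0.08]

grand_mean = sum(estimates) / len(estimates)
tau = 0.1  # assumed between-study standard deviation

def partial_pool(y, se, mu, tau):
    """Precision-weighted compromise between a study's own estimate (y)
    and the across-study mean (mu): the normal-normal shrinkage estimator."""
    w_study, w_prior = 1 / se**2, 1 / tau**2
    return (w_study * y + w_prior * mu) / (w_study + w_prior)

pooled = [partial_pool(y, se, grand_mean, tau)
          for y, se in zip(estimates, std_errors)]
for y, p in zip(estimates, pooled):
    print(f"raw: {y:+.2f}  partially pooled: {p:+.2f}")
```

Noisier studies (larger standard errors) are shrunk more strongly toward the mean, so no single estimate is read in isolation, which is the contrast with treating each study's significance verdict as a standalone fact.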
In conclusion, "Abandon Statistical Significance" raises important questions about the appropriateness of current statistical practices in scientific research. By advocating for a more integrated and contextually aware approach to evaluating evidence, the paper opens the door for more robust and replicable scientific inquiry. Future developments may focus on refining methodological tools that support this broader evidential framework, fostering a research culture that prioritizes understanding over statistical shortcuts.