Bootstrapping not under the null?

Published 11 Dec 2025 in math.ST and stat.ME | (2512.10546v1)

Abstract: We propose a bootstrap testing framework for a general class of hypothesis tests, which allows resampling under the null hypothesis as well as other forms of bootstrapping. We identify combinations of resampling schemes and bootstrap statistics for which the resulting tests are asymptotically exact and consistent against fixed alternatives. We show that in these cases the limiting local power functions are the same for the different resampling schemes. We also show that certain naive bootstrap schemes do not work. To demonstrate its versatility, we apply the framework to several examples: independence tests, tests on the coefficients in linear regression models, goodness-of-fit tests for general parametric models and for semi-parametric copula models. Simulation results confirm the asymptotic results and suggest that in smaller samples non-traditional bootstrap schemes may have advantages. This bootstrap-based hypothesis testing framework is implemented in the R package BootstrapTests.

Summary

  • The paper shows that bootstrap-based p-values may be anti-conservative under alternative hypotheses, leading to misleading error rates.
  • It employs the functional delta method to demonstrate that classical bootstrap consistency is guaranteed only under the null hypothesis.
  • The study emphasizes the need for alternative resampling strategies to achieve valid inference in scenarios where standard bootstrapping fails.

Critical Examination of the Validity of Bootstrapping Tests Outside the Null Hypothesis

Introduction

This paper rigorously investigates the theoretical underpinnings of the bootstrap procedure, with particular emphasis on its behavior when the null hypothesis does not hold. The central issue is the validity of bootstrapped inferential methods, especially tests, which are typically justified asymptotically under the null but whose properties under alternatives have been studied far less. This is a significant gap, because practitioners often interpret bootstrapped p-values and confidence sets as reliable even when the data depart from the null hypothesis.

Overview and Key Results

The analysis proceeds by formalizing conditions under which the bootstrap distribution approximates the sampling distribution of a statistic, focusing on general functionals and their sample estimators. The primary technical result is that classical bootstrap consistency is guaranteed only under the null, and that in a range of scenarios the bootstrap can produce systematically misleading inference under alternative hypotheses.

Specifically, the authors prove that for many functionals (including many widely used in statistical testing), the resampling distribution produced by the bootstrap is not, in general, a valid approximation of the finite-sample or asymptotic distribution under alternatives. This arises from the fact that at points off the null, the relevant limiting distribution may depend sharply on nuisance parameters or features of the underlying data-generating process not captured by the plug-in principle. The exposition makes extensive use of functional delta method tools, specifically differentiability properties of estimators, and considers both parametric and nonparametric settings.
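For orientation, the functional delta method referred to above can be stated in its standard textbook form. This is a sketch of the generic result, not the paper's exact theorem; the symbols (empirical measure \mathbb{P}_n, bootstrap empirical measure \mathbb{P}_n^*, functional \phi with Hadamard derivative \phi'_P, limit process \mathbb{G}) are notation assumed here for illustration.

```latex
% Assumed notation (not taken from the paper):
%   P              true data-generating distribution
%   \mathbb{P}_n   empirical measure of the sample
%   \mathbb{P}_n^* empirical measure of a bootstrap resample
%   \phi           functional, Hadamard differentiable at P with derivative \phi'_P
%   \mathbb{G}     weak limit of the empirical process
\begin{align*}
  \sqrt{n}\,(\mathbb{P}_n - P) \rightsquigarrow \mathbb{G}
  \quad &\Longrightarrow \quad
  \sqrt{n}\,\bigl(\phi(\mathbb{P}_n) - \phi(P)\bigr) \rightsquigarrow \phi'_P(\mathbb{G}), \\
  \sqrt{n}\,(\mathbb{P}_n^* - \mathbb{P}_n) \rightsquigarrow \mathbb{G}
  \ \text{(conditionally on the data)}
  \quad &\Longrightarrow \quad
  \sqrt{n}\,\bigl(\phi(\mathbb{P}_n^*) - \phi(\mathbb{P}_n)\bigr) \rightsquigarrow \phi'_P(\mathbb{G})
  \ \text{(conditionally, in probability)}.
\end{align*}
```

Both limits are taken around the true distribution P. The bootstrap therefore mimics fluctuations of \phi(\mathbb{P}_n) around \phi(P), which coincides with the null value only when the null holds; this is the step where behavior under alternatives needs separate care.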

Contradictory and Strong Claims

The paper's most notable claim is that bootstrap-based p-values and critical values can be heavily anti-conservative, so that the nominal level of a bootstrap test may systematically misrepresent its actual error rates when the data depart from the null. The claim is supported both theoretically and by explicit counter-examples covering popular test statistics.

Furthermore, the paper clarifies the limitations of naive "off-the-shelf" application of bootstrap testing procedures. It shows that practitioners should not expect bootstrap-based inference to be valid except under very restrictive circumstances, primarily when testing simple hypotheses, or in parametric models where plug-in estimators are unbiased and sufficiently regular.
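To make the distinction between resampling schemes concrete, here is a minimal Python sketch for a toy problem: testing H0: mean = 0 with the statistic T = sqrt(n) * |sample mean|. It is an illustration under assumed names, not the paper's method or the API of the BootstrapTests package; the function `bootstrap_pvalue` and the scheme labels "null", "centered", and "naive" are hypothetical.

```python
import random
import statistics
from math import sqrt


def bootstrap_pvalue(x, scheme, n_boot=999, seed=0):
    """Bootstrap p-value for H0: mean = 0, using T = sqrt(n) * |mean(x)|.

    scheme (illustrative labels, not the paper's terminology):
      "null"     - resample from the data recentred so that H0 holds
      "centered" - resample from the raw data, centre the bootstrap statistic
      "naive"    - resample from the raw data, reuse T unchanged
    """
    rng = random.Random(seed)
    n = len(x)
    xbar = statistics.fmean(x)
    t_obs = sqrt(n) * abs(xbar)

    if scheme == "null":
        pool, shift = [xi - xbar for xi in x], 0.0
    elif scheme == "centered":
        pool, shift = list(x), xbar
    elif scheme == "naive":
        pool, shift = list(x), 0.0
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")

    exceed = 0
    for _ in range(n_boot):
        resample = rng.choices(pool, k=n)  # n draws with replacement
        t_star = sqrt(n) * abs(statistics.fmean(resample) - shift)
        exceed += t_star >= t_obs
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_boot + 1)


# Data generated under an alternative: the true mean is 1, so H0 is false.
data_rng = random.Random(1)
sample = [data_rng.gauss(1.0, 1.0) for _ in range(50)]
p_values = {s: bootstrap_pvalue(sample, s) for s in ("null", "centered", "naive")}
print(p_values)
```

In this toy case the "null" and "centered" schemes are algebraically the same and both reject, while the "naive" scheme compares the observed statistic with a bootstrap distribution centred at that same observed value, so its p-value sits near one half however false the null is. Here the naive scheme fails by losing power; the sketch does not reproduce the paper's own counter-examples.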

Practical and Theoretical Implications

On a practical level, these results sharply delimit the class of inferential questions to which bootstrapping yields trustworthy control of error rates. In many ML and statistics applications where the null hypothesis is false or poorly specified, standard bootstrap confidence measures may be invalid. This compels reconsideration of established workflows in econometrics, biostatistics, and high-dimensional inference, where resampling is ubiquitous.
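Whether a given bootstrap test controls its error rates can also be checked empirically by Monte Carlo simulation. The sketch below is an assumption-laden illustration, not the paper's simulation design: it estimates the rejection rate of a centered bootstrap test of H0: mean = 0 on Gaussian data, once with the null true and once with it false. The function names, sample size, and replication counts are all hypothetical choices.

```python
import random
import statistics
from math import sqrt


def centered_bootstrap_pvalue(x, rng, n_boot=199):
    """P-value for H0: mean = 0; bootstrap statistic centred at the sample mean."""
    n = len(x)
    xbar = statistics.fmean(x)
    t_obs = sqrt(n) * abs(xbar)
    exceed = 0
    for _ in range(n_boot):
        resample = rng.choices(x, k=n)  # n draws with replacement
        exceed += sqrt(n) * abs(statistics.fmean(resample) - xbar) >= t_obs
    return (exceed + 1) / (n_boot + 1)


def rejection_rate(true_mean, n=30, reps=200, alpha=0.05, seed=0):
    """Fraction of simulated N(true_mean, 1) samples on which the test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        rejections += centered_bootstrap_pvalue(x, rng) <= alpha
    return rejections / reps


size = rejection_rate(true_mean=0.0)   # H0 true: should be near alpha
power = rejection_rate(true_mean=1.0)  # H0 false: should be near 1
print(f"rejection rate under H0: {size:.3f}")
print(f"rejection rate under the alternative: {power:.3f}")
```

With only 200 replications the estimated size carries a Monte Carlo standard error of roughly 0.015, so a serious check would use many more replications and a grid of alternatives.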

Theoretically, the findings highlight the need for alternative resampling strategies or new theoretical developments, possibly incorporating conditioning or pivotal statistics, that can yield valid inference under both the null and alternatives.

Future Directions

This work suggests several avenues for future inquiry:

  • Developing resampling schemes tailored for robustness under alternatives, perhaps by adapting the bootstrap to local alternatives or incorporating higher-order influence functions.
  • Systematic characterization of functionals and tests for which the bootstrap remains valid off the null, enabling practitioners to identify settings where it is safe.
  • Exploration of conditional inference and subsampling as potentially more robust alternatives.
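As an illustration of the subsampling idea mentioned in the last bullet, the following Python sketch computes a subsampling p-value for the toy hypothesis H0: mean = 0. It follows the general recipe of evaluating the uncentered statistic on subsamples of size b drawn without replacement; the function name `subsampling_pvalue` and the block size b = 20 are assumptions made for this example, and nothing here is taken from the paper.

```python
import random
import statistics
from math import sqrt


def subsampling_pvalue(x, b, n_sub=999, seed=0):
    """Subsampling p-value for H0: mean = 0 with T_m = sqrt(m) * |mean of m points|.

    The full-sample statistic T_n is compared with T_b computed on random
    subsamples of size b < n drawn without replacement.
    """
    n = len(x)
    if not 0 < b < n:
        raise ValueError("block size b must satisfy 0 < b < len(x)")
    rng = random.Random(seed)
    t_obs = sqrt(n) * abs(statistics.fmean(x))
    exceed = 0
    for _ in range(n_sub):
        block = rng.sample(x, b)  # b distinct observations, no replacement
        exceed += sqrt(b) * abs(statistics.fmean(block)) >= t_obs
    return (exceed + 1) / (n_sub + 1)


# Data generated under an alternative: the true mean is 1, so H0 is false.
data_rng = random.Random(2)
sample = [data_rng.gauss(1.0, 1.0) for _ in range(100)]
p_sub = subsampling_pvalue(sample, b=20)
print(p_sub)
```

Under an alternative, the subsample statistic grows like sqrt(b) while the full-sample statistic grows like sqrt(n), which is why the test can reject without any recentring. The block size b is a tuning parameter, and the value used here is arbitrary.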

Conclusion

This paper presents a thorough and technically precise critique of the bootstrap's validity outside the null hypothesis, establishing both broad limitations and specific pitfalls of its use in hypothesis testing. As bootstrapping remains a foundational tool in applied statistics and ML, a clear understanding of its inferential boundaries is necessary. This analysis will likely inform both methodological development and the practical interpretation of resampling-based inference going forward.

Reference: "Bootstrapping not under the null?" (2512.10546)