B-Calibration in Fixed-b Resampling
- B-Calibration is a method that calibrates resampling-based p-values under fixed-b asymptotics, addressing the impact of non-negligible block sizes.
- It uses the limiting distribution of p-values—obtained via analytic or Monte Carlo methods—to correct nominal test levels and ensure valid coverage.
- The approach extends to multivariate and functional inference, robustly calibrating confidence regions for scalar, vector, and infinite-dimensional parameters.
B-calibration refers to a set of statistical procedures and methodologies that address the calibration of resampling-based inference when the bandwidth parameter (block size, subsample length, or more generally, a smoothing or dependence parameter) is not asymptotically negligible compared to the sample size—so-called "fixed-b" asymptotics. The term particularly encapsulates the process of calibrating inference via the distribution of p-values under these fixed-b limits, thereby ensuring that statistical tests and confidence sets achieve accurate finite-sample coverage in dependent data settings such as time series. In the broader literature, "B-calibration" may also appear in the context of tuning parameters or performance calibration in high energy physics; however, the canonical usage for statistical inference is as rigorous methodology for p-value calibration under fixed-b block resampling (Shao et al., 2012).
1. Foundations: Fixed-b Asymptotics in Resampling
The classical resampling framework for dependent data (e.g., time series) involves creating either subsamples or blocks of length $l$ from a sample of size $n$, with relative block size $b = l/n$. Traditionally, asymptotics are developed under the regime $b \to 0$ as $n \to \infty$ ("small-b"), ensuring the influence of the bandwidth parameter vanishes at first order. Standard plug-in confidence sets or tests then remain pivotal in their limiting properties, and the bandwidth selection affects only higher-order error.
In contrast, fixed-b asymptotics hold $b = l/n$ fixed at a constant in $(0,1)$ as $n \to \infty$, so the block size remains a constant positive fraction of the sample size. This change in theoretical regime alters the limiting distribution of key statistics—most notably, the distribution of p-values under the null hypothesis—such that they are no longer uniform but instead follow explicit $b$-dependent distributions. This shift necessitates calibration to recover accurate test levels and interval coverages (Shao et al., 2012).
2. Limiting Distribution of Resampling-Based p-values
For a statistic $T_n$ targeting a parameter $\theta$ (with, e.g., $T_n = \sqrt{n}\,|\hat\theta_n - \theta|$ converging to a nondegenerate limit under the null), the resampling-based (subsampling or block bootstrap) p-value, typically
$$\hat p_{SUB} = \frac{1}{n-l+1}\sum_{j=1}^{n-l+1} \mathbf{1}\{T_{l,j} \ge T_n\}$$
(where $T_{l,j}$ is computed on the $j$th block/subsample of length $l$), converges to a non-uniform law under fixed-b asymptotics. That is, $\hat p_{SUB} \Rightarrow U_b$ with cumulative distribution function $G_b$.
For the moving block bootstrap, the analogous p-value $\hat p_{MBB}$ also converges to a $b$-dependent law with cumulative distribution function $\tilde G_b$. The explicit forms of these laws are typically functionals of underlying Gaussian or Brownian motion processes and can be determined by simulation or analytic computation for given $b$ (Shao et al., 2012).
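As a concrete illustration, the following is a minimal Python sketch of the subsampling p-value above, assuming the target parameter is a univariate mean; the function name and the centering convention are illustrative choices, not prescriptions from the reference.

```python
import numpy as np

def subsampling_p_value(x, mu0, l):
    """Fixed-b subsampling p-value for H0: E[X_t] = mu0, based on the
    statistic T_n = sqrt(n) * |xbar_n - mu0| and overlapping blocks of
    length l (so b = l / n). Block statistics are centered at the
    full-sample mean, following the usual subsampling convention."""
    x = np.asarray(x, dtype=float)
    n = x.size
    t_n = np.sqrt(n) * abs(x.mean() - mu0)
    # Means of the n - l + 1 overlapping blocks of length l
    block_means = np.convolve(x, np.ones(l) / l, mode="valid")
    t_blocks = np.sqrt(l) * np.abs(block_means - x.mean())
    # p-hat = proportion of block statistics at least as large as T_n
    return float(np.mean(t_blocks >= t_n))
```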
3. P-Value Calibration Procedure
Because the limiting p-value is non-uniform (and depends on the chosen $b$), the classical rejection rule (e.g., reject when $\hat p \le \alpha$ for a level-$\alpha$ test) cannot guarantee an actual rejection probability of $\alpha$. The calibration procedure replaces the nominal cutoff $\alpha$ with the $\alpha$-quantile of the null distribution, $G_b^{-1}(\alpha)$, so that under the null,
$$P\big(\hat p_{SUB} \le G_b^{-1}(\alpha)\big) \to \alpha,$$
which restores correct test size. For two-sided confidence intervals or regions, this is equivalent to "inverting" the calibrated test:
$$C_{1-\alpha} = \big\{\theta : \hat p_{SUB}(\theta) > G_b^{-1}(\alpha)\big\}.$$
This B-calibration paradigm converts the non-pivotal, resampling-based p-value into an asymptotically pivotal test in the fixed-b regime (Shao et al., 2012).
| Type | Limiting p-value law | Calibration Quantile |
|---|---|---|
| Subsample | $G_b$ | $G_b^{-1}(\alpha)$ |
| Bootstrap | $\tilde G_b$ | $\tilde G_b^{-1}(\alpha)$ |
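Once a calibrated cutoff $G_b^{-1}(\alpha)$ is available (e.g., tabulated as described in Section 5), applying it is mechanical. A minimal sketch, reusing the hypothetical `subsampling_p_value` helper above and taking the cutoff `g_b_alpha` as given:

```python
def calibrated_test(x, mu0, l, g_b_alpha):
    """Level-alpha B-calibrated test of H0: E[X_t] = mu0.
    Rejects iff the subsampling p-value falls at or below the fixed-b
    cutoff G_b^{-1}(alpha), rather than the naive cutoff alpha."""
    return subsampling_p_value(x, mu0, l) <= g_b_alpha

def calibrated_confidence_set(x, l, g_b_alpha, candidate_grid):
    """(1 - alpha) confidence set by inverting the calibrated test:
    collect every candidate value of mu0 that is not rejected."""
    return [mu0 for mu0 in candidate_grid
            if not calibrated_test(x, mu0, l, g_b_alpha)]
```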
4. Generalization to Multivariate and Functional Inference
The B-calibration methodology is not limited to scalar parameters. For vector-valued settings, with estimators $\hat\theta_n \in \mathbb{R}^d$ and statistics such as $T_n = \sqrt{n}\,\|\hat\theta_n - \theta\|$, the limiting p-value law is again non-uniform and depends on $b$. For infinite-dimensional functionals—e.g., empirical distribution functions or spectral distribution functions—the sup-norm statistic over an index class (e.g., $\sup_t \sqrt{n}\,|\hat F_n(t) - F(t)|$) follows an analogous limiting distribution under fixed-b, with a corresponding quantile $G_b^{-1}(\alpha)$ for confidence bands.
The calibration is then
$$C_{1-\alpha} = \big\{\theta \in \mathbb{R}^d : \hat p(\theta) > G_b^{-1}(\alpha)\big\}$$
for finite-dimensional $\theta$, and
$$C_{1-\alpha} = \big\{F : \hat p(F) > G_b^{-1}(\alpha)\big\}$$
for functional inference (Shao et al., 2012).
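The finite-dimensional case requires only swapping in a norm statistic. A brief sketch for a vector-valued mean, where the Euclidean norm is one illustrative choice of statistic and the cutoff `g_b_alpha` is assumed to have been calibrated against the corresponding multivariate limit law:

```python
import numpy as np

def vector_subsampling_p_value(x, mu0, l):
    """Fixed-b subsampling p-value for H0: E[X_t] = mu0 with X_t in R^d,
    based on T_n = sqrt(n) * ||xbar_n - mu0|| (Euclidean norm)."""
    x = np.asarray(x, dtype=float)        # shape (n, d)
    n = x.shape[0]
    full_mean = x.mean(axis=0)
    t_n = np.sqrt(n) * np.linalg.norm(full_mean - np.asarray(mu0, dtype=float))
    t_blocks = np.array([
        np.sqrt(l) * np.linalg.norm(x[j:j + l].mean(axis=0) - full_mean)
        for j in range(n - l + 1)
    ])
    return float(np.mean(t_blocks >= t_n))

# The calibrated (1 - alpha) confidence region is then
#   { mu0 : vector_subsampling_p_value(x, mu0, l) > g_b_alpha },
# checked over a grid of candidate mu0 values, exactly as in the scalar case.
```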
5. Implementation and Practical Guidance
Implementation of B-calibration involves tabulating the relevant critical values $G_b^{-1}(\alpha)$ for a set of $b$ values, either via simulation of the analytically characterized limiting process or by large-scale Monte Carlo of block-resampled statistics. For applied work, the recommended approach is:
- Simulate 5000+ paths of the relevant Gaussian process, compute the empirical distribution (an estimate of $G_b$) of the limiting resampling p-value, and record the quantiles $G_b^{-1}(\alpha)$.
- Alternatively, use a secondary Monte Carlo based on the block bootstrap under the null to approximate the limit law.
- Prepare a lookup table for $G_b^{-1}(\alpha)$ across a range of $b$ values; these quantiles may be accurately modeled by low-order polynomials in $b$ (a minimal Monte Carlo sketch of this tabulation follows this list).
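A minimal Monte Carlo sketch of the tabulation step for the mean case, reusing the hypothetical `subsampling_p_value` helper from Section 2: i.i.d. standard normal series share the Brownian limit of the general stationary case, so the empirical $\alpha$-quantile of their subsampling p-values approximates $G_b^{-1}(\alpha)$ for large $n$. The sample size and replication count below are illustrative defaults, not recommendations from the reference.

```python
import numpy as np

def tabulate_calibration_cutoff(b, alpha, n=1000, nrep=2000, seed=0):
    """Approximate G_b^{-1}(alpha) by Monte Carlo: simulate i.i.d. N(0, 1)
    series under the null, compute each replicate's subsampling p-value at
    block fraction b, and return the empirical alpha-quantile."""
    rng = np.random.default_rng(seed)
    l = max(2, int(round(b * n)))
    p_values = np.array([
        subsampling_p_value(rng.standard_normal(n), 0.0, l)
        for _ in range(nrep)
    ])
    return float(np.quantile(p_values, alpha))

# Example lookup-table entry: the cutoff replacing alpha = 0.05 when b = 0.1
# g_b_alpha = tabulate_calibration_cutoff(b=0.1, alpha=0.05)
```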
Block length selection remains a balance between bias (short blocks) and variance (long blocks), but the fixed-b theory provides explicit quantification of the $b$-dependence, reducing sensitivity to block-length choice when calibrated (Shao et al., 2012).
6. Empirical Performance and Coverage Properties
Extensive simulations in Shao et al. (2012) demonstrate that traditional small-b methods systematically undercover, especially under strong dependence. Fixed-b B-calibrated intervals achieve coverage much closer to the nominal target across a range of settings. The increase in interval or band width is typically modest (10–20%), a tradeoff for improved statistical accuracy. Performance gains extend to confidence regions and bands for vector and infinite-dimensional targets, provided the distribution of the resampling p-value is properly approximated.
7. Applicability and Impact in Statistical Inference
B-calibration provides a unified and formally justified approach for accurate inference in time series and other dependent-data contexts, incorporating the real-world influence of the bandwidth parameter at leading order. This directly improves confidence set coverage, test validity, and the reliability of bootstrap- and subsampling-based procedures. Its generality covers a wide spectrum of practical estimation problems—including means, distribution functions, and spectral densities—ensuring robust results without reliance on asymptotically negligible block sizes.
The methodology has been adopted across advanced empirical studies where resampling is essential and the classical small-b assumption is either theoretically invalid or practically difficult to maintain. It offers a lucid quantification and correction mechanism for the resampling bias introduced by non-negligible bandwidth choices, thus refining the inferential power of modern statistics (Shao et al., 2012).