Fixed-Label Bootstrap: Methods & Applications

Updated 4 July 2026

Fixed-Label Bootstrap is a resampling technique that fixes a design element (such as a volatility path, block size, or matching count) and propagates uncertainty only through the remaining components.
Such fixed structures enable tractable conditional inference in settings like conditional Value-at-Risk and fixed-b subsampling, albeit with trade-offs in calibration and coverage.
Extensions into semi-supervised learning and the numerical conformal bootstrap illustrate the broader use of fixed-label approaches as a metaphor for stabilizing inference while managing non-regularity.

Fixed-Label Bootstrap denotes a family of bootstrap or bootstrap-like constructions in which some structural component of the problem is kept fixed while uncertainty is propagated through another component. In the literature summarized here, the fixed object may be the estimated volatility path in a residual bootstrap for conditional Value-at-Risk, the relative block size $b=l/n$ in subsampling and moving block bootstrap, the number of nearest neighbors $M$ in matching estimators, or a bootstrapped label set used downstream in semi-supervised learning. This suggests that the unifying issue is not a single canonical algorithm, but a recurring methodological choice: whether inferential validity is preserved when a design, tuning parameter, combinatorial feature, or pseudo-label configuration is treated as fixed rather than re-randomized or asymptotically negligible (Beutner et al., 2018, Shao et al., 2012, Lin et al., 2024, Wang et al., 2024, Albert et al., 2020).

1. Conceptual scope and unifying structure

The phrase is used in several technically distinct ways. In conditional volatility models, the fixed-design residual bootstrap keeps the estimated volatility recursion fixed and resamples only innovations (Beutner et al., 2018). In time-series resampling, fixed-$b$ asymptotics keep the block-size ratio $b=l/n$ fixed so that the effect of the resampling bandwidth survives in the limit (Shao et al., 2012). In nearest-neighbor matching for the average treatment effect, the relevant fixed feature is the number of matches $M$; with fixed $M$, the naive bootstrap is inconsistent, whereas with $M\to\infty$ the inconsistency disappears (Lin et al., 2024). In fixed-sample interval assessment, the key question is how bootstrap intervals behave when the realized sample size is fixed rather than asymptotically large (Wang et al., 2024). In semi-supervised learning, Reliable Label Bootstrapping constructs an enlarged labeled set and then treats those labels as fixed for subsequent training (Albert et al., 2020).

Domain	Fixed object	Main issue
Conditional VaR	Volatility path $\tilde\sigma_t(\hat\theta_n)$	Residual bootstrap consistency
Time series	Relative block size $b=l/n$	Nonstandard p-value limits and calibration
Matching estimators	Number of matches $M$	Bootstrap failure for fixed $M$
Fixed-sample interval theory	Sample size $n$	Exact finite-$n$ coverage and length
Semi-supervised learning	Bootstrapped reliable labels	Downstream SSL with fixed pseudo-labels

A central distinction is between fixed structure as a deliberate inferential device and fixed structure as a source of non-regularity. Fixed-design residual resampling is analytically advantageous because conditional on the observed design, the bootstrap randomness is simple and tractable (Beutner et al., 2018). By contrast, fixed $M$ in matching creates an irregular dependence on nearest-neighbor identities and breaks naive bootstrap validity (Lin et al., 2024). Fixed-$b$ occupies an intermediate position: the usual bootstrap approximation is not first-order consistent under fixed $b$, but the limiting distribution of the resampling p-value remains well defined and can be used for calibration (Shao et al., 2012).

2. Fixed-design residual bootstrap in conditional Value-at-Risk

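In the conditional VaR setting, returns satisfy

$$\epsilon_t = \sigma_t \eta_t$$

with $\sigma_t = \sigma_t(\theta_0)$, iid innovations $\eta_t$, and conditional VaR

$$\mathrm{VaR}_\alpha(\epsilon_{n+1} \mid \mathcal{F}_n) = -\sigma_{n+1}(\theta_0)\,\xi_\alpha,$$

where $\xi_\alpha$ is the $\alpha$-quantile of $\eta_t$. The estimator is the two-step Francq–Zakoïan procedure: QML for $\theta_0$, then the empirical $\alpha$-quantile of residuals, yielding

$$\widehat{\mathrm{VaR}}_{n,\alpha} = -\tilde\sigma_{n+1}(\hat\theta_n)\,\hat\xi_{n,\alpha}$$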

(Beutner et al., 2018).

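The fixed-design residual bootstrap keeps the estimated volatility path fixed. Its algorithm is: resample $\eta_t^*$ iid from the empirical residual distribution, generate

$$\epsilon_t^* = \tilde\sigma_t(\hat\theta_n)\,\eta_t^*,$$

re-estimate $\hat\theta_n^*$, recompute the bootstrap residual quantile $\hat\xi_{n,\alpha}^*$, and form

$$\widehat{\mathrm{VaR}}{}^*_{n,\alpha} = -\tilde\sigma_{n+1}(\hat\theta_n^*)\,\hat\xi_{n,\alpha}^*.$$

The defining feature is that $\tilde\sigma_t(\cdot)$ is not recursively updated using $\epsilon_t^*$; the design is fixed across bootstrap samples (Beutner et al., 2018).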
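The steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration under loud assumptions, not the implementation of Beutner et al.: the model is a toy ARCH(1), the QML step is replaced by a crude grid search, and the names `sigma2_path`, `qml_grid`, `var_estimate` and `fixed_design_bootstrap` are hypothetical. The point it shows is structural: the volatility path is always computed from the original returns, never from the bootstrap returns.

```python
import math
import random

def sigma2_path(eps, omega, a):
    """ARCH(1) variances driven by the ORIGINAL returns eps (the fixed design)."""
    s2 = [omega / (1.0 - a)]  # start-up value: unconditional variance
    for t in range(1, len(eps)):
        s2.append(omega + a * eps[t - 1] ** 2)
    return s2

def qml_grid(y, eps_design, grid):
    """Gaussian QML by crude grid search (assumption: stands in for a real optimizer)."""
    best, best_ll = None, -math.inf
    for omega, a in grid:
        s2 = sigma2_path(eps_design, omega, a)
        ll = sum(-0.5 * (yt * yt / v + math.log(v)) for yt, v in zip(y, s2))
        if ll > best_ll:
            best, best_ll = (omega, a), ll
    return best

def quantile(xs, alpha):
    """Empirical alpha-quantile (smallest order statistic with ECDF >= alpha)."""
    s = sorted(xs)
    return s[max(0, math.ceil(alpha * len(s)) - 1)]

def var_estimate(y, eps_design, grid, alpha):
    """Two-step estimate: QML for theta, residual quantile, then one-step-ahead VaR."""
    omega, a = qml_grid(y, eps_design, grid)
    s2 = sigma2_path(eps_design, omega, a)
    resid = [yt / math.sqrt(v) for yt, v in zip(y, s2)]
    sigma_next = math.sqrt(omega + a * eps_design[-1] ** 2)
    return -sigma_next * quantile(resid, alpha), (omega, a), resid

def fixed_design_bootstrap(eps, grid, alpha, B, rng):
    var_hat, theta_hat, resid = var_estimate(eps, eps, grid, alpha)
    sig_hat = [math.sqrt(v) for v in sigma2_path(eps, *theta_hat)]
    draws = []
    for _ in range(B):
        eta_star = rng.choices(resid, k=len(eps))              # iid from the residual ECDF
        eps_star = [s * e for s, e in zip(sig_hat, eta_star)]  # no recursion on eps_star
        draws.append(var_estimate(eps_star, eps, grid, alpha)[0])
    return var_hat, draws

rng = random.Random(0)
eps, prev = [], 0.0
for _ in range(200):  # simulate a toy ARCH(1) sample
    prev = math.sqrt(0.5 + 0.3 * prev ** 2) * rng.gauss(0.0, 1.0)
    eps.append(prev)
grid = [(0.1 * i, 0.05 * j) for i in range(1, 13) for j in range(15)]
var_hat, draws = fixed_design_bootstrap(eps, grid, alpha=0.05, B=30, rng=rng)
print(var_hat, min(draws), max(draws))
```

A recursive-design variant would instead rebuild the variance path from `eps_star` inside the loop; passing `eps` as `eps_design` in every call is what makes the design fixed here.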

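Under Assumptions 4.1–4.10 with sufficiently strong moments, the bootstrap consistently reproduces the joint asymptotic law of $(\hat\theta_n, \hat\xi_{n,\alpha})$. In particular,

$$\sqrt{n}\begin{pmatrix}\hat\theta_n^* - \hat\theta_n\\ \hat\xi_{n,\alpha}^* - \hat\xi_{n,\alpha}\end{pmatrix} \xrightarrow{d^*} N(0, \Sigma_\alpha) \quad \text{in probability},$$

where $\Sigma_\alpha$ denotes the asymptotic covariance matrix of the original estimators, and the conditional distribution of $\sqrt{n}\,(\widehat{\mathrm{VaR}}{}^*_{n,\alpha} - \widehat{\mathrm{VaR}}_{n,\alpha})$ and the asymptotic normal law merge in probability (Beutner et al., 2018).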

The paper also studies interval construction. It defines equal-tailed percentile (EP), reversed-tails (RT), and symmetric (SY) bootstrap intervals. The simulation study reports that EP intervals tend to under-cover, whereas RT intervals yield accurate coverage; fixed-design bootstrap performs similarly to recursive-design bootstrap in average coverage but produces shorter intervals in smaller samples. For example, in one GARCH(1,1) design with Student-$t$ innovations, high persistence, and 90% nominal coverage, fixed-design RT has coverage 90.2% and average length 0.797, while EP has coverage 79.6% with the same average length 0.797 (Beutner et al., 2018).

This case exemplifies a fixed-label principle in its most literal statistical form: the conditional variance design is treated as fixed, and only the innovation labels are resampled. The advantage is that the bootstrap world inherits iid innovation structure conditional on the original sample, making conditional CLTs and Bahadur-type expansions tractable (Beutner et al., 2018).

3. Fixed-$b$ subsampling and block bootstrap

In dependent-data inference, subsampling and block bootstrap require a bandwidth-like parameter, the block length $l$, or equivalently the relative block size

$$b = l/n.$$

Traditional asymptotics impose $l\to\infty$ and $b = l/n \to 0$. The fixed-$b$ approach instead holds $b$ fixed as $n\to\infty$, so the effect of the tuning parameter survives in the limit (Shao et al., 2012).

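For the sample mean, the subsampling approximation is based on

$$L_{n,l}(x) = \frac{1}{N}\sum_{j=1}^{N} \mathbf{1}\bigl\{\sqrt{l}\,(\bar X_{j,j+l-1} - \bar X_n) \le x\bigr\}, \qquad N = n - l + 1,$$

where $\bar X_{j,j+l-1}$ is the mean of the block $X_j,\dots,X_{j+l-1}$.

Under fixed $b$, this distribution is not consistent in the usual small-$b$ sense. The central insight is that the p-value, not the raw resampling distribution, has a nondegenerate limit depending explicitly on $b$. For a one-sided mean test, the subsampling p-value converges in distribution to

$$\frac{1}{1-b}\int_0^{1-b} \mathbf{1}\bigl\{W(1) \le b^{-1/2}\,[\,W(b+t) - W(t) - b\,W(1)\,]\bigr\}\,dt,$$

where $W(\cdot)$ is standard Brownian motion (Shao et al., 2012).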
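To make the object concrete, here is a minimal sketch of a one-sided subsampling p-value for the mean with block length $l = \lfloor bn \rfloor$. The function name `subsampling_pvalue` and the upper-tailed convention (the share of centered block statistics at least as large as the full-sample statistic) are assumptions made for illustration; the paper's exact definition may differ in direction or normalization.

```python
import math
import random

def subsampling_pvalue(x, mu0, b):
    """Upper-tailed subsampling p-value for H0: mean = mu0, block length l = floor(b*n)."""
    n = len(x)
    l = max(1, int(b * n))
    xbar = sum(x) / n
    stat = math.sqrt(n) * (xbar - mu0)
    # Prefix sums give every overlapping block mean in O(1).
    pref = [0.0]
    for v in x:
        pref.append(pref[-1] + v)
    n_blocks = n - l + 1
    count = 0
    for j in range(n_blocks):
        block_mean = (pref[j + l] - pref[j]) / l
        if math.sqrt(l) * (block_mean - xbar) >= stat:
            count += 1
    return count / n_blocks

rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
p = subsampling_pvalue(sample, mu0=0.0, b=0.2)
print(p)
```

Because the block statistics are centered at the full-sample mean and $l$ is a fixed fraction of $n$, their spread does not match that of the full-sample statistic, which is why the p-value is not asymptotically uniform when $b$ is held fixed.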

This yields a calibration strategy. Under small-$b$ asymptotics, p-values are asymptotically $U(0,1)$, but under fixed-$b$ asymptotics they converge to non-uniform limiting distributions that depend on $b$, the statistic, and the resampling scheme. Confidence sets are therefore calibrated by replacing the nominal tail probability $\alpha$ with the corresponding quantile of the fixed-$b$ p-value limit. For scalar parameters, the resulting calibration is pivotal for given $b$; for vector and infinite-dimensional parameters, the fixed-$b$ p-value limit is generally non-pivotal, and the paper proposes double subsampling to estimate its distribution (Shao et al., 2012).
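For the scalar mean, the calibration idea can be sketched by simulating the limiting p-value directly. The code below is an illustrative approximation, not the paper's procedure: it assumes the limit is the Brownian functional obtained by applying Donsker's theorem to an upper-tailed subsampling p-value, replaces $W$ by an `m`-step Gaussian random walk, and uses the hypothetical names `limit_pvalue_draw` and `calibrated_cutoff`.

```python
import math
import random

def limit_pvalue_draw(b, m, rng):
    """One draw of the fixed-b p-value limit, with W approximated by an m-step walk."""
    step = 1.0 / math.sqrt(m)
    w = [0.0]
    for _ in range(m):
        w.append(w[-1] + step * rng.gauss(0.0, 1.0))
    l = int(b * m)          # block length on the grid
    w1 = w[m]               # W(1)
    starts = m - l + 1      # grid points t with t + b <= 1
    hits = 0
    for j in range(starts):
        sub = (w[j + l] - w[j] - b * w1) / math.sqrt(b)
        if w1 <= sub:
            hits += 1
    return hits / starts

def calibrated_cutoff(alpha, b, m=200, reps=2000, seed=0):
    """Empirical alpha-quantile of the simulated limit; reject when p-value <= cutoff."""
    rng = random.Random(seed)
    draws = sorted(limit_pvalue_draw(b, m, rng) for _ in range(reps))
    return draws[max(0, math.ceil(alpha * reps) - 1)]

alpha = 0.05
cutoff = calibrated_cutoff(alpha, b=0.1)
print(cutoff)
```

Under small-$b$ asymptotics the cutoff would simply be `alpha`; here it is read off the simulated fixed-$b$ distribution instead, so the block fraction enters the critical value explicitly.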

The empirical message is that fixed-$b$ calibration improves coverage accuracy relative to uncalibrated small-$b$ methods. Simulation studies for scalar parameters, vector parameters, marginal distribution functions, and normalized spectral distribution functions show that calibrated confidence sets tend to have smaller coverage errors than uncalibrated counterparts, especially under positive dependence, at the cost of only slightly wider intervals or regions (Shao et al., 2012).

This use of “fixed-label” is structurally different from fixed-design residual resampling. Here the fixed object is not a covariate path but a tuning parameter. The method does not restore standard bootstrap consistency; instead it redefines the asymptotic target so that the tuning parameter’s influence is first-order rather than hidden in higher-order error (Shao et al., 2012).

4. Fixed combinatorial structure in nearest-neighbor matching

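In causal inference with binary treatment, the data are $(X_i, D_i, Y_i)_{i=1}^{n}$, the estimand is

$$\tau = E[Y(1) - Y(0)],$$

and the bias-uncorrected $M$-nearest-neighbor matching estimator is

$$\hat\tau_M = \frac{1}{n}\sum_{i=1}^{n}\bigl[\hat Y_i(1) - \hat Y_i(0)\bigr], \qquad \hat Y_i(d) = \begin{cases} Y_i, & D_i = d,\\ \frac{1}{M}\sum_{j \in \mathcal{J}_M(i)} Y_j, & D_i \ne d,\end{cases}$$

where the missing potential outcome is imputed from the $M$ nearest neighbors $\mathcal{J}_M(i)$ in the opposite treatment group (Lin et al., 2024).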
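A minimal sketch of this estimator, assuming a scalar covariate, absolute-distance matching, and no ties; the function name `matching_ate` and the toy data are hypothetical and only meant to show the imputation step.

```python
def matching_ate(x, d, y, M):
    """Bias-uncorrected M-nearest-neighbor matching estimate of the ATE."""
    n = len(x)
    total = 0.0
    for i in range(n):
        # Candidate matches are units in the opposite treatment group.
        opposite = [j for j in range(n) if d[j] != d[i]]
        nearest = sorted(opposite, key=lambda u: abs(x[u] - x[i]))[:M]
        imputed = sum(y[j] for j in nearest) / len(nearest)
        # Observed outcome fills one potential outcome, the imputation the other.
        y1, y0 = (y[i], imputed) if d[i] == 1 else (imputed, y[i])
        total += y1 - y0
    return total / n

x = [0.0, 1.0, 2.5, 3.0]
d = [1, 0, 1, 0]
y = [5.0, 1.0, 6.0, 2.0]
tau_hat = matching_ate(x, d, y, M=1)
print(tau_hat)
```

The `sorted(...)[:M]` step is where the estimator depends on neighbor identities rather than on smooth functions of the data, which is the source of the irregularity discussed next.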

The Abadie–Imbens failure-of-bootstrap phenomenon occurs when $M$ is fixed. In that regime, matching depends non-smoothly on the identity of nearest neighbors, the random number of times each unit is used as a match enters the asymptotic linear representation, and the naive bootstrap does not reproduce the randomness of the nearest-neighbor graph. The estimator is $\sqrt{n}$-consistent and asymptotically normal, but the conditional law of the bootstrap statistic fails to converge to the correct Gaussian limit when $M$ is fixed (Lin et al., 2024).

The 2024 paper overturns the fixed-$M$ interpretation by showing that the inconsistency arises solely from holding $M$ fixed. When $M$ diverges with $n$ at a suitable rate (the precise rate conditions are stated in Lin et al., 2024), the matching estimator admits an asymptotically linear representation with influence function

$$\psi(X, D, Y) = \mu_1(X) - \mu_0(X) + \frac{D\,(Y - \mu_1(X))}{e(X)} - \frac{(1-D)\,(Y - \mu_0(X))}{1 - e(X)} - \tau,$$

where $\mu_d(x) = E[Y \mid X = x, D = d]$ and $e(x) = \Pr(D = 1 \mid X = x)$, and, after bias correction under additional smoothness and rate conditions,

$$\sqrt{n}\,(\hat\tau_M^{\mathrm{bc}} - \tau) \xrightarrow{d} N(0, \sigma^2),$$

where $\sigma^2$ is the semiparametric efficiency bound for the ATE (Lin et al., 2024).

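The bootstrap analogue is equally explicit. Under corresponding conditions on the growth of $M$ (again stated precisely in the paper), and writing $\hat\tau^*$ for the bootstrap version of the estimator $\hat\tau$, the paper proves

$$\sup_{t}\,\Bigl|\Pr{}^{*}\bigl(\sqrt{n}\,(\hat\tau^{*} - \hat\tau) \le t\bigr) - \Pr\bigl(\sqrt{n}\,(\hat\tau - \tau) \le t\bigr)\Bigr| \to 0 \quad \text{in probability},$$

so the conditional bootstrap law converges in probability to the same Gaussian law as the sampling distribution of the estimator (Lin et al., 2024).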

The mechanism is a smoothing transition. With fixed $M$, the estimator is driven by a finite combinatorial structure; with $M\to\infty$ and $M/n\to 0$, each unit’s neighborhood becomes a local average, the matching radius shrinks at rate $(M/n)^{1/d}$ for covariate dimension $d$, and matching counts behave like a density-ratio estimator,

$$\frac{n_1}{n_0}\cdot\frac{K_M(i)}{M} \quad \text{for a treated unit } i,$$

where $K_M(i)$ is the number of times unit $i$ is used as a match and $n_1$, $n_0$ are the treated and control sample sizes, which converges to the density ratio $f_0(X_i)/f_1(X_i)$ of control to treated covariate densities in $L^p$ under stated conditions (Lin et al., 2024).
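The matching counts in this argument are easy to compute directly. The sketch below assumes a scalar covariate with no ties and rescales $K_M(i)/M$ by the ratio of own-group to opposite-group sample sizes, which is one natural normalization and an assumption of this illustration; `matching_counts` and `scaled_counts` are hypothetical names.

```python
def matching_counts(x, d, M):
    """K_M(i): number of times unit i is among the M nearest opposite-group matches."""
    n = len(x)
    counts = [0] * n
    for i in range(n):
        opposite = [j for j in range(n) if d[j] != d[i]]
        for j in sorted(opposite, key=lambda u: abs(x[u] - x[i]))[:M]:
            counts[j] += 1
    return counts

def scaled_counts(x, d, M):
    """(n_own / n_opposite) * K_M(i) / M, a plug-in density-ratio estimate at X_i."""
    counts = matching_counts(x, d, M)
    n1 = sum(d)
    n0 = len(d) - n1
    out = []
    for i, k in enumerate(counts):
        own, opp = (n1, n0) if d[i] == 1 else (n0, n1)
        out.append(own / opp * k / M)
    return out

x = [0.0, 1.0, 2.5, 3.0]
d = [1, 0, 1, 0]
counts = matching_counts(x, d, M=1)
ratios = scaled_counts(x, d, M=1)
print(counts, ratios)
```

Each unit hands out exactly $M$ matches, so the counts always sum to $nM$; with fixed $M$ each count is a small noisy integer, whereas with growing $M$ the ratio $K_M(i)/M$ averages over many neighbors.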

This section illustrates a common misconception. The naive bootstrap does not fail for matching because matching is intrinsically incompatible with bootstrap resampling; rather, it fails in the fixed-$M$ regime. Once the discrete matching degree is allowed to diverge at suitable rates, the estimator becomes asymptotically linear and semiparametrically efficient, and the standard bootstrap is valid again (Lin et al., 2024).

5. Fixed-sample assessment of bootstrap intervals

A different line of work studies bootstrap intervals from a fixed-sample perspective. For a bootstrap interval constructed from $B$ bootstrap replications, with endpoints taken at a pair of order-statistic indices

$$1 \le k_1 < k_2 \le B,$$

the question is not asymptotic consistency but exact finite-$n$ coverage probability and expected length. The paper derives exact probabilistic formulas for parametric and percentile bootstrap intervals for binomial proportions, functions of two proportions, and normal means (Wang et al., 2024).

For parametric bootstrap, the exact coverage formula is expressed through the binomial CDF of order statistics. In the normal mean case with known variance and an estimator $\hat\mu$ whose density is symmetric around $\mu$, the coverage of the parametric bootstrap interval is

$$\Pr\bigl(\hat\mu^{*}_{(k_1)} \le \mu \le \hat\mu^{*}_{(k_2)}\bigr) = \frac{k_2 - k_1}{B + 1}.$$

Thus the actual coverage depends only on the bootstrap replication count $B$ and the chosen indices, not on $\mu$, $\sigma$, or $n$, and is strictly below the nominal level $1-\alpha$ for finite $B$ (Wang et al., 2024).
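A quick Monte Carlo check of this parameter-free behavior. The sketch assumes the interval endpoints are the $k_1$-th and $k_2$-th order statistics of $B$ parametric bootstrap draws; under that assumption, the rank of $\mu - \hat\mu$ among the $B$ centered bootstrap draws is uniform, which gives coverage $(k_2 - k_1)/(B+1)$. The index choice and the name `parametric_bootstrap_coverage` are illustrative, not taken from the paper.

```python
import random

def parametric_bootstrap_coverage(B, k1, k2, reps, seed=0, mu=0.0, se=1.0):
    """Monte Carlo coverage of [k1-th, k2-th] order statistics of B bootstrap draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        mu_hat = rng.gauss(mu, se)  # estimator, symmetric around mu
        boot = sorted(rng.gauss(mu_hat, se) for _ in range(B))  # parametric bootstrap draws
        if boot[k1 - 1] <= mu <= boot[k2 - 1]:
            hits += 1
    return hits / reps

B, k1, k2 = 100, 5, 95
estimate = parametric_bootstrap_coverage(B, k1, k2, reps=10000)
theory = (k2 - k1) / (B + 1)
print(estimate, theory)
```

Changing `mu` or `se` leaves the theoretical value untouched, which matches the claim that coverage is driven by the replication count rather than by the model parameters.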

For percentile bootstrap based on the sample mean under normality, the paper derives the general bound

$$\text{coverage} \;\le\; \Pr\bigl(X_{(1)} \le \mu \le X_{(n)}\bigr) = 1 - \left(\tfrac{1}{2}\right)^{n-1}.$$

Hence, for small $n$ (for instance, the bound equals $0.9375$ at $n = 5$), a 95% percentile bootstrap interval cannot achieve 95% coverage. In Example A2, the same bound caps the confidence coefficient at a value determined by the sample size alone, for any nominal level (Wang et al., 2024).
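One elementary way to see why such a ceiling exists, offered as an illustration that may differ from the paper's derivation: every resampled mean lies between the sample minimum and maximum, so a percentile interval can cover $\mu$ only if $\mu$ lies in that range, and for a continuous distribution symmetric about $\mu$ this happens with probability $1 - (1/2)^{n-1}$. The helper names below are hypothetical.

```python
def percentile_coverage_bound(n):
    """P(X_(1) <= mu <= X_(n)) = 1 - (1/2)**(n-1) for a continuous symmetric sample."""
    return 1.0 - 0.5 ** (n - 1)

def smallest_n_reaching(level):
    """Smallest n for which the ceiling is at least the requested level."""
    n = 2
    while percentile_coverage_bound(n) < level:
        n += 1
    return n

for n in range(2, 9):
    print(n, percentile_coverage_bound(n))
print(smallest_n_reaching(0.95))
```

The ceiling is a necessary condition only: reaching it in `smallest_n_reaching` does not mean the percentile interval actually attains the nominal level at that sample size.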

For binomial proportions and functions of proportions, the diagnosis is more severe: the infimum coverage probability is always $0$, regardless of $n$, $B$, nominal level, or estimator. This applies to bootstrap intervals for a single proportion, the difference of two proportions, and the odds ratio. The paper therefore argues that comparing such intervals by expected length at the same nominal level is misleading, because the bootstrap intervals do not have the stated confidence coefficient (Wang et al., 2024).
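The collapse of coverage near the boundary can be illustrated with an exact calculation in an idealized setting. The sketch assumes the estimator $\hat p = X/n$ and an infinite number of bootstrap replications, so the interval endpoints are exact quantiles of $\mathrm{Bin}(n, \hat p)/n$; this is a simplification of the finite-replication intervals analyzed in the paper, and the function names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_quantile(q, n, p):
    """Smallest k with P(Bin(n, p) <= k) >= q."""
    acc = 0.0
    for k in range(n + 1):
        acc += binom_pmf(k, n, p)
        if acc >= q - 1e-12:
            return k
    return n

def ideal_bootstrap_interval(x, n, alpha):
    """Equal-tailed interval from the exact Bin(n, x/n)/n bootstrap distribution."""
    p_hat = x / n
    lo = binom_quantile(alpha / 2, n, p_hat) / n
    hi = binom_quantile(1 - alpha / 2, n, p_hat) / n
    return lo, hi

def exact_coverage(p, n, alpha):
    """Sum the binomial probabilities of all outcomes x whose interval contains p."""
    cov = 0.0
    for x in range(n + 1):
        lo, hi = ideal_bootstrap_interval(x, n, alpha)
        if lo <= p <= hi:
            cov += binom_pmf(x, n, p)
    return cov

for p in (0.5, 0.1, 0.01, 0.001):
    print(p, exact_coverage(p, n=20, alpha=0.10))
```

When $X = 0$ the bootstrap distribution is a point mass at zero, so the interval is $[0, 0]$ and misses every $p > 0$; as $p \to 0$ that outcome dominates and coverage is driven toward zero.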

The normal mean examples make the same point in a less extreme setting. In the paper's example at nominal 90%, the parametric bootstrap interval based on the sample mean has coverage below 90% and a shorter expected length than the classical z-interval, whose exact coverage is $0.90$. The bootstrap interval is narrower, but only because it under-covers. The paper characterizes a preference for the shorter interval as illogical when the nominal level is used as the baseline for interval comparison (Wang et al., 2024).

This fixed-sample perspective reframes fixed-label bootstrap as an assessment problem. Even when asymptotic bootstrap theory is correct, finite-$n$ behavior can remain poor, non-monotone in nominal level, and globally unreliable over the parameter space (Wang et al., 2024).

6. Broader extensions and analogical uses

The phrase also appears outside classical resampling theory. In semi-supervised learning, Reliable Label Bootstrapping (ReLaB) starts from a tiny labeled set, learns self-supervised features, propagates labels over a graph, constructs hard pseudo-labels, filters them by a small-loss criterion, and then trains a semi-supervised algorithm on the resulting reliable labeled subset. The pseudo-labels retained in that subset are treated as fixed hard labels during downstream training (Albert et al., 2020).
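The propagation and hard-labeling steps can be illustrated with a generic graph diffusion. The sketch below is not ReLaB's implementation: it uses a standard normalized-adjacency iteration on a hand-built toy graph, the parameter `mu` and the function names are assumptions, and the self-supervised feature learning, small-loss filtering, and downstream semi-supervised training are omitted.

```python
import math

def propagate_labels(W, seeds, n_classes, mu=0.9, iters=200):
    """Iterate F <- mu * S F + (1 - mu) * Y with S = D^{-1/2} W D^{-1/2}."""
    n = len(W)
    deg = [sum(row) for row in W]
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    # One-hot seed matrix: rows of unlabeled nodes are all zero.
    Y = [[1.0 if seeds.get(i) == c else 0.0 for c in range(n_classes)]
         for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[mu * sum(S[i][j] * F[j][c] for j in range(n)) + (1 - mu) * Y[i][c]
              for c in range(n_classes)] for i in range(n)]
    return F

def hard_labels(F):
    """Hard pseudo-label per node: index of the largest propagated score."""
    return [max(range(len(row)), key=row.__getitem__) for row in F]

# Toy graph: two chains 0-1-2 and 3-4-5 joined by a weak edge between 2 and 3.
edges = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 0.1, (3, 4): 1.0, (4, 5): 1.0}
W = [[0.0] * 6 for _ in range(6)]
for (i, j), w in edges.items():
    W[i][j] = W[j][i] = w
pseudo = hard_labels(propagate_labels(W, seeds={0: 0, 5: 1}, n_classes=2))
print(pseudo)
```

Once `pseudo` is computed it is simply a list of integers; treating it as fixed in later training is the sense in which the labels are "bootstrapped" and then frozen.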

The paper reports large gains in extremely low-supervision regimes, including CIFAR-10 with 1 random labeled sample per class using ReLaB with ReMixMatch; the average error falls further when the labeled sample in each class is highly representative. These results depend strongly on representation quality; ResNet-50 with iMix gives the lowest propagation noise among the self-supervised configurations studied (Albert et al., 2020).

In numerical conformal bootstrap, the phrase is used more analogically. Mixed-correlator studies fix structural labels such as spacetime dimension, global symmetry group, correlator choice, and minimal spectral assumptions, then scan over a small parameter space of scaling dimensions. In the case studied, a semi-blind mixed-correlator bootstrap with the assumption of only one relevant scalar singlet finds a single allowed region in the space of scaling dimensions; the Heisenberg and chiral large-$N$ points lie on or near its boundary, the antichiral point lies outside, and a sharp kink appears that does not correspond to any critical theory predicted at large $N$ (Dowens et al., 2020). A later computational paper treats such fixed external labels as dynamic variables and replaces the “one SDP per point” workflow with a joint algorithm, “skydiving,” for families of semidefinite programs depending smoothly on those labels (Liu et al., 2023).

These extensions are not bootstrap procedures in the Efron sense. A plausible implication is that “fixed-label bootstrap” has become a broader methodological metaphor for workflows in which a structured set of labels, assumptions, or pseudo-labels is held fixed while another layer of uncertainty, optimization, or resampling is propagated. Across the literatures surveyed here, the recurring lesson is consistent: fixing structure can either stabilize inference or destroy it, depending on whether the fixed object is a benign conditioning device, a non-negligible tuning parameter, or an irregular combinatorial constraint (Beutner et al., 2018, Shao et al., 2012, Lin et al., 2024, Albert et al., 2020, Liu et al., 2023).