Weighted Gap-Intersection Procedure
- The procedure integrates prior weights via a weighted log-likelihood ratio, adjusting evidence thresholds to privilege hypotheses based on external information.
- It uses gap and intersection rules to define stopping times, ensuring rigorous family-wise error control while meeting signal count constraints.
- The method achieves first-order asymptotic optimality and shows robust performance in high-dimensional and random-weight settings, outperforming unweighted approaches.
The Weighted Gap-Intersection Procedure is a sequential multiple testing algorithm designed to incorporate prior weights into each hypothesis stream, offering both strong control of the family-wise error rate (FWE) and first-order asymptotic optimality in expected stopping time. The approach formalizes the use of a weighted log-likelihood ratio (WLLR), generalizes classical sequential testing boundaries to exploit both order and magnitude gaps, and achieves robust performance even in high-dimensional and random-weight regimes. This procedure allows efficient hypothesis selection when only broad signal-count bounds are known, extending previous gap and intersection methodologies. It stands out in information-theoretic efficiency, explicit error control, and practical scalability.
1. Weighted Log-Likelihood Ratio (WLLR)
For each hypothesis index , the observed data stream follows either the null law or alternative . The standard log-likelihood ratio is
where is the restriction of to data up to time .
To encode prior knowledge or importance, positive weights are assigned a priori. The weighted log-likelihood ratio modifies the evidence process as
This additive "head-start" shifts the boundaries for each stream, allowing the procedure to privilege hypotheses according to external information.
2. Formal Stopping and Decision Rules
The true signal set is only known to satisfy for integers . At time , WLLRs are ordered
with , .
Define the number of positive WLLRs: Let be fixed thresholds chosen for FWE control.
The composite stopping time is
with three boundary definitions:
- Intersection Rule:
- Lower-Boundary Gap Rule ():
- Upper-Boundary Gap Rule ():
At , the decision set is
This selection can involve adding indices with highest WLLR or removing lowest to meet signal count bounds.
3. Implementation Pseudocode
High-Level Outline:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
Initialize n ← 1
Repeat:
For each j: λ_W^j(n) ← λ_W^j(n-1) + ln f_1^j(X_n^j) / f_0^j(X_n^j)
Sort λ_W^j(n) descending; calculate p_W(n)
If stopping condition τ_{1,W} or τ_{2,W} or τ_{3,W} holds:
Set T_{W,GI} ← n
Break
Else:
n ← n + 1
After exit:
D ← {j : λ_W^j(T_{W,GI}) > 0}
If |D| < l: add top (l - |D|) indices
If |D| > u: remove indices with smallest WLLRs to truncate to u |
4. Family-Wise Error Rate (FWE) Control
Proposition: To achieve
it is sufficient to set thresholds as
with
This approach utilizes exponential tail bounds derived via Wald’s change-of-measure and union bounding over possible false inclusions or exclusions. The resulting error control extends to all signal-count compatible alternatives.
5. Asymptotic Optimality Theory
Define per-hypothesis information rates: For ,
The Song–Fellouris lower bound (Bose et al., 10 Nov 2025) on expected stopping time for any procedure in is
The Weighted Gap–Intersection procedure achieves first-order optimality: The proof utilizes reduction to a collection of independent random walks crossing boundaries and matches the lower bound up to vanishing second-order terms.
6. High-Dimensional and Random-Weights Analysis
(a) Fixed weights, large : The following scaling conditions
ensure first-order optimality. If all are uniformly bounded, only is required.
(b) Random weights: For weights known law (drawn once before sampling), adaptive thresholding (e.g., ) guarantees conditional FWE control. Unconditional expected sample size remains optimal if
Admissible laws include bounded, binary, log-normal, and Pareto weights provided extreme statistics grow only poly-logarithmically in .
7. Simulation Study and Empirical Behavior
Simulations conducted under the Gaussian means model with target error rates and signal counts to hypotheses ratios typical for high-dimensional inference (, ).
Four weight scenarios were tested:
- Unweighted:
- Informative: true signals receive with probability favoring correct identification
- Misinformative: weights anti-correlated to true signals
- Noisy: weights independent, mean-1 with controlled variance
Empirical effective sample size (ESS) is lowest when weights are informative, outperforming the unweighted baseline. When weights are noisy or misleading, ESS exceeds baseline, consistent with the theoretical second-order term. This suggests the practical robustness of the procedure, but also cautions against poorly chosen weights.
Summary Table: Procedural Features
| Property | Weighted Gap-Intersection | Classic Sequential (Unweighted) |
|---|---|---|
| Prior weights incorporated | Yes | No |
| Signal count bounds allowed | Often fixed or unconstrained | |
| Error control (FWE) | Explicit (via thresholds) | Varies; often only Type I |
| High-dimensional scalability | Typically limited | |
| ESS matches lower bound | Yes (first-order) | Yes for gap/intersection variants |
A plausible implication is that weighted procedures are especially advantageous when reliable external information about hypothesis relevance is available. In summary, the Weighted Gap-Intersection procedure generalizes multiple sequential testing to exploit prior-weighted evidence, maintains rigorous error guarantees, and achieves information-theoretic optimality in broad operational regimes (Bose et al., 10 Nov 2025).
Sponsored by Paperpile, the PDF & BibTeX manager trusted by top AI labs.
Get 30 days free