Sequential Testing with Confidence Sequences
- Sequential testing with confidence sequences is a framework that uses adaptive confidence intervals to decide when to stop sampling.
- The method controls Type I/II error rates and guarantees bounded sample sizes through the inclusion principle and operating characteristic functions.
- It offers practical advantages over traditional SPRT, with exact procedures applicable in settings like binomial proportion testing.
Sequential testing with confidence sequences is a methodological framework that enables hypothesis testing and inference under sequential data collection. Unlike classical fixed-sample analysis, this approach leverages confidence sequences—collections of intervals (or sets) with simultaneous time-uniform coverage guarantees—to control error rates over arbitrary and adaptive stopping rules. This article provides an in-depth treatment of the principles, methodologies, mathematical foundations, and practical applications of sequential testing based on confidence sequences.
1. Defining Sequential Testing with Confidence Sequences
Sequential testing with confidence sequences is a departure from classic likelihood ratio–based sequential tests, such as the Sequential Probability Ratio Test (SPRT). It centers around using confidence intervals (limits), adaptively constructed as data are observed, to determine both when to stop sampling and what statistical conclusion to draw.
Given hypotheses $H_0: \theta \le \theta_0$ versus $H_1: \theta \ge \theta_1$ (with an indifference zone $(\theta_0, \theta_1)$ in between), the methodology proceeds as follows:
- At each stage $\ell$, compute lower and upper confidence limits $L_\ell$ and $U_\ell$ for the target parameter $\theta$.
- Continue sampling until $L_\ell$ breaches $\theta_0$ (i.e., $L_\ell > \theta_0$) or $U_\ell$ falls below $\theta_1$ (i.e., $U_\ell < \theta_1$).
- Upon stopping, make a decision based on the location of the confidence interval relative to the indifference zone; if the entire interval lies above $\theta_0$ (or below $\theta_1$), accept $H_1$ (or $H_0$); otherwise, invoke a pre-specified tie-breaking rule.
By judiciously tuning the confidence levels of the limits, one can control Type I and II error probabilities and ensure bounded sample sizes. This formulation supports both binary and multiple hypothesis testing, exploiting the properties of sequential random intervals and their coverage probabilities.
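The stopping rule above can be sketched in a few lines. The following is a minimal illustration for a Bernoulli parameter, not the source's own procedure: it assumes a Hoeffding-type fixed-sample interval in place of exact limits, and the names `hoeffding_limits`, `max_sample_size`, `sequential_test`, the thresholds `theta0 < theta1`, and the midpoint tie-breaking rule are all hypothetical choices. Note that `delta` is the per-stage level of the interval, not the overall error rate, so it would still need tuning.

```python
import math


def hoeffding_limits(successes, n, delta):
    """Fixed-sample Hoeffding interval for a Bernoulli mean (assumed choice)."""
    half_width = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    mean = successes / n
    return max(0.0, mean - half_width), min(1.0, mean + half_width)


def max_sample_size(theta0, theta1, delta):
    """Smallest n at which the interval is narrower than the indifference zone."""
    return math.floor(2.0 * math.log(2.0 / delta) / (theta1 - theta0) ** 2) + 1


def sequential_test(samples, theta0, theta1, delta):
    """Return (decision, stopping stage) for a stream of 0/1 observations."""
    successes, n = 0, 0
    for n, x in enumerate(samples, start=1):
        successes += x
        lower, upper = hoeffding_limits(successes, n, delta)
        above, below = lower > theta0, upper < theta1
        if above and below:
            # Interval lies inside the indifference zone:
            # hypothetical tie-break on the interval midpoint.
            midpoint = (lower + upper) / 2.0
            return ("H1" if midpoint >= (theta0 + theta1) / 2.0 else "H0"), n
        if above:
            return "H1", n
        if below:
            return "H0", n
    return "undecided", n  # data ran out before either limit crossed
```

Once the interval is narrower than the indifference zone, at least one of the two stopping conditions must hold, which is why `max_sample_size` gives a deterministic cap on the number of observations in this sketch.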
2. Operating Characteristic Function and Monotonicity
A cornerstone of this sequential testing framework is the operating characteristic (OC) function, $\Pr\{\text{accept } H_0 \mid \theta\}$, the probability of accepting $H_0$ as a function of the true parameter $\theta$. Adjusting the confidence sequence directly tunes the OC function.
For many distributions (e.g., binomial, normal), the tail probabilities underpinning the confidence limits are monotonic in the parameter. This monotonicity ensures that the maximum Type I and Type II error probabilities occur at the indifference zone endpoints, substantially simplifying "risk tuning." The process involves selecting the confidence level such that coverage at these endpoints meets the prescribed error thresholds, a search made computationally efficient by monotonicity or recursive techniques.
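Written out, the risk requirements take the following form. The notation is assumed here for illustration: $\alpha$ and $\beta$ denote the prescribed Type I and Type II error bounds, and $(\theta_0, \theta_1)$ is the indifference zone.

```latex
\Pr\{\text{accept } H_0 \mid \theta\} \ge 1 - \alpha \quad \text{for all } \theta \le \theta_0,
\qquad
\Pr\{\text{accept } H_0 \mid \theta\} \le \beta \quad \text{for all } \theta \ge \theta_1 .
```

When the OC function is non-increasing in $\theta$, both families of inequalities follow from the two endpoint conditions $\Pr\{\text{accept } H_0 \mid \theta_0\} \ge 1 - \alpha$ and $\Pr\{\text{accept } H_0 \mid \theta_1\} \le \beta$, so only two evaluations of the OC function are needed per candidate confidence level.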
3. Sequential Random Intervals and the Inclusion Principle
Sequential testing of multiple hypotheses is equivalently recast as constructing sequential random intervals with prescribed coverage probabilities. In this framework:
- Each possible realization of the interval post-sampling corresponds uniquely to one hypothesis.
- The procedure guarantees, with high probability, that upon stopping, the true parameter is contained within the selected random interval.
The framework rests on the inclusion principle (IP): sampling continues until a controlling fixed-sample confidence interval is entirely nested within one of a finite collection of target intervals, each corresponding to a hypothesis. The IP ensures that the desired coverage (or error control) is maintained and provides a structural simplification: it reduces multistage planning to interval inclusion checks, unifying estimation and testing concepts.
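The inclusion check itself is simple to express. The sketch below assumes closed intervals represented as `(lower, upper)` pairs. The function name `included_hypothesis` and the example target intervals are hypothetical, chosen only to illustrate the nesting test.

```python
from typing import Optional, Sequence, Tuple

Interval = Tuple[float, float]


def included_hypothesis(ci: Interval, targets: Sequence[Interval]) -> Optional[int]:
    """Return the index of the first target interval that contains `ci`.

    Returns None when `ci` is not nested in any target, in which case
    the inclusion principle calls for more data.
    """
    lower, upper = ci
    for index, (a, b) in enumerate(targets):
        if a <= lower and upper <= b:
            return index
    return None


# Hypothetical targets for three hypotheses about a proportion;
# the overlaps play the role of indifference zones.
targets = [(0.0, 0.4), (0.3, 0.7), (0.6, 1.0)]
decision = included_hypothesis((0.45, 0.55), targets)  # nested in targets[1]
keep_sampling = included_hypothesis((0.2, 0.8), targets) is None
```

When the interval fits inside more than one target (it lies within an overlap), this sketch returns the first match; a real design would replace that with its own tie-breaking rule.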
4. Advantages Over Traditional Sequential Tests
Relative to the classical SPRT and its variants, confidence-sequence–based sequential tests show several advantages:
- Uniform Efficiency: By tuning the confidence limits (and hence the OC function) to maintain prescribed error rates across the parameter space, performance is not concentrated at a few "least favorable" alternatives.
- Bounded Sample Size: The confidence interval narrows as data accumulate, so the stopping rule is guaranteed to trigger once the interval becomes narrower than the indifference zone, which puts a finite upper bound on the total number of samples. In contrast, the SPRT terminates with probability one but has no deterministic upper bound on the number of observations.
- Conceptual and Computational Simplicity: Early stopping is achieved using confidence limits, often with closed-form or easily computed expressions, removing the need for recursive likelihood ratio calculations.
- Unified Framework: The methodology accommodates multiple hypothesis settings and estimation within a common structure.
5. Concrete Implementation: Difference of Binomial Proportions
A prominent practical application is hypothesis testing about the difference of two binomial proportions ($p_1 - p_2$):
- Construct stage-wise Clopper–Pearson confidence limits for each proportion, $p_1$ and $p_2$, based on observed counts $X_1$ (out of $n_1$) and $X_2$ (out of $n_2$).
- Derive lower and upper confidence limits for $p_1 - p_2$ from the Clopper–Pearson limits of the two proportions and a critical value determined by the chosen significance level.
- Continue sampling until the confidence interval for $p_1 - p_2$ falls entirely into one of the decision regions. The choice of significance level (and hence of the critical value) is tuned via algorithmic search (e.g., bisection) to hit the coverage probability target.
This procedure provides an exact multistage test, avoiding the conservatism and inefficiency of approximate or fixed-sample approaches.
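As a rough illustration of the ingredients, the sketch below computes Clopper–Pearson limits by bisection on the binomial tail probabilities, using only the standard library. The way the two intervals are combined into limits for the difference is a simple conservative (Bonferroni-style) choice assumed here for illustration; it is not the critical-value formula of the source. The names `binom_cdf`, `clopper_pearson`, and `difference_limits` are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _root_of_increasing(g, iterations=60):
    """Bisection on [0, 1] for a function g that increases through zero."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(x, n, alpha):
    """Exact two-sided limits with tail probability alpha/2 on each side."""
    # Lower limit solves P(X >= x | p) = alpha/2; upper solves P(X <= x | p) = alpha/2.
    lower = 0.0 if x == 0 else _root_of_increasing(
        lambda p: (1.0 - binom_cdf(x - 1, n, p)) - alpha / 2.0)
    upper = 1.0 if x == n else _root_of_increasing(
        lambda p: alpha / 2.0 - binom_cdf(x, n, p))
    return lower, upper


def difference_limits(x1, n1, x2, n2, alpha):
    """Conservative limits for p1 - p2 (assumed combination, not the source's)."""
    l1, u1 = clopper_pearson(x1, n1, alpha / 2.0)
    l2, u2 = clopper_pearson(x2, n2, alpha / 2.0)
    return l1 - u2, u1 - l2


lower_diff, upper_diff = difference_limits(30, 40, 10, 40, 0.05)
```

A sequential test would recompute `difference_limits` at each stage and stop once the resulting interval lies entirely inside one decision region, with `alpha` treated as a tuning parameter rather than the final error rate.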
6. Multistage Testing Plans and Computational Aspects
The construction of multistage tests follows naturally, with sequential random intervals and the inclusion principle providing the scaffolding:
- Multistage plans allocate a fraction of the total allowable error to each stage, often approaching fully sequential procedures by increasing the number of stages.
- At each stage, the test checks for the inclusion property and, if not satisfied, requests more data.
- Efficient computation is enabled by the monotonicity of the OC function and the explicit algebraic form of the confidence limits.
Resource requirements are controlled, since the sample size is always bounded by the properties of the confidence sequence; computation is dominated by the evaluation of confidence limits and, in multiple hypothesis settings, the inclusion property.
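The computational claims can be made concrete on a toy problem. The sketch below evaluates the OC function exactly for a fully sequential Bernoulli test by propagating the distribution of the success count over non-stopped paths, then tunes the per-stage level `delta` by bisection on the two endpoint error conditions. Several choices are assumptions for illustration only: the Hoeffding-type interval, the midpoint tie-break, the search range, and the names `decide`, `oc_function`, and `tune_delta`. The bisection also presumes the error rates vary roughly monotonically with `delta`.

```python
import math


def decide(k, n, theta0, theta1, delta):
    """Decision after k successes in n trials, or None to keep sampling."""
    half_width = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    lower = max(0.0, k / n - half_width)
    upper = min(1.0, k / n + half_width)
    above, below = lower > theta0, upper < theta1
    if above and below:  # hypothetical midpoint tie-break
        return "H1" if (lower + upper) / 2.0 >= (theta0 + theta1) / 2.0 else "H0"
    if above:
        return "H1"
    if below:
        return "H0"
    return None


def oc_function(theta, theta0, theta1, delta):
    """Exact Pr{accept H0 | theta} for the rule above."""
    n_max = math.floor(2.0 * math.log(2.0 / delta) / (theta1 - theta0) ** 2) + 1
    alive = {0: 1.0}  # success count -> probability of reaching it unstopped
    accept_h0 = 0.0
    for n in range(1, n_max + 1):
        reached = {}
        for k, prob in alive.items():
            reached[k] = reached.get(k, 0.0) + prob * (1.0 - theta)
            reached[k + 1] = reached.get(k + 1, 0.0) + prob * theta
        alive = {}
        for k, prob in reached.items():
            outcome = decide(k, n, theta0, theta1, delta)
            if outcome is None:
                alive[k] = prob
            elif outcome == "H0":
                accept_h0 += prob
    return accept_h0


def tune_delta(theta0, theta1, alpha, beta, lo=1e-4, hi=0.5, iterations=20):
    """Largest delta found in [lo, hi] that meets both endpoint error bounds."""
    def feasible(delta):
        type_1 = 1.0 - oc_function(theta0, theta0, theta1, delta)
        type_2 = oc_function(theta1, theta0, theta1, delta)
        return type_1 <= alpha and type_2 <= beta

    if not feasible(lo):
        raise ValueError("lower end of the search range is not feasible")
    if feasible(hi):
        return hi
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)  # geometric midpoint suits a scale parameter
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo


tuned = tune_delta(0.3, 0.7, alpha=0.05, beta=0.05)
```

Because the search always returns a value that passed the feasibility check, the tuned `delta` satisfies both endpoint conditions by construction; choosing the largest feasible value keeps the intervals as narrow, and the test as short, as the error budget allows.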
7. Summary and Implications
Sequential testing with confidence sequences, as presented in this methodology, achieves several key objectives:
- Enables sequential and multistage hypothesis testing with uniformly controlled Type I/II error rates.
- Delivers bounded sample sizes and improved average sample numbers compared to the SPRT, especially near the indifference zone.
- Is applicable to both single and multiple hypotheses, as well as estimation problems, under a unified framework.
- Provides explicit, exact procedures for classical problems such as hypothesis testing on the difference of binomial proportions, with closed-form boundaries and risk tuning via monotonicity of the OC function.
- Simplifies the implementation and practical deployment of sequential tests by reducing design complexity to checks on confidence limit inclusion and straightforward tuning by endpoint error considerations.
This framework thus constitutes a significant advance in the design and analysis of sequential testing regimes, marrying conceptual rigor with practical effectiveness and computational tractability.