Nonparametric Bayesian TOST
- Nonparametric Bayesian TOST is a method that extends equivalence testing by incorporating Bayesian nonparametric models to assess negligible differences.
- It leverages flexible priors such as Dirichlet process mixtures and MCMC sampling to allow robust inference without fixed parametric assumptions.
- The PROTEST framework operationalizes this approach, ensuring consistency and improved control over type I error in equivalence testing.
Nonparametric Bayesian TOST (Two One-Sided Tests) extends the established methodology of equivalence testing from the parametric to the fully nonparametric Bayesian regime, enabling statistical inference on hypotheses about negligible differences without fixed distributional assumptions. The PROTEST framework operationalizes this extension, providing an accessible, MCMC-based nonparametric approach that parallels the logic of classical TOST procedures by assessing posterior mass within a tolerance region around the null value (Lassance et al., 8 Mar 2024).
1. Conceptual Foundation: Enlarged Null and TOST Analogue
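Classical TOST procedures test equivalence via two one-sided tests of whether a parameter $\theta$ lies outside a given interval around a reference value $\theta_0$, with width determined by a practical tolerance $\varepsilon > 0$. Formally, the enlarged or pragmatic null hypothesis is defined as

$$H_0^{\varepsilon}: |\theta - \theta_0| \le \varepsilon,$$

as opposed to the point null $H_0: \theta = \theta_0$. The TOST logic requires rejection of both one-sided hypotheses $\theta \le \theta_0 - \varepsilon$ and $\theta \ge \theta_0 + \varepsilon$; equivalently, it declares equivalence if the $(1 - 2\alpha)$ confidence interval falls entirely within $[\theta_0 - \varepsilon, \theta_0 + \varepsilon]$.
In a Bayesian formulation, the criterion that the confidence interval lie inside the tolerance interval is replaced with an evaluation of the posterior probability that $\theta$ lies within $[\theta_0 - \varepsilon, \theta_0 + \varepsilon]$, specifically computing $P(|\theta - \theta_0| \le \varepsilon \mid \text{data})$ and declaring equivalence if this posterior mass exceeds $1 - \alpha$. This aligns the Bayesian decision rule directly with the TOST logic (Lassance et al., 8 Mar 2024).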
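The Bayesian rule above can be sketched for a scalar parameter as follows. This is a minimal illustration, assuming posterior draws of the parameter are already available; the function names and the toy normal posterior are hypothetical and not taken from the source.

```python
import numpy as np

def posterior_mass_in_interval(theta_draws, theta0, eps):
    """Fraction of posterior draws with |theta - theta0| <= eps."""
    theta_draws = np.asarray(theta_draws, dtype=float)
    return float(np.mean(np.abs(theta_draws - theta0) <= eps))

def declare_equivalence(theta_draws, theta0, eps, alpha=0.05):
    """Declare equivalence when the posterior mass exceeds 1 - alpha."""
    return posterior_mass_in_interval(theta_draws, theta0, eps) > 1.0 - alpha

# Toy posterior (an assumption for illustration): theta | data ~ Normal(0.02, 0.05^2)
rng = np.random.default_rng(0)
draws = rng.normal(loc=0.02, scale=0.05, size=20_000)
mass = posterior_mass_in_interval(draws, theta0=0.0, eps=0.2)
print(f"posterior mass in tolerance interval: {mass:.4f}")
```

With a tolerance much wider than the posterior spread, nearly all draws fall inside the interval and equivalence is declared; shrinking the tolerance toward zero drives the mass down and the declaration is withheld.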
2. Nonparametric Bayesian Model Structure
The nonparametric Bayesian approach instantiates this expanded equivalence logic without parametric restrictions by modeling data distributions through flexible priors such as Dirichlet process mixtures, Pólya tree priors, or Gaussian process priors. In the two-sample setting, suppose $X_1, \dots, X_n$ and $Y_1, \dots, Y_m$ are two samples:
- Likelihood: $X_i \mid F \overset{\text{iid}}{\sim} F$, $Y_j \mid G \overset{\text{iid}}{\sim} G$.
- Nonparametric priors: common examples include:
  - Independent Dirichlet process mixtures, in generic form $f(x) = \int k(x \mid \phi)\, dP(\phi)$ with $P \sim \mathrm{DP}(a, P_0)$, where $f$ is the density of $F$ and $k$ is a mixture kernel, with an analogous specification for $G$.
  - Dependent Dirichlet processes for paired samples, or Gaussian process (GP) priors over densities.
- Posterior sampling: posterior draws $(F^{(s)}, G^{(s)})$, $s = 1, \dots, S$, are obtained via standard DP mixture MCMC algorithms (e.g., Chinese restaurant process, Pólya urn, or truncated stick-breaking Gibbs samplers).
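To make the stick-breaking construction mentioned above concrete, the sketch below draws one random distribution from a truncated Dirichlet process prior. It illustrates the prior only: posterior sampling for a DP mixture needs a full Gibbs sampler, which is not shown. The standard normal base measure, the truncation level, and the function names are assumptions made for illustration.

```python
import numpy as np

def stick_breaking_weights(concentration, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process.

    v_k ~ Beta(1, concentration) and w_k = v_k * prod_{j<k} (1 - v_j).
    The last stick fraction is set to 1 so the weights sum to one.
    """
    v = rng.beta(1.0, concentration, size=truncation)
    v[-1] = 1.0
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

def draw_dp_cdf(grid, concentration, truncation, rng):
    """One random CDF from a truncated DP with N(0, 1) base measure, on `grid`."""
    weights = stick_breaking_weights(concentration, truncation, rng)
    atoms = rng.standard_normal(truncation)  # atom locations from the base measure
    # CDF at each grid point = total weight of atoms at or below that point
    return (weights[None, :] * (atoms[None, :] <= grid[:, None])).sum(axis=1)

rng = np.random.default_rng(1)
grid = np.linspace(-10.0, 10.0, 201)
cdf_draw = draw_dp_cdf(grid, concentration=2.0, truncation=50, rng=rng)
```

A small concentration value puts most weight on a few atoms, while a large one spreads the weight and makes draws resemble the base measure more closely.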
3. Posterior Probability of Equivalence
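Equivalence for distributions is operationalized via a distance function $d(F, G)$. Common choices include:
- Kolmogorov–Smirnov-style: $d(F, G) = \sup_x |F(x) - G(x)|$.
- Classifier-based: a distance reflecting how well a classifier can distinguish observations drawn from $F$ from observations drawn from $G$.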
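The enlarged null is then

$$H_0^{\varepsilon}: d(F, G) \le \varepsilon.$$

For each pair of posterior draws $(F^{(s)}, G^{(s)})$, $s = 1, \dots, S$, compute $d^{(s)} = d(F^{(s)}, G^{(s)})$; the estimated posterior mass in the enlarged null is

$$\hat{p}_{\varepsilon} = \frac{1}{S} \sum_{s=1}^{S} \mathbb{1}\{ d^{(s)} \le \varepsilon \}.$$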
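The posterior-mass estimate is just a proportion over posterior draws, which a few lines of code make explicit. The sketch below assumes the draws are stored as CDF values on a shared grid (one row per draw) and uses the Kolmogorov–Smirnov-style distance; on a grid the supremum is approximated by a maximum. The function names and the tiny input arrays are hypothetical.

```python
import numpy as np

def ks_distance(cdf_f, cdf_g):
    """Max absolute difference between CDFs evaluated on a shared grid (last axis)."""
    return np.max(np.abs(np.asarray(cdf_f) - np.asarray(cdf_g)), axis=-1)

def posterior_mass_enlarged_null(cdf_f_draws, cdf_g_draws, eps):
    """Proportion of paired posterior draws whose distance is at most eps."""
    distances = ks_distance(cdf_f_draws, cdf_g_draws)  # one distance per draw
    return float(np.mean(distances <= eps))

# Two paired draws on a three-point grid (made-up numbers for illustration)
f_draws = np.array([[0.0, 0.5, 1.0], [0.0, 0.4, 1.0]])
g_draws = np.array([[0.0, 0.5, 1.0], [0.0, 0.8, 1.0]])
print(ks_distance(f_draws, g_draws))                          # distances 0.0 and 0.4
print(posterior_mass_enlarged_null(f_draws, g_draws, eps=0.1))  # one of two draws qualifies
```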
This operationalizes the Bayesian equivalence assessment fully nonparametrically (Lassance et al., 8 Mar 2024).
4. Decision Criterion and Consistency Properties
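The equivalence decision follows directly, writing $\hat{p}_{\varepsilon}$ for the estimated posterior mass of the enlarged null:
- Select a level $\alpha \in (0, 1)$.
- Declare equivalence if $\hat{p}_{\varepsilon} > 1 - \alpha$ (i.e., high posterior mass falls within the equivalence region).
- Otherwise, withhold equivalence.

An equivalent statement is to reject the enlarged null when its posterior mass falls short of this threshold. This mirrors the TOST requirement of high confidence that the parameter is sufficiently close to the reference, with posterior probability taking the place of confidence.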
PROTEST yields consistency: if the true distributions differ by less than $\varepsilon$, then as $n \to \infty$ the posterior mass inside $H_0^{\varepsilon}$ converges to one, ensuring the procedure declares equivalence. If the difference exceeds $\varepsilon$, the posterior mass avoids the $\varepsilon$-ball and equivalence will not be declared, due to the Bernstein–von Mises phenomenon. In simulation, the classical PTtest [Holmes & Walker, 2015] with Pólya tree priors often over-rejects at large $n$, even when true differences are negligible, while PROTEST's criterion is stable (Lassance et al., 8 Mar 2024).
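Stated slightly more formally, the consistency claim above reads roughly as follows. This is an informal sketch: $F_0$ and $G_0$ denote the true distributions and $\Pi(\cdot \mid \cdot)$ the posterior (notation introduced here, not taken from the source), regularity conditions are not spelled out, and the boundary case $d(F_0, G_0) = \varepsilon$ is left open.

```latex
\Pi\big( d(F, G) \le \varepsilon \,\big|\, X_{1:n}, Y_{1:m} \big)
\;\xrightarrow[\,n,\, m \to \infty\,]{}\;
\begin{cases}
1, & d(F_0, G_0) < \varepsilon, \\
0, & d(F_0, G_0) > \varepsilon.
\end{cases}
```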
5. Selection of Tolerance ε
Determining $\varepsilon$ is central. PROTEST outlines two main strategies:
- Direct elicitation:
  - Theory or measurement-error bound: set $\varepsilon$ equal to the known measurement error.
  - Prior-mass calibration: fix a small target probability and pick $\varepsilon$ such that the prior probability of the enlarged null equals that target.
  - Reference study: choose $\varepsilon$ as the smallest value that would have declared equivalence on a key reference dataset at level $\alpha$.
- Sensitivity or bounding:
  - Analyze results across candidate tolerances from multiple experts.
  - Report the posterior mass for each $\varepsilon$, and illustrate the boundary at which the decision changes, to contextualize robustness with respect to the choice of tolerance.
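Two of the strategies above reduce to simple computations once the posterior distances are in hand. The sketch below shows the posterior mass across several candidate tolerances, and the smallest tolerance that would declare equivalence at a given level. The function names, the stand-in distances, and the experts' tolerances are all hypothetical.

```python
import numpy as np

def posterior_mass_curve(distances, tolerances):
    """Posterior mass of {d <= eps} for each candidate tolerance eps."""
    d = np.asarray(distances, dtype=float)
    eps = np.asarray(tolerances, dtype=float)
    return (d[None, :] <= eps[:, None]).mean(axis=1)

def smallest_declaring_tolerance(distances, alpha=0.05):
    """Smallest eps whose posterior mass exceeds 1 - alpha (None if there is none)."""
    d = np.sort(np.asarray(distances, dtype=float))
    # Fraction of draws at or below the k-th smallest distance
    # (a lower bound on the mass when ties occur)
    mass = np.arange(1, d.size + 1) / d.size
    qualifying = np.nonzero(mass > 1.0 - alpha)[0]
    return float(d[qualifying[0]]) if qualifying.size else None

# Stand-in posterior distances and three experts' tolerances (illustrative values)
distances = np.arange(1, 11) / 100.0          # 0.01, 0.02, ..., 0.10
expert_tolerances = [0.03, 0.05, 0.12]
for eps, mass in zip(expert_tolerances, posterior_mass_curve(distances, expert_tolerances)):
    print(f"eps = {eps:.2f}: posterior mass = {mass:.2f}")
print("smallest declaring tolerance:", smallest_declaring_tolerance(distances, alpha=0.25))
```

Reporting the whole curve rather than a single decision shows how far a chosen tolerance sits from the point where the conclusion would flip.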
6. Implementation Workflow
An explicit workflow for the two-sample PROTEST test is as follows:
| Step | Description | Operational Detail |
|---|---|---|
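| 1 | Run MCMC to generate posterior draws $(F^{(s)}, G^{(s)})$, $s = 1, \dots, S$ | Use DP mixture or other NP prior |
| 2 | Compute $d^{(s)} = d(F^{(s)}, G^{(s)})$ | Choice of $d$ as specified |
| 3 | Posterior mass $\hat{p}_{\varepsilon}$ | Empirical proportion of draws with $d^{(s)} \le \varepsilon$ |
| 4 | Declare equivalence if $\hat{p}_{\varepsilon} > 1 - \alpha$ | Output: $\hat{p}_{\varepsilon}$, equivalence result |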
In practice, standard DP mixture samplers are employed, as implemented in tools such as the R package protest (GitHub: rflassance/protest) (Lassance et al., 8 Mar 2024).
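The four steps can be strung together in a short end-to-end sketch. To keep it self-contained, the DP mixture MCMC of step 1 is replaced here by the Bayesian bootstrap (Dirichlet(1, ..., 1) weights on the observed points), a much simpler stand-in that is an assumption of this example. It is not the method or API of the `protest` R package, and all function names are hypothetical.

```python
import numpy as np

def bayesian_bootstrap_cdfs(sample, grid, n_draws, rng):
    """Posterior CDF draws under the Bayesian bootstrap, evaluated on `grid`."""
    sample = np.asarray(sample, dtype=float)
    weights = rng.dirichlet(np.ones(sample.size), size=n_draws)    # shape (S, n)
    indicators = (sample[:, None] <= grid[None, :]).astype(float)  # shape (n, G)
    return weights @ indicators                                    # shape (S, G)

def protest_style_two_sample(x, y, eps, alpha=0.05, n_draws=2000, seed=0):
    """Return (posterior mass of d(F, G) <= eps, equivalence decision)."""
    rng = np.random.default_rng(seed)
    # Both CDF draws are step functions jumping only at observed points,
    # so the supremum distance is attained on the pooled sample.
    grid = np.sort(np.concatenate([x, y]))
    f_draws = bayesian_bootstrap_cdfs(x, grid, n_draws, rng)   # step 1
    g_draws = bayesian_bootstrap_cdfs(y, grid, n_draws, rng)
    distances = np.max(np.abs(f_draws - g_draws), axis=1)      # step 2
    p_hat = float(np.mean(distances <= eps))                   # step 3
    return p_hat, p_hat > 1.0 - alpha                          # step 4

rng = np.random.default_rng(42)
x = rng.normal(0.0, 1.0, size=400)
y_same = rng.normal(0.0, 1.0, size=400)
y_shifted = rng.normal(2.0, 1.0, size=400)
print(protest_style_two_sample(x, y_same, eps=0.3, n_draws=500))
print(protest_style_two_sample(x, y_shifted, eps=0.3, n_draws=500))
```

The tolerance of 0.3 is deliberately generous so that the two cases separate cleanly; in a real analysis it would come from one of the elicitation strategies rather than being picked for convenience.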
7. Comparison with Existing Nonparametric Approaches
Holmes & Walker's PTtest computes a tail-area metric using a Pólya tree prior but bases its decision on the point null rather than on a tolerance region; as a result, it tends to reject equivalence as sample size grows, regardless of practical difference. In contrast, PROTEST employs a posterior-mass-in-interval criterion, remaining insensitive to overfitting of the posterior and better aligned with pragmatic thresholds. Empirical studies, including a simulated example involving a normal distribution, illustrate that PTtest frequently over-rejects at high sample size, while PROTEST's behavior more closely matches practical notions of equivalence (Lassance et al., 8 Mar 2024).
A plausible implication is that PROTEST constitutes an automated and coherent Bayesian analogue of classical TOST in the nonparametric regime, with practical advantages in interpretability and robustness.
For a comprehensive presentation and additional illustrations, see "PROTEST: Nonparametric Testing of Hypotheses Enhanced by Experts' Utility Judgements" (Lassance et al., 8 Mar 2024).