Parametric Bayesian TOST
- Parametric Bayesian TOST is an equivalence testing method that integrates Bayesian posterior inference with the classical TOST framework to assess practical equivalence.
- It replaces frequentist p-values with posterior tail probabilities derived under continuous priors, which are uniformly distributed under the null hypothesis and therefore serve as valid test statistics.
- The approach involves specifying equivalence bounds, computing posterior probabilities, and balancing type I error control with test power across parametric models like normal and binomial.
Parametric Bayesian Two One-Sided Tests (TOST) integrate Bayesian posterior inference into the classical TOST procedure for equivalence testing, replacing frequentist p-values with posterior tail probabilities. This allows conclusions about practical equivalence (rather than simple difference) to be drawn via Bayesian measures of evidence, with posterior probabilities that are valid uniform test statistics under the null, supporting both single-hypothesis and multiple-testing contexts. The methodology offers direct control over the tradeoff between type I error, test power, and prior informativeness, and is applicable to a range of parametric models including normal and binomial families (Ochieng, 25 Jul 2025, Meyer et al., 2021).
1. Parametric Model Structure and Equivalence Hypothesis Formulation
The parametric Bayesian TOST operates on data $x_1, \dots, x_n$ sampled i.i.d. from a parametric family $\{f(\cdot \mid \theta) : \theta \in \Theta\}$. The null hypothesis for equivalence is split across two margins $\delta_L < \delta_U$:
- $H_0$: $\theta \le \delta_L$ or $\theta \ge \delta_U$
- $H_1$: $\delta_L < \theta < \delta_U$
This is further decomposed into two one-sided nulls for TOST:
- $H_{0,L}$: $\theta \le \delta_L$ versus $H_{1,L}$: $\theta > \delta_L$
- $H_{0,U}$: $\theta \ge \delta_U$ versus $H_{1,U}$: $\theta < \delta_U$
A statistic with the monotone likelihood-ratio property in $\theta$ underpins the construction, supporting the derivation of pivotal quantities (Ochieng, 25 Jul 2025). This setting encompasses scenarios such as comparing two means, two proportions, and equivalence of a mean to a reference value (Meyer et al., 2021).
2. Bayesian Posterior Tail Probabilities and Decision Rules
A continuous prior $\pi(\theta)$ is placed on $\theta$, yielding the posterior $\pi(\theta \mid x)$ via Bayes' theorem,
$$\pi(\theta \mid x) \propto f(x \mid \theta)\,\pi(\theta).$$
The Bayesian analogs of one-sided p-values are the posterior left and right tail probabilities:
$$p_L = P(\theta \le \delta_L \mid x), \qquad p_U = P(\theta \ge \delta_U \mid x).$$
The overall Bayesian “p-value” for equivalence is
$$p_{\mathrm{TOST}} = \max(p_L, p_U).$$
Equivalence is declared when both $p_L \le \alpha$ and $p_U \le \alpha$ for a chosen significance level $\alpha$ (commonly $0.05$), precisely paralleling the frequentist TOST procedure (Ochieng, 25 Jul 2025).
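As a concrete illustration, here is a minimal Python sketch of this decision rule for a normal mean with known variance under a conjugate normal prior; the function name `bayes_tost_normal`, the hyperparameters, and the equivalence bounds are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (assumed conjugate-normal setting): posterior tail probabilities
# p_L = P(theta <= delta_L | x), p_U = P(theta >= delta_U | x) and the TOST decision.
import numpy as np
from scipy.stats import norm

def bayes_tost_normal(x, sigma, mu0, tau, delta_L, delta_U, alpha=0.05):
    """x: i.i.d. N(theta, sigma^2) sample; prior theta ~ N(mu0, tau^2)."""
    n = len(x)
    # Conjugate update: posterior of theta is N(mu_n, sd_n^2).
    prec_n = 1.0 / tau**2 + n / sigma**2
    mu_n = (mu0 / tau**2 + n * np.mean(x) / sigma**2) / prec_n
    sd_n = np.sqrt(1.0 / prec_n)
    # Posterior tail probabilities playing the role of the two one-sided p-values.
    p_L = norm.cdf(delta_L, loc=mu_n, scale=sd_n)
    p_U = norm.sf(delta_U, loc=mu_n, scale=sd_n)
    return p_L, p_U, bool(p_L <= alpha and p_U <= alpha)

rng = np.random.default_rng(0)
x = rng.normal(loc=0.1, scale=1.0, size=50)
print(bayes_tost_normal(x, sigma=1.0, mu0=0.0, tau=10.0, delta_L=-0.5, delta_U=0.5))
```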
3. Uniformity and Validity of Bayesian TOST Statistics
For monotone likelihood-ratio families and continuous priors, each tail posterior probability is uniformly distributed under its corresponding null:
- $p_L$ is uniform when $\theta = \delta_L$, the boundary of $H_{0,L}$,
- $p_U$ is uniform when $\theta = \delta_U$, the boundary of $H_{0,U}$.
As a result, $p_L$ and $p_U$ serve as valid p-values for multiple-testing procedures and FDR contexts. This is foundational to their direct use in the TOST structure and enables plug-in to standard p-value-based procedures, including the Benjamini–Hochberg algorithm (Ochieng, 25 Jul 2025).
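A small simulation (assumed settings, not taken from the paper) illustrates this calibration in the conjugate-normal model: generating data at the boundary $\theta = \delta_L$ under a diffuse prior yields $p_L$ values whose empirical quantiles track the Uniform(0,1) quantiles.

```python
# Illustrative calibration check: simulate data at the boundary theta = delta_L and
# inspect the distribution of p_L = P(theta <= delta_L | x). Settings are assumptions.
import numpy as np
from scipy.stats import norm

delta_L, sigma, n = -0.5, 1.0, 30
mu0, tau = 0.0, 50.0                      # diffuse conjugate prior N(mu0, tau^2)
rng = np.random.default_rng(1)

p_L_vals = []
for _ in range(5000):
    x = rng.normal(loc=delta_L, scale=sigma, size=n)   # sampling at the null boundary
    prec_n = 1.0 / tau**2 + n / sigma**2
    mu_n = (mu0 / tau**2 + n * x.mean() / sigma**2) / prec_n
    p_L_vals.append(norm.cdf(delta_L, loc=mu_n, scale=np.sqrt(1.0 / prec_n)))

# For a diffuse prior, these quantiles should be close to their Uniform(0,1) values.
print(np.quantile(p_L_vals, [0.1, 0.25, 0.5, 0.75, 0.9]))
```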
4. Algorithmic Workflow and Implementation
The Bayesian TOST involves the following steps:
- Specification: Define equivalence bounds $\delta_L < \delta_U$ and the significance level $\alpha$.
- Prior Selection: Choose a prior $\pi(\theta)$ over $\theta$ (e.g., a normal prior for normal models, a Beta prior for binomial cases).
- Posterior Calculation: Compute $\pi(\theta \mid x)$ given the observed data $x$.
- Tail Probability Evaluation: Compute $p_L = P(\theta \le \delta_L \mid x)$ and $p_U = P(\theta \ge \delta_U \mid x)$.
- Decision Rule: Declare equivalence if $p_L \le \alpha$ and $p_U \le \alpha$.
The same steps apply in discrete models (e.g., binomial data), replacing integrals with summations as appropriate (Ochieng, 25 Jul 2025). For explicit “two-interval” Bayesian tests (2IT), one computes the posterior probability $P(\delta_L < \theta < \delta_U \mid x)$ and applies a high-threshold criterion (e.g., declaring equivalence when this probability exceeds a prespecified value close to 1), paralleling but not identical to the TOST tail-probability approach (Meyer et al., 2021).
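A minimal binomial sketch under an assumed conjugate Beta(a, b) prior is shown below; a comment indicates how the 2IT quantity, the posterior mass inside the equivalence interval, would differ from the two tail probabilities. Counts, bounds, and hyperparameters are illustrative.

```python
# Minimal sketch (assumed Beta-Binomial conjugate setting) of the same workflow.
from scipy.stats import beta

def bayes_tost_binomial(successes, n, a, b, delta_L, delta_U, alpha=0.05):
    """Posterior tail probabilities for a binomial proportion theta ~ Beta(a, b) prior."""
    post = beta(a + successes, b + n - successes)   # conjugate Beta posterior
    p_L = post.cdf(delta_L)                         # P(theta <= delta_L | data)
    p_U = post.sf(delta_U)                          # P(theta >= delta_U | data)
    # The 2IT variant would instead threshold the interval probability
    # P(delta_L < theta < delta_U | data) = post.cdf(delta_U) - post.cdf(delta_L).
    return p_L, p_U, bool(p_L <= alpha and p_U <= alpha)

print(bayes_tost_binomial(successes=48, n=100, a=1, b=1, delta_L=0.4, delta_U=0.6))
```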
5. Power, Conservativeness, and Prior Specification
The power function of the Bayesian TOST depends on the choice of prior:
- Binomial-Beta Models: Weakly informative Beta priors yield less conservative posterior p-values; as prior informativeness increases, conservativeness increases and power drops.
- Normal Models: For a prior $\theta \sim N(\mu_0, \tau^2)$, letting $\tau^2 \to \infty$ recovers the frequentist case (the posterior tail probabilities coincide with the frequentist one-sided p-values). A very small $\tau^2$ gives an overly informative, extremely conservative test. Choosing a moderate $\tau^2$ (by empirical Bayes or elicitation) balances the trade-off.
Closed-form expressions for $p_L$ and $p_U$ allow explicit comparison. For instance, in the normal-mean model with known $\sigma^2$ and prior $\theta \sim N(\mu_0, \tau^2)$, the posterior is $N(\mu_n, \sigma_n^2)$ with $\sigma_n^2 = (1/\tau^2 + n/\sigma^2)^{-1}$ and $\mu_n = \sigma_n^2\,(\mu_0/\tau^2 + n\bar{x}/\sigma^2)$, so that
$$p_L = \Phi\!\left(\frac{\delta_L - \mu_n}{\sigma_n}\right), \qquad p_U = 1 - \Phi\!\left(\frac{\delta_U - \mu_n}{\sigma_n}\right).$$
The power, $P(p_L \le \alpha \text{ and } p_U \le \alpha \mid \theta)$, is computed by integrating these tail probabilities under the sampling law. Often, suitably chosen priors achieve greater power near the center of the equivalence region (Ochieng, 25 Jul 2025).
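Because both one-sided rejections occur exactly when the posterior mean lands in an interval shrunk inward by $z_{1-\alpha}\,\sigma_n$, the power has a closed form in this conjugate-normal case. The sketch below evaluates it; all parameter values are illustrative assumptions.

```python
# Closed-form power of the conjugate-normal Bayesian TOST at a fixed true theta
# (known sigma); hyperparameters and bounds below are assumptions for illustration.
import numpy as np
from scipy.stats import norm

def tost_power_normal(theta, n, sigma, mu0, tau, delta_L, delta_U, alpha=0.05):
    w = (n / sigma**2) / (n / sigma**2 + 1.0 / tau**2)   # shrinkage weight on xbar
    sd_n = np.sqrt(w * sigma**2 / n)                      # posterior sd of theta
    z = norm.ppf(1.0 - alpha)
    # Both nulls are rejected iff mu_n = w*xbar + (1-w)*mu0 lies in
    # [delta_L + z*sd_n, delta_U - z*sd_n]; translate this to bounds on xbar.
    lo = (delta_L + z * sd_n - (1.0 - w) * mu0) / w
    hi = (delta_U - z * sd_n - (1.0 - w) * mu0) / w
    if lo >= hi:
        return 0.0                                        # equivalence can never be declared
    se = sigma / np.sqrt(n)                               # sampling sd of xbar
    return norm.cdf(hi, loc=theta, scale=se) - norm.cdf(lo, loc=theta, scale=se)

# Power near the centre of the equivalence region versus near a margin.
print(tost_power_normal(theta=0.0, n=50, sigma=1.0, mu0=0.0, tau=10.0,
                        delta_L=-0.5, delta_U=0.5))
print(tost_power_normal(theta=0.4, n=50, sigma=1.0, mu0=0.0, tau=10.0,
                        delta_L=-0.5, delta_U=0.5))
```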
6. Correlation Structure and Multiple Testing
In the normal model, an explicit independence result is established (Proposition 8 of (Ochieng, 25 Jul 2025)). This independence ensures separate inferential roles for the Bayesian and frequentist procedures and supports valid FDR procedures.
Simulations for both single-hypothesis and large-scale multiple-testing settings demonstrate that
- Type I error control is near the nominal level,
- Power increases with sample size, relaxed equivalence margins, and prior variance,
- Bayesian posterior p-value-based FDR control is competitive with standard p-value-based FDR control as the prior variance grows.
Under dependence or in FDR settings, collections of posterior tail p-values can be submitted directly to algorithms such as Benjamini–Hochberg, as they retain the uniform null distribution (Ochieng, 25 Jul 2025).
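As an illustration, a vector of per-hypothesis values $p_{\mathrm{TOST}} = \max(p_L, p_U)$ can be passed to any standard BH implementation; the p-values below are made up for the sketch.

```python
# Illustrative Benjamini-Hochberg adjustment of posterior tail p-values, one per
# equivalence hypothesis; the input values are invented for this sketch.
import numpy as np
from statsmodels.stats.multitest import multipletests

p_tost = np.array([0.003, 0.021, 0.048, 0.130, 0.410, 0.870])
reject, p_adj, _, _ = multipletests(p_tost, alpha=0.05, method="fdr_bh")
print(reject)   # which hypotheses are declared equivalent at FDR level 0.05
print(p_adj)    # BH-adjusted p-values
```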
7. Extensions, Comparison, and Practical Considerations
The parametric Bayesian TOST generalizes to the Bayesian two-interval test (2IT) framework, which replaces p-values with posterior probabilities of interval hypotheses, applicable to superiority, non-inferiority, or equivalence (Meyer et al., 2021). The Bayesian TOST also supports direct sample size determination via expected posterior probabilities and permits sequential analysis and optional stopping without type I error adjustment, by appeal to the likelihood principle.
Differences between Bayesian TOST and fully posterior-interval tests are detailed in their respective treatments: the TOST approach employs tail probabilities as pivotal quantities, matching the frequentist conceptual structure; the 2IT computes posterior mass in the equivalence interval versus its complement and adopts threshold-based decision rules that may directly represent the probability of equivalence.
Across approaches, by tuning the informativeness of the prior, practitioners can interpolate between fully uninformative, classical TOST-like behavior and potentially more powerful, informative analyses when robust prior knowledge is available. The methodology encompasses a wide range of standard parametric models and is computationally straightforward in settings with conjugate priors.
References:
- "A Comparison of the Bayesian Posterior Probability and the Frequentist -Value in Testing Equivalence Hypotheses" (Ochieng, 25 Jul 2025).
- "Bayesian two-interval test" (Meyer et al., 2021).