
P-Value ESS Estimation Method

Updated 27 July 2025
  • P-Value ESS Estimation Method is a technique that quantifies the effective sample size of prior information by linking frequentist p-values with Bayesian posterior probabilities.
  • It employs simulation-based analyses to assess prior-data concordance, highlighting both beneficial and detrimental prior effects in a single, interpretable metric.
  • The method is applicable to diverse scenarios including one-sample, two-sample, regression, and genomic studies, enhancing hypothesis testing and experimental design.

The p-value Effective Sample Size (ESS) estimation method is a principled approach for quantifying the informational contribution of prior distributions relative to observed data within the context of hypothesis testing. By leveraging the mathematical relationship between frequentist p-values and Bayesian posterior probabilities of the null hypothesis, this method yields an interpretable ESS metric that can be positive, zero, or negative depending on the concordance between prior and data. The framework accommodates diverse statistical settings, requires no subjective baseline prior, and reflects the global effect of multiple priors simultaneously. It is applicable to a range of problems including one-sample and two-sample inference, regression models, and genomic applications.

1. Fundamental Methodology

The core of the p-value ESS estimation method is the formal equivalence between the p-value of a frequentist hypothesis test and the posterior probability of the null hypothesis under a noninformative prior. This equivalence was established in the asymptotic regime and has been extended by subsequent works.

The methodology proceeds as follows:

  • Frequentist Reference: For a given sample and hypothesis test (e.g., $H_0: \mu = 0$), compute the standard frequentist test statistic, such as $Z_n^F = \sqrt{n}\,\bar{X}/\sigma$, where $n$ is the observed sample size.
  • Bayesian Posterior: Under an informative prior (e.g., $\mu \sim \mathcal{N}(\delta, \sigma^2/m)$), compute the posterior for $\mu$ and derive the posterior Z-statistic $Z_n^B = \tilde{\mu}/\tilde{\sigma}$, where $\tilde{\mu}$ and $\tilde{\sigma}$ are the posterior mean and standard deviation, respectively.
  • ESS Formalization: Define $U_n^B = \mathbb{E}[(Z_n^B)^2]$ (the expected squared posterior Z-statistic) and $U_{\tilde{n}}^F = \mathbb{E}[(Z_{\tilde{n}}^F)^2]$ for a hypothetical frequentist sample size $\tilde{n}$. The ESS is then $\tilde{n}^* - n$, where $\tilde{n}^*$ is the value of $\tilde{n}$ minimizing $|U_n^B - U_{\tilde{n}}^F|$; a prior concordant with the data yields a positive ESS, a conflicting prior a negative one.

This procedure generalizes to binomial/Beta, two-sample, and regression models using analogous test statistics and the same matching step. The approach does not require specification of a baseline prior; the comparison is instead grounded in the neutrality of the noninformative prior built into the p-value/posterior equivalence. A minimal numerical sketch of the one-sample normal case is given below.
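To make the matching step concrete, here is a minimal sketch of the one-sample normal case with known $\sigma$, assuming the conjugate update and the squared-Z matching described above. The Monte Carlo approximation of $U_n^B$, the grid search over $\tilde{n}$, the use of the data-generating mean in the frequentist reference, and all numerical settings ($n$, $m$, $\delta$) are illustrative choices for this sketch, not details taken from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative settings (not taken from the source).
n, sigma = 50, 1.0        # observed sample size, known standard deviation
mu_true = 0.3             # mean used to generate the data
delta, m = 0.3, 40        # prior mean and intensity: mu ~ N(delta, sigma^2 / m)

x = rng.normal(mu_true, sigma, size=n)
xbar = x.mean()

# Conjugate normal update: mu | x ~ N(mu_tilde, sigma_tilde^2).
mu_tilde = (n * xbar + m * delta) / (n + m)
sigma_tilde = sigma / np.sqrt(n + m)
z_bayes = mu_tilde / sigma_tilde              # posterior Z-statistic Z_n^B

def U_B(n, m, delta, mu, sigma, reps=20_000):
    """Monte Carlo approximation of U_n^B = E[(Z_n^B)^2] over repeated data sets."""
    xbars = rng.normal(mu, sigma / np.sqrt(n), size=reps)
    z = (n * xbars + m * delta) / (sigma * np.sqrt(n + m))
    return np.mean(z ** 2)

def U_F(n_tilde, mu, sigma):
    """Closed form of U_ñ^F = E[(Z_ñ^F)^2] = 1 + ñ mu^2 / sigma^2."""
    return 1.0 + n_tilde * mu ** 2 / sigma ** 2

# Match the two expected squared Z-statistics over a grid of hypothetical ñ.
grid = np.arange(1, 1000)
u_b = U_B(n, m, delta, mu_true, sigma)
n_star = int(grid[np.argmin(np.abs(u_b - U_F(grid, mu_true, sigma)))])
ess = n_star - n                              # positive when the prior helps
print(f"posterior Z = {z_bayes:.2f}, matched n-tilde* = {n_star}, ESS = {ess}")
```

In this concordant configuration (prior mean equal to the data-generating mean), the matched $\tilde{n}^*$ exceeds $n$ by roughly the prior intensity $m$, so the ESS is positive and grows with $m$, consistent with the behavior described in Section 3.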

2. Innovations and Interpretive Scope

This method departs from conventional ESS frameworks in several pivotal ways:

  • Prior-Likelihood Discordance: Unlike classical ESS estimators, which typically assume the prior is always beneficial, p-value ESS estimation admits the possibility of negative ESS. This arises when the prior is substantively in conflict with the observed data, quantifying the prior as "detrimental," i.e., reducing the power or reliability of inference.
  • Agnostic to Baseline Priors: The dependence on baseline or reference priors (as seen in methods such as the variance-ratio, MTM, or AMSE approaches) is eliminated. The p-value/posterior probability equivalence naturally centers the calculation without subjective specification.
  • Collective ESS for Multiple Priors: In multi-parameter settings, such as multiple regression or comparison of multiple proportions, the method produces a single ESS quantifying the aggregate informational content provided by all involved priors—a "one-ESS–multiple-priors" paradigm.

These features extend the interpretive range of ESS beyond the classical "one prior, one number" model, enabling nuanced measurement of complex, context-dependent prior-data interactions.

3. Simulation-Based Empirical Analysis

The method's properties were systematically explored via simulation across canonical one-sample and two-sample settings, as well as regression and generalized linear model contexts. The simulations varied both intensity (the concentration of the prior, i.e., how small its variance is) and deviation (the difference between the prior mean and the true parameter value) for a variety of prior types.

Key empirical observations include:

  • When the prior is closely aligned with the likelihood (zero or negligible deviation), the ESS grows approximately linearly with increasing prior "intensity" (e.g., $m$ in a normal prior or $a+b$ in a Beta prior).
  • As the degree of mis-specification rises, the ESS diminishes, vanishing entirely or becoming negative for strongly discordant priors.
  • In contrast with existing estimators (e.g., those of Reimherr and Morita), the p-value ESS estimator demonstrated more pronounced sensitivity to prior-likelihood misalignment, discriminating between mildly and severely deleterious priors.
  • The method seamlessly handles global effects from multiple priors, synthesizing them into a single ESS value that reflects their joint (or antagonistic) impact.

This sensitivity allows the estimator to reliably flag the situations in which informative priors erode, rather than strengthen, inferential power.
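As an illustration of these patterns, the sketch below reuses the one-sample construction from the earlier example and recomputes the ESS over a small grid of prior means $\delta$ and intensities $m$. The specific grids, the Monte Carlo size, and the floor $\tilde{n} \geq 1$ in the search are assumptions of this sketch rather than settings from the reported simulations.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma, mu_true = 50, 1.0, 0.3              # illustrative settings

def ess(m, delta, reps=20_000, grid=np.arange(1, 2000)):
    """ESS of a N(delta, sigma^2/m) prior via the squared-Z matching sketch."""
    xbars = rng.normal(mu_true, sigma / np.sqrt(n), size=reps)
    u_b = np.mean(((n * xbars + m * delta) / (sigma * np.sqrt(n + m))) ** 2)
    u_f = 1.0 + grid * mu_true ** 2 / sigma ** 2   # frequentist reference E[(Z_ñ^F)^2]
    return int(grid[np.argmin(np.abs(u_b - u_f))]) - n

# Rows: prior mean delta from concordant (0.3 = mu_true) to strongly discordant (-0.3).
# Columns: increasing prior intensity m. Concordant priors gain ESS roughly linearly
# in m; discordant priors lose it, with the search floor n-tilde = 1 bounding the loss.
for delta in (0.3, 0.15, 0.0, -0.3):
    print(f"delta = {delta:+.2f}: ESS =", [ess(m, delta) for m in (10, 40, 80)])
```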

4. Application in Genomic and Biomedical Studies

The method's flexibility was illustrated in genomic association studies, specifically eQTL analysis. In such studies, regression of gene expression (quantitative response) on sequence variant genotype (predictor) is informed by both current sample data and external prior information (e.g., from large reference datasets such as Braineac).

Significant findings include:

  • Informative priors well-aligned with external datasets led to substantial increases in ESS—occasionally boosting effective sample size by the equivalent of over a hundred additional observations.
  • This translated to substantially enhanced statistical power for eQTL discovery, as evidenced by increased Z-statistics and more confident assignment of association status.
  • Even moderate calibration error (e.g., underestimated effect magnitude) still conferred sizable, though attenuated, ESS benefit.
  • When priors were discordant with observed data, the ESS estimate became negative, correctly signaling potentially misleading prior information and effectively penalizing overconfident or incorrect prior beliefs.

This demonstrates the method's practical value in high-dimensional, data-rich biomedical settings, ensuring that prior information is integrated transparently, adaptively, and with well-calibrated skepticism.
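To make the regression-with-prior setup concrete, the following sketch analyzes a single hypothetical variant, using an external summary estimate as a normal prior on the regression slope. The genotype frequencies, the prior location and scale standing in for a Braineac-style estimate, and the known-variance conjugate update are all simplifying assumptions of this sketch, not details of the cited application.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical single-variant eQTL-style data: expression on genotype dosage (0/1/2).
n = 200
beta_true, sigma = 0.25, 1.0
g = rng.choice([0, 1, 2], size=n, p=[0.49, 0.42, 0.09])   # illustrative genotype freqs
y = beta_true * g + rng.normal(0.0, sigma, size=n)

# Frequentist slope estimate (intercept absorbed by centering) and its Z-statistic.
gc = g - g.mean()
beta_hat = gc @ y / (gc @ gc)
se_hat = sigma / np.sqrt(gc @ gc)
z_freq = beta_hat / se_hat

# External prior on the slope, standing in for a reference-panel summary estimate;
# b0 and s0 are assumed values for this sketch, not figures from the study.
b0, s0 = 0.22, 0.05

# Conjugate normal update for the slope, treating sigma as known (a simplification).
prec_data = gc @ gc / sigma ** 2
prec_prior = 1.0 / s0 ** 2
beta_post = (prec_data * beta_hat + prec_prior * b0) / (prec_data + prec_prior)
se_post = 1.0 / np.sqrt(prec_data + prec_prior)
z_post = beta_post / se_post

print(f"frequentist Z = {z_freq:.2f}, posterior Z = {z_post:.2f}")
# The ESS for this prior would then follow by matching E[(Z^B)^2] against the
# frequentist reference over hypothetical sample sizes, as in the earlier sketches.
```

A tight external prior that agrees with the fitted slope sharpens the posterior Z-statistic relative to the frequentist one, which is the regime in which the matched ESS is large and positive; shifting the prior location away from the fitted slope would shrink or reverse that gain.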

5. Broader Implications and Use in Decision Design

The p-value ESS estimation method has important consequences for experimental design, hypothesis testing, and adaptive trial strategy:

  • Trial Design and Analysis: By quantifying the net information contributed by prior distributions, both beneficial and harmful, the approach allows designers to rationally adjust sample sizes, prior weights, and stopping rules. This is particularly relevant in early-phase clinical trials and information borrowing across studies.
  • Warning Against Harmful Priors: The possibility of negative ESS serves as a rigorous diagnostic for prior-data conflict, discouraging uncritical use of strong but potentially misinformed priors.
  • Reduction in Subjectivity: The method's avoidance of baseline prior specification increases analysis objectivity and reproducibility.
  • Efficiency in Complex, Multi-source Settings: The framework's ability to summarize the global informational contribution of several priors in high-dimensional or multi-arm studies makes it suitable for complex modern analyses (e.g., multi-locus association mapping, multi-center trials).

In sum, this approach enables statistically principled, operationally efficient, and highly interpretable integration of prior information into hypothesis-testing workflows.

6. Positioning Within the ESS Methodology Landscape

Compared to previous ESS estimation strategies—including those based on Fisher information, variance ratios, local information ratios, or predictive consistency criteria—the p-value ESS estimator is directly tied to the operational power of a standard significance test, as measured by expected squared Z-statistics. Its ability to reflect both harmful (negative ESS) and beneficial (positive ESS) priors, its baseline-agnostic nature, and its parsimonious treatment of multiple priors distinguish it from variance-ratio (Neuenschwander et al., 2019), information-ratio, and simulation-based alternatives (Martino et al., 2016). These features make the approach particularly well-suited to hypothesis testing contexts and high-dimensional modern data analysis.

7. Summary Table: Key Features of the P-Value ESS Estimation Method

| Feature | P-Value ESS Estimation | Conventional ESS Estimation |
| --- | --- | --- |
| Negative ESS values allowed | Yes | Typically no |
| Requires baseline prior | No | Yes (in many methods) |
| Handles prior-likelihood conflict | Yes | No / partial |
| Single ESS for multiple priors | Yes | No (usually one per prior) |
| Direct link to testing framework | Yes (via p-value/posterior link) | Often indirect |

The p-value ESS estimation method thus constitutes a theoretically grounded and practically robust approach for synthesizing the influence of prior information within frequentist hypothesis testing and decision-making procedures.
