Equivalent Sample Size in Bayesian Models
- Equivalent sample size is a measure that translates the information in Bayesian priors or posteriors into the scale of independent sample counts.
- It is used in Bayesian network scoring and experimental design to calibrate prior strength, balance model complexity, and inform sensitivity analyses.
- Recent estimation techniques, such as nonparametric regression methods, provide rapid and precise ESS calculations in complex, nonconjugate scenarios.
Equivalent sample size (ESS), also known as effective sample size or prior effective sample size, is a quantitative measure expressing the information content of a probability distribution—usually a Bayesian prior or posterior—in terms of the hypothetical number of independent observations that would provide equivalent information under the corresponding likelihood model. ESS plays a central role in Bayesian modeling, model selection, and experimental design, enabling clear communication and calibration of prior assumptions, especially in scenarios where external or historical information is incorporated.
1. Foundations and Definitions
ESS formalizes the translation of prior or posterior information into the scale of sample size. For a scalar parameter $\theta$, the ESS of a prior distribution $p(\theta)$ is defined as the number $m$ such that the prior conveys as much Fisher information as $m$ real data points under the model, i.e.,

$$i(p(\theta)) \approx m \, i_F(\theta), \qquad i(p(\theta)) = -\frac{d^2}{d\theta^2}\log p(\theta).$$

Here, $i_F(\theta)$ is the expected Fisher information of one new data point at $\theta$.
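For intuition, consider the normal–normal case: a $N(\mu_0, \sigma_0^2)$ prior has $i(p(\theta)) = 1/\sigma_0^2$, and a $N(\theta, \sigma^2)$ observation has $i_F(\theta) = 1/\sigma^2$, so the definition yields $m = \sigma^2/\sigma_0^2$. A minimal sketch (all numbers hypothetical):

```python
# ESS of a normal prior under a normal likelihood, via the
# Fisher-information definition m = i(p(theta)) / i_F(theta).

def normal_prior_ess(sigma0_sq: float, sigma_sq: float) -> float:
    """Prior information 1/sigma0^2 divided by the per-observation
    Fisher information 1/sigma^2."""
    prior_info = 1.0 / sigma0_sq          # i(p(theta)) = -d^2/dtheta^2 log p(theta)
    fisher_info_per_obs = 1.0 / sigma_sq  # i_F(theta), constant for a normal mean
    return prior_info / fisher_info_per_obs

# A prior with variance 0.5 under unit-variance observations
# is worth two observations.
print(normal_prior_ess(0.5, 1.0))  # 2.0
```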
In Bayesian–Dirichlet equivalent uniform (BDeu) scoring for discrete Bayesian networks, the ESS parameter $\alpha$ directly controls the Dirichlet prior’s strength: for each node and conditioning configuration, the Dirichlet’s “pseudo-counts” are set as $\alpha_{ijk} = \alpha/(r_i q_i)$, where $r_i$ is the variable’s arity and $q_i$ is the number of parent configurations (Silander et al., 2012).
ESS also emerges in study design and evaluation, quantifying how prior or external data alters precision and power calculations, e.g., by allowing formal sample size reduction in clinical or equivalence trials (Neuenschwander et al., 2019).
2. ESS in Bayesian Network Scoring
In Bayesian network structure learning, particularly under the BDeu marginal likelihood, ESS is parametrized as the scalar $\alpha$. The BDeu score for network $G$ given data $D$ is

$$P(D \mid G, \alpha) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha/q_i)}{\Gamma(\alpha/q_i + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha/(r_i q_i) + N_{ijk})}{\Gamma(\alpha/(r_i q_i))},$$

with $N_{ij} = \sum_k N_{ijk}$, where $N_{ijk}$ counts the data configurations in which variable $i$ takes its $k$-th value under the $j$-th parent configuration, as described in (Silander et al., 2012).
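In log space this score is a sum of log-gamma terms. The sketch below is a minimal illustration (function and variable names are my own, not from Silander et al., 2012) computing the local BDeu term for a single node:

```python
from math import lgamma, exp

def bdeu_local_score(counts, alpha):
    """Log BDeu marginal likelihood for one node.

    counts[j][k] = N_ijk: occurrences of the node's k-th value under
    the j-th parent configuration; q = number of parent configs,
    r = arity of the node; each cell's pseudo-count is alpha/(r*q).
    """
    q = len(counts)
    r = len(counts[0])
    a_j = alpha / q          # Dirichlet mass per parent configuration
    a_jk = alpha / (q * r)   # pseudo-count per cell
    score = 0.0
    for row in counts:
        score += lgamma(a_j) - lgamma(a_j + sum(row))
        for n_jk in row:
            score += lgamma(a_jk + n_jk) - lgamma(a_jk)
    return score

# Sanity check: one binary observation, no parents, alpha = 1 is a
# Beta(1/2, 1/2) prior, so the marginal probability of the data is 1/2.
print(exp(bdeu_local_score([[1, 0]], alpha=1.0)))  # 0.5
```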
The parameter $\alpha$ tunes the trade-off between data likelihood and the prior-induced structure penalty. Larger $\alpha$ reduces the complexity penalty and pushes the maximum a posteriori (MAP) structure toward denser graphs, while smaller $\alpha$ heavily penalizes additional edges unless the data splits are highly informative.
Empirically, even small changes in $\alpha$ can lead to different MAP structures. Across benchmark datasets, the number of arcs in the optimal network, and thus the encoded independence structure, varies substantially as $\alpha$ is tuned, often with each plausible value within a broad range producing a distinct model (Silander et al., 2012).
3. ESS Calculation and Methodological Variants
Several approaches to quantifying ESS have been developed, especially for general Bayesian models where the prior is not conjugate:
- Variance-ratio (VR) ESS: $m_{\mathrm{VR}} = \sigma^2(\bar\theta)/\mathrm{Var}_p(\theta)$, the sampling variance of one observation at the prior mean $\bar\theta$ divided by the prior variance
- Precision-ratio (PR) ESS: $m_{\mathrm{PR}} = \bigl(1/\mathrm{Var}_p(\theta)\bigr)/i_F(\bar\theta)$, the prior precision divided by the Fisher information of one observation
- Morita–Thall–Müller (MTM) ESS: based on local information at the prior mean, choosing $m$ so that $i(p(\bar\theta))$ matches the information of a vague ($\epsilon$-information) prior updated with $m$ observations
- Expected local-information-ratio (ELIR) ESS: $m_{\mathrm{ELIR}} = \mathrm{E}_p\!\left[\,i(p(\theta))/i_F(\theta)\,\right]$, where $i(p(\theta)) = -\frac{d^2}{d\theta^2}\log p(\theta)$ is the prior information
In one-parameter exponential families, most definitions agree, but in nonconjugate models these methods can diverge substantially due to global/local information mismatch, parameter dependence in curvature, or differences between mean and mode anchoring (Neuenschwander et al., 2019).
The ELIR definition uniquely satisfies predictive consistency: after observing $m$ new data points $y_m$, the expected posterior ESS increases by $m$, i.e., $\mathrm{E}\bigl[\mathrm{ESS}(p(\theta \mid y_m))\bigr] = \mathrm{ESS}(p(\theta)) + m$ (Neuenschwander et al., 2019). This property makes ELIR a rigorous metric for both prior and posterior ESS.
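For a Beta(a, b) prior under Bernoulli sampling, $i(p(\theta)) = (a-1)/\theta^2 + (b-1)/(1-\theta)^2$ and $i_F(\theta) = 1/(\theta(1-\theta))$, and the ELIR expectation works out to exactly $a + b$; predictive consistency then implies a posterior ESS of $a + b + m$ after $m$ observations. A Monte Carlo sketch under these assumptions:

```python
import random
random.seed(1)

def elir_ess_beta(a: float, b: float, n_draws: int = 200_000) -> float:
    """Monte Carlo ELIR ESS: E_p[ i(p(theta)) / i_F(theta) ]
    for a Beta(a, b) prior with Bernoulli observations."""
    total = 0.0
    for _ in range(n_draws):
        t = random.betavariate(a, b)
        prior_info = (a - 1) / t**2 + (b - 1) / (1 - t)**2  # i(p(theta))
        fisher_info = 1.0 / (t * (1 - t))                   # i_F(theta), Bernoulli
        total += prior_info / fisher_info
    return total / n_draws

print(elir_ess_beta(4, 6))      # approximately a + b = 10
# The posterior after m = 5 observations (3 successes, 2 failures)
# is Beta(7, 8), so its ELIR ESS is approximately 10 + 5 = 15.
print(elir_ess_beta(7, 8))      # approximately 15
```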
4. ESS Estimation Techniques
ESS can be analytically computed in simple, conjugate cases, but more generally requires estimation, especially for complex or nonstandard priors.
In Gaussian approximation methods for expected value of sample information (EVSI), the prior on parameter $\theta$ is taken as approximately normal with variance $\sigma_0^2$. Then $\mathrm{ESS} \approx \sigma^2/\sigma_0^2$, where $\sigma^2$ is the sampling variance of a single observation. More generally, variance-ratio and regression-based estimators are used:

$$\widehat{\mathrm{ESS}} = n\left(\frac{\mathrm{Var}_p(\theta)}{\mathrm{Var}[\hat g(S)]} - 1\right),$$

where $n$ is the simulated future sample size and $\hat g(S)$ denotes the fitted values of a regression of $\theta$ on a summary statistic $S$ (Li et al., 2024).
A notable recent advance is nonparametric regression-based ESS estimation. Simulated draws $(\theta^{(i)}, S^{(i)})$ are generated, a regression model is fit (with $S$ a sufficient statistic or informative summary), and ESS is estimated from the variance ratio of $\theta$ to the fitted values $\hat g(S)$. This approach matches analytic results to within 1% in classic cases and significantly reduces computation time compared to MCMC-based methods (Li et al., 2024).
Assumptions include well-approximated normality and informativeness of the summary $S$. The method is primarily restricted to univariate $\theta$; generalizations to multivariate settings and more flexible summaries are areas of ongoing research.
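The mechanics can be sketched in the conjugate normal case, where the regression of $\theta$ on $S$ is exactly linear and the true ESS is $\sigma^2/\sigma_0^2$. The identity $\mathrm{Var}[\mathrm{E}(\theta \mid S)] = \mathrm{Var}_p(\theta)\, n/(n + \mathrm{ESS})$ is then solved for ESS; constants and the linear fit standing in for the nonparametric regression are my illustrative choices:

```python
import random
random.seed(7)

# Hypothetical normal-normal setup: prior N(0, sigma0^2), data N(theta, sigma^2).
SIGMA0_SQ, SIGMA_SQ = 1.0, 4.0   # true prior ESS = sigma^2 / sigma0^2 = 4
N_FUTURE = 20                    # simulated future sample size n
N_SIM = 20_000

thetas, summaries = [], []
for _ in range(N_SIM):
    theta = random.gauss(0.0, SIGMA0_SQ ** 0.5)
    # Sufficient summary S: mean of n simulated future observations.
    s = sum(random.gauss(theta, SIGMA_SQ ** 0.5) for _ in range(N_FUTURE)) / N_FUTURE
    thetas.append(theta)
    summaries.append(s)

def mean(xs): return sum(xs) / len(xs)
def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Least-squares fit of theta on S (exact regression in the normal case).
mt, ms = mean(thetas), mean(summaries)
slope = sum((s - ms) * (t - mt) for s, t in zip(summaries, thetas)) / \
        sum((s - ms) ** 2 for s in summaries)
fitted = [mt + slope * (s - ms) for s in summaries]

# Var[E(theta|S)] = Var(theta) * n / (n + ESS)  =>  solve for ESS.
ess_hat = N_FUTURE * (var(thetas) / var(fitted) - 1.0)
print(ess_hat)  # approximately 4
```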
5. Sensitivity and Interpretation of ESS
The information encoded by ESS is highly sensitive to methodological details. In Bayesian network learning, small variations in $\alpha$ can flip individual arcs or, more generally, produce markedly different independence structures. Empirical findings on dozens of datasets confirm that the number of distinct MAP structures realized across a moderate interval of $\alpha$ values can be large, justifying the view that ESS is not only a quantifier of prior strength but also a determinant of model complexity (Silander et al., 2012).
Because ESS influences the trade-off between fit and complexity, arbitrary selection of $\alpha$ (e.g., always using the same conventional value) is discouraged. Principled strategies involve:
- Integrating over $\alpha$ with a prior (e.g., a uniform prior over a plausible range) and maximizing the $\alpha$-integrated posterior for the MAP model, or
- Choosing $\alpha$ to maximize the marginal likelihood $P(D \mid G, \alpha)$ jointly (prequential approach)
In all methods evaluated, the optimal $\alpha$ typically lies near the average variable arity in the Bayesian network, and integrating $\alpha$ out or maximizing the likelihood over it leads to consonant model choices (Silander et al., 2012).
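The prequential idea can be sketched as a grid search over candidate $\alpha$ values for fixed counts (an illustration only; the full protocol re-scores candidate structures at each $\alpha$, and the counts and grid below are hypothetical):

```python
from math import lgamma

def bdeu_log_ml(counts, alpha):
    """Log BDeu marginal likelihood of one node's counts
    (counts[j][k] = N_ijk over q parent configs and r values)."""
    q, r = len(counts), len(counts[0])
    score = 0.0
    for row in counts:
        score += lgamma(alpha / q) - lgamma(alpha / q + sum(row))
        for n in row:
            score += lgamma(alpha / (q * r) + n) - lgamma(alpha / (q * r))
    return score

# Hypothetical counts for one binary node under two parent configurations.
counts = [[30, 2], [3, 25]]

# Prequential choice: keep the alpha that maximizes the marginal
# likelihood over a grid of candidate equivalent sample sizes.
grid = [0.5, 1, 2, 4, 8, 16, 32]
best_alpha = max(grid, key=lambda a: bdeu_log_ml(counts, a))
print(best_alpha)
```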
6. Applications in Design, Evaluation, and Meta-Analysis
In trial design, prior ESS quantifies the “borrowed” information from historical or external controls. Knowledge of this prior ESS can be used to:
- Reduce internal sample size without loss of information,
- Calibrate type I/II error rates and power, and
- Evaluate the contribution of hierarchical or mixture priors in predictive subgroup analyses (Neuenschwander et al., 2019).
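In the approximately normal case, the first of these reductions is a simple offset: reaching the precision of $n_{\mathrm{target}}$ observations with a prior worth $m$ of them requires only $n_{\mathrm{target}} - m$ new subjects. A hypothetical sketch (numbers are illustrative, not a specific design from the cited paper):

```python
def required_new_n(n_target: int, prior_ess: float) -> float:
    """New observations needed when a prior already contributes
    prior_ess observations' worth of precision (normal approximation)."""
    return max(0.0, n_target - prior_ess)

# A design sized at 100 subjects, with a historical-data prior worth
# 30 subjects, needs only 70 new enrollments.
print(required_new_n(100, 30))  # 70.0
```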
In hierarchical analyses, posterior ESS can also be computed for each subgroup, capturing differential borrowing and informing the degree of shrinkage (Neuenschwander et al., 2019).
In decision analysis, particularly value-of-information calculations using EVSI, direct estimation of prior ESS enables tractable and accurate estimation of informational value under complex uncertainty and dependence structures (Li et al., 2024).
7. Practical Considerations and Recommendations
- Avoid arbitrary or “conventional” settings of ESS parameters in Bayesian network learning. Instead, perform a sensitivity analysis over plausible ranges or apply model selection protocols that integrate over the ESS (Silander et al., 2012).
- For nonconjugate Bayesian analyses, use the ELIR definition for prior and posterior ESS due to its unique property of predictive consistency.
- When estimating ESS computationally, nonparametric regression-based methods offer state-of-the-art accuracy at high computational efficiency, provided their assumptions are met (Li et al., 2024).
- In trial designs, explicitly account for ESS when combining external and internal data, as this directly impacts required follow-up, power, and interpretation (Neuenschwander et al., 2019).
- For vector-valued parameters or hierarchical models, compute ESS for each marginal or adapt multivariate information-theoretic approaches; joint ESS definitions remain an open research topic (Li et al., 2024).
Equivalent sample size thus serves as a unifying currency for prior and posterior information, central to both Bayesian inference and principled experimental design. Its practical use requires rigorous definition, careful estimation, and nuanced interpretation in context-specific modeling frameworks.