
Jeffreys–Lindley Paradox: Bayesian vs Frequentist

Updated 1 December 2025
  • The Jeffreys–Lindley paradox is a statistical phenomenon where Bayesian analysis increasingly favors the point null hypothesis as sample size grows, despite frequentist rejection via p-values.
  • It highlights the divergence between methodologies by contrasting how p-values remain fixed at significance thresholds while Bayes factors tend to infinity under a constant prior setup.
  • The paradox urges researchers to adopt interval nulls and calibrated priors, aligning statistical testing more closely with practical relevance and effect size interpretation.

The Jeffreys–Lindley paradox is a central result in statistical theory illustrating a fundamental asymptotic divergence between frequentist hypothesis test conclusions and Bayesian posterior inference, specifically in the context of point null versus composite alternative hypothesis testing. Emerging from the contrasting behaviors of $p$-values and Bayes factors as the sample size increases while the significance level and prior structure are held fixed, the paradox raises critical questions about practical and philosophical interpretations of "evidence" in statistical inference frameworks. Its implications span theoretical statistics, experimental design, and the foundational debate between Bayesian and frequentist methodologies.

1. Precise Statement and Mathematical Formulation

Consider the testing of a simple point null hypothesis against a composite alternative under a normal location model:

$$H_0: \theta = \theta_0 \quad \text{vs.} \quad H_1: \theta \neq \theta_0,$$

with observations $X_1,\dots,X_n$ drawn independently from $N(\theta,\sigma^2)$, $\sigma^2$ known. The frequentist test is based on the statistics

$$\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad Z = \frac{\bar X - \theta_0}{\sigma/\sqrt{n}},$$

and rejects $H_0$ at level $\alpha$ if $|Z| > z_{\alpha/2}$. The two-sided $p$-value is $p = 2[1 - \Phi(|Z|)]$. In a Bayesian framework, prior mass $\pi_0$ is placed on $\theta_0$ (a Dirac delta), while under $H_1$ a diffuse prior such as $N(\theta_0, \tau^2)$ is employed:

$$\pi(\theta) = \pi_0\,\delta_{\theta_0} + (1-\pi_0)\, N(\theta \mid \theta_0, \tau^2).$$

The Bayes factor for $H_0$ versus $H_1$ is then

$$B_{01} = \frac{m_0(\bar X)}{m_1(\bar X)},$$

where $m_0(\bar X)$ and $m_1(\bar X)$ denote the marginal likelihoods of $\bar X$ under $H_0$ and $H_1$, respectively.
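For this normal model both marginal likelihoods are available in closed form, so $B_{01}$ can be computed directly: $\bar X \sim N(\theta_0, \sigma^2/n)$ under $H_0$, and marginally $\bar X \sim N(\theta_0, \tau^2 + \sigma^2/n)$ under $H_1$. The following minimal sketch illustrates this (an illustration under the stated model, not code from any of the cited papers):

```python
# Closed-form Bayes factor B01 for the point-null normal model.
# Under H0 the sample mean is N(theta0, sigma^2/n); under H1, after
# integrating out the N(theta0, tau^2) prior, it is N(theta0, tau^2 + sigma^2/n).
import numpy as np
from scipy.stats import norm

def bayes_factor_01(xbar, n, theta0, sigma, tau):
    """Return B01 = m0(xbar) / m1(xbar) for the point-null normal model."""
    se2 = sigma**2 / n                                   # Var(xbar) under H0
    m0 = norm.pdf(xbar, loc=theta0, scale=np.sqrt(se2))
    m1 = norm.pdf(xbar, loc=theta0, scale=np.sqrt(tau**2 + se2))
    return m0 / m1
```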

The paradox manifests when, as $n \to \infty$ with $\alpha$ and the prior parameters held fixed, the $p$-value remains at the threshold $\alpha$ for "just significant" data, while the Bayes factor $B_{01} \to \infty$ and thus the Bayesian posterior probability of $H_0$ tends to 1. Explicitly, for data with $|\bar X - \theta_0| = z_{\alpha/2}\,\sigma/\sqrt{n}$,

$$B_{01} \to \infty \quad \text{as} \quad n \to \infty,$$

indicating ever-increasing Bayesian support for $H_0$ even as the frequentist criterion rejects $H_0$ at every sample size (Lovric, 28 Nov 2025; Wijayatunga, 18 Mar 2025; Cousins, 2013).
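The rate of divergence can be made explicit. Evaluating the two marginal densities at a just-significant observation yields, by direct computation under the model above,

$$B_{01} = \sqrt{1 + \frac{n\tau^2}{\sigma^2}}\;\exp\!\left(-\frac{z_{\alpha/2}^2}{2}\cdot\frac{\tau^2}{\tau^2 + \sigma^2/n}\right) \;\sim\; \frac{\tau\sqrt{n}}{\sigma}\, e^{-z_{\alpha/2}^2/2} \qquad (n \to \infty),$$

so the Bayes factor grows like $\sqrt{n}$ at any fixed significance threshold.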

2. Distinction from Bartlett’s Anomaly and Conceptual Clarifications

A persistent misconception conflates the Jeffreys–Lindley paradox with what is properly termed Bartlett's anomaly, in which the prior variance $\tau^2$ under $H_1$ diverges ($\tau^2 \to \infty$) at fixed sample size. Both phenomena yield $B_{01} \to \infty$, but under different asymptotic regimes: the Jeffreys–Lindley paradox is driven by $n \to \infty$ at fixed $\tau^2$, whereas Bartlett's anomaly is driven by $\tau^2 \to \infty$ at fixed $n$. The two situations have distinct mathematical structures and implications, and require separate resolutions (Lovric, 28 Nov 2025).

3. Mechanistic Origins and Statistical versus Practical Significance

At its core, the paradox reflects the tension between statistical and practical significance. The frequentist method assesses the observed gap $|\bar X - \theta_0|$ relative to a rapidly shrinking standard error, so arbitrarily small deviations become "statistically significant" as $n$ increases. In contrast, the Bayesian framework penalizes the alternative for spreading prior mass over a large parameter space and therefore increasingly favors the point null as data accumulate near $\theta_0$, a property termed the "Ockham's razor" effect (Wijayatunga, 18 Mar 2025; Cousins, 2013). This is compounded under large $n$: even negligible differences become significant under the frequentist protocol, whereas the Bayes factor continues to reward parsimony unless the observed effect is substantial relative to the region covered by the alternative prior.
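A short numerical illustration of this tension (a sketch under the same model, with hypothetical parameter values $\theta_0 = 0$, $\sigma = \tau = 1$): hold the data exactly at the $\alpha = 0.05$ threshold and let $n$ grow.

```python
# Fix |Z| = 1.96 so the two-sided p-value stays at 0.05, then let n grow:
# the frequentist verdict is constant while B01 increases without bound.
import numpy as np
from scipy.stats import norm

theta0, sigma, tau, z = 0.0, 1.0, 1.0, 1.96
p_value = 2 * (1 - norm.cdf(z))                        # fixed at ~0.05
for n in [10, 100, 10_000, 1_000_000]:
    xbar = theta0 + z * sigma / np.sqrt(n)             # "just significant" mean
    se2 = sigma**2 / n
    b01 = (norm.pdf(xbar, theta0, np.sqrt(se2))
           / norm.pdf(xbar, theta0, np.sqrt(tau**2 + se2)))
    print(f"n = {n:>9,}   p = {p_value:.3f}   B01 = {b01:10.2f}")
```

With these values $B_{01}$ is below 1 at $n = 10$ but exceeds 100 at $n = 10^6$: at fixed $p = 0.05$, the Bayesian verdict reverses from mild evidence against $H_0$ to strong evidence for it.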

4. Extensions and Implications in Testing and Estimation

The paradox critically impacts both hypothesis testing and interval estimation. With positive prior mass allocated to a point null, the Bayesian posterior can concentrate so strongly on that point as $n \to \infty$ (at a fixed Bayes factor or posterior-odds target) that credible intervals constructed from the posterior mixture distribution become undefined for certain credibility levels. This "incredibility gap" means that for some values of $\alpha$ no central credible interval exists, a phenomenon exclusive to Bayesian procedures with point-mass mixture posteriors and not mirrored by frequentist confidence intervals (Campbell et al., 2022).

The table below summarizes credible-interval definability:

| Posterior model | Interval behavior |
| --- | --- |
| Mixture posterior with point mass | Credibility gap may arise |
| Purely continuous posterior | Credible intervals always exist |

Frequentist confidence intervals, by contrast, retain well-defined coverage properties for all $\alpha$, emphasizing fundamental inferential discrepancies in the presence of point-mass priors.
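To see how the atom behaves, one can track the posterior mass at $\theta_0$ directly. The sketch below (assumed conjugate forms under the Section 1 model with $\pi_0 = 1/2$; not code from Campbell et al., 2022) shows the point mass, which is also the jump of the posterior CDF at $\theta_0$, approaching 1 under just-significant data:

```python
# Posterior atom at theta0 for the spike-and-slab prior
# pi0 * delta_{theta0} + (1 - pi0) * N(theta0, tau^2).
# Posterior odds of H0 equal prior odds times the Bayes factor B01; a
# large jump in the posterior CDF at theta0 is what can make exact
# central credible intervals unattainable for some credibility levels.
import numpy as np
from scipy.stats import norm

theta0, sigma, tau, pi0, z = 0.0, 1.0, 1.0, 0.5, 1.96
for n in [10, 1_000, 100_000, 10_000_000]:
    xbar = theta0 + z * sigma / np.sqrt(n)
    se2 = sigma**2 / n
    b01 = (norm.pdf(xbar, theta0, np.sqrt(se2))
           / norm.pdf(xbar, theta0, np.sqrt(tau**2 + se2)))
    p0 = pi0 * b01 / (pi0 * b01 + 1 - pi0)   # posterior P(H0 | data)
    print(f"n = {n:>10,}   posterior mass at theta0: {p0:.4f}")
```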

5. Resolutions: Interval Nulls and Alternative Bayesian Calibrations

A central theme in current research is that the only principled resolution of the Jeffreys–Lindley paradox is to reformulate the hypothesis from a point null $H_0: \theta = \theta_0$ to an interval (or "practical equivalence") null $H_0: |\theta - \theta_0| \leq \delta$, where $\delta > 0$ captures the minimum effect size of scientific interest (Lovric, 28 Nov 2025; Wijayatunga, 18 Mar 2025). With interval nulls, frequentist and Bayesian evidence accumulation become commensurate: frequentist equivalence testing asks whether a confidence interval lies within $[\theta_0 - \delta, \theta_0 + \delta]$, and the Bayesian approach compares the posterior probabilities of the interval null and its complement under a continuous prior. In this setting the paradox disappears; both approaches cohere and reflect practical significance rather than a measure-zero point null.
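As a concrete sketch of the Bayesian side (assumed conjugate normal forms, with $\delta$ a hypothetical equivalence margin; not code from the cited papers), the interval-null comparison reduces to a single posterior probability:

```python
# Posterior probability of the interval null |theta - theta0| <= delta
# under the continuous prior theta ~ N(theta0, tau^2), with no point mass.
# The complement's probability is one minus this, so the comparison is direct.
import numpy as np
from scipy.stats import norm

def prob_interval_null(xbar, n, theta0, sigma, tau, delta):
    vn = 1.0 / (1.0 / tau**2 + n / sigma**2)            # posterior variance
    mn = vn * (theta0 / tau**2 + n * xbar / sigma**2)   # posterior mean
    sd = np.sqrt(vn)
    return norm.cdf(theta0 + delta, mn, sd) - norm.cdf(theta0 - delta, mn, sd)
```

For just-significant data ($|\bar x - \theta_0| = z_{\alpha/2}\,\sigma/\sqrt{n}$), this probability tends to 1 as $n$ grows, matching the frequentist equivalence-test verdict that the effect is practically null.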

Alternative calibrations have been proposed:

  • Use of finite, scientifically motivated priors for $\tau^2$ to control the size of Bayes factor penalties (Villa et al., 2015);
  • Cake priors that diffuse at rates matched to the number of parameters, yielding automatic BIC-like penalties and Chernoff consistency (asymptotically zero type I and type II errors) (Ormerod et al., 2017);
  • Predictive model selection (AIC- or cross-validation-based criteria) in place of postdictive Bayes factors, to maintain detection resolution and avoid the paradox for large $n$ (LaMont et al., 2016).

6. Broader Impact, Domain-Specific Manifestations, and Remaining Issues

The paradox has significant practical implications in fields where large samples are common and sharp nulls are tested, including high-energy physics and precision metrology. In particle physics, for example, established practice often requires $5\sigma$ significance for discovery declarations, but as the paradox demonstrates, fixed thresholds on the $p$-value or $z$-score can be at odds with Bayesian conclusions as sample sizes become enormous and systematic uncertainty dominates inference (Cousins, 2013). Similar discordance has been documented in phase estimation with optical interferometry: Bayesian conclusions may depend strongly on the prior width, while the frequentist test signals an almost certain "discovery," revealing the extent to which experimental context and prior scientific knowledge must be incorporated to avert misleading inference (Mauri et al., 2015).

Even with "objective" or diffuse priors justified by lack of knowledge, the paradox reveals a mathematical impossibility: truly nn-indifferent inference requires improper (scale-invariant) priors that cannot be normalized, so any proper prior with truncation inevitably introduces nn-dependence and can only delay, not remove, the paradox (Fowlie, 2020).

7. Summary of Recommendations and Theoretical Insights

Established recommendations for practitioners seeking to avoid the Jeffreys–Lindley paradox include:

  1. Replace point nulls with scientifically meaningful interval nulls whenever possible, aligning statistical testing with practical relevance (Lovric, 28 Nov 2025; Wijayatunga, 18 Mar 2025).
  2. Employ finite, problem-specific priors for alternatives to prevent automatic dominance by $H_0$, calibrating $\tau^2$ or the prior mass to practical effect sizes (Villa et al., 2015; Mauri et al., 2015).
  3. Use predictive or cross-validation criteria rather than pure marginal likelihoods or postdictive Bayes factors when maximizing detection resolution and minimizing paradoxical inconsistencies is paramount (LaMont et al., 2016).
  4. Exercise caution in interpreting model-averaged credible intervals with point-mass priors, as standard interval procedures may become undefined for large $n$ (Campbell et al., 2022).
  5. Conduct sensitivity analyses with respect to the prior distribution, width, and truncation points, especially when the goal is to align statistical with substantive scientific significance (a minimal sketch follows this list).
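A minimal prior-sensitivity sketch for the last recommendation (hypothetical parameter values, same closed form as in Section 1): at fixed just-significant data, varying the prior width $\tau$ changes $B_{01}$ by orders of magnitude, which is precisely the Bartlett-type sensitivity noted in Section 2.

```python
# Sensitivity of B01 to the prior width tau at fixed data (n = 10_000,
# |Z| = 1.96): wider alternative priors mechanically inflate support for H0.
import numpy as np
from scipy.stats import norm

theta0, sigma, n, z = 0.0, 1.0, 10_000, 1.96
xbar = theta0 + z * sigma / np.sqrt(n)
se2 = sigma**2 / n
for tau in [0.1, 0.5, 1.0, 5.0, 25.0]:
    b01 = (norm.pdf(xbar, theta0, np.sqrt(se2))
           / norm.pdf(xbar, theta0, np.sqrt(tau**2 + se2)))
    print(f"tau = {tau:5.1f}   B01 = {b01:8.2f}")
```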

The Jeffreys–Lindley paradox highlights the necessity of integrating practical effect sizes into hypothesis testing, the limitations of uncritical use of $p$-values or Bayes factors for point nulls in high-dimensional or large-sample regimes, and the conceptual need for careful prior specification in Bayesian model comparison. These findings reinforce the need for scientific context and problem-specific calibration in modern statistical inference (Lovric, 28 Nov 2025; Wijayatunga, 18 Mar 2025; Cousins, 2013; Campbell et al., 2022; Mauri et al., 2015; LaMont et al., 2016; Ormerod et al., 2017; Villa et al., 2015; Fowlie, 2020).
