Learned Harmonic Mean Estimator (LHME)
- LHME is a Bayesian evidence estimator that learns an internal importance sampling target via normalizing flows to achieve finite variance.
- It employs temperature scaling and support control to mitigate the infinite variance issues inherent in classical harmonic mean estimators.
- LHME is sampler-agnostic and works with posterior samples from MCMC or variational inference, providing accurate model comparison across diverse benchmarks.
The Learned Harmonic Mean Estimator (LHME) is a robust, scalable, and flexible technique for estimating Bayesian evidence (marginal likelihood) for model comparison. LHME resolves the infinite variance pathologies of classical harmonic mean estimators by learning a suitable internal importance sampling target distribution, typically parameterized using expressive normalizing flow architectures. The estimator requires only posterior samples, rendering it agnostic to the sampling strategy and enabling direct applicability to saved MCMC chains or variational inference outputs. LHME achieves finite variance and unbiasedness through temperature scaling and careful support control of the learned target, with demonstrated accuracy and computational efficiency across benchmarks from low-dimensional toy models to high-dimensional cosmological datasets (Polanska et al., 2024, Polanska et al., 2023, McEwen et al., 2021, Hu et al., 21 Jan 2026).
1. Bayesian Model Evidence and the Harmonic Mean Estimator
Given observed data $y$, model parameters $\theta$, likelihood $\mathcal{L}(\theta) = p(y \mid \theta)$, and prior $\pi(\theta)$, the Bayesian evidence is the normalizing constant of the posterior:

$$z = p(y) = \int \mathcal{L}(\theta)\,\pi(\theta)\,\mathrm{d}\theta, \qquad p(\theta \mid y) = \frac{\mathcal{L}(\theta)\,\pi(\theta)}{z}.$$

This quantity determines the relative posterior probability of competing models via the Bayes factor. Direct computation of $z$ by brute-force quadrature is intractable in moderate- to high-dimensional parameter spaces.
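For a model with a conjugate structure the integral is analytic, which makes a useful sanity check. The sketch below (a 1D Gaussian likelihood with a Gaussian prior, an assumed toy setup) compares brute-force quadrature against the closed form $z = \mathcal{N}(y;\,0,\,\sigma^2 + \tau^2)$ — feasible only because the parameter space is one-dimensional:

```python
# Evidence z = ∫ L(θ) π(θ) dθ for a 1D Gaussian likelihood N(y; θ, σ²)
# with Gaussian prior N(θ; 0, τ²), where z = N(y; 0, σ² + τ²) analytically.
import numpy as np

y, sigma, tau = 1.5, 1.0, 2.0  # observation, likelihood std, prior std

def gauss(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

theta = np.linspace(-20, 20, 100_001)               # quadrature grid
integrand = gauss(y, theta, sigma) * gauss(theta, 0.0, tau)
z_quad = integrand.sum() * (theta[1] - theta[0])    # Riemann-sum evidence
z_true = gauss(y, 0.0, np.sqrt(sigma**2 + tau**2))  # analytic evidence

print(z_quad, z_true)  # the two agree to high precision on this grid
```

The same grid-based approach at, say, 20 dimensions would require $10^{5 \times 20}$ evaluations, which is the intractability the text refers to.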
The classical harmonic mean estimator (HM) follows from the identity

$$\frac{1}{z} = \mathbb{E}_{p(\theta \mid y)}\!\left[\frac{1}{\mathcal{L}(\theta)}\right],$$

which leads to the Monte Carlo estimator

$$\hat{\rho} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\mathcal{L}(\theta_i)}, \qquad \theta_i \sim p(\theta \mid y),$$

and $\hat{z} = 1/\hat{\rho}$. However, when the posterior tails are thinner than the prior's, the implicit importance weights become highly variable, frequently leading to infinite variance and estimator collapse (Polanska et al., 2024, Polanska et al., 2023, McEwen et al., 2021).
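The failure mode is easy to reproduce. In the sketch below (an assumed toy conjugate Gaussian model with a broad prior, with exact posterior draws standing in for an MCMC chain), the weight $1/\mathcal{L}(\theta)$ is so heavy-tailed under the posterior that repeated HM estimates scatter widely around the true evidence:

```python
# Classical harmonic mean 1/ẑ = mean(1/L(θ_i)) on exact posterior samples
# from a 1D conjugate model: likelihood N(y; θ, 1), broad prior N(0, τ²).
import numpy as np

rng = np.random.default_rng(0)
y, tau = 1.0, 10.0
s2_post = 1.0 / (1.0 + 1.0 / tau**2)     # posterior variance
mu_post = s2_post * y                    # posterior mean
z_true = np.exp(-0.5 * y**2 / (1 + tau**2)) / np.sqrt(2 * np.pi * (1 + tau**2))

def hm_estimate(n):
    theta = rng.normal(mu_post, np.sqrt(s2_post), size=n)
    inv_like = np.sqrt(2 * np.pi) * np.exp(0.5 * (y - theta) ** 2)  # 1/L(θ)
    return 1.0 / inv_like.mean()

estimates = np.array([hm_estimate(10_000) for _ in range(50)])
# The spread across repeats stays large regardless of the per-repeat sample
# count — the hallmark of the heavy-tailed, (near-)infinite-variance regime.
print(z_true, estimates.min(), estimates.max())
```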
2. Theoretical Foundation and Derivation of LHME
A generalization introduces an arbitrary positive normalized density $\varphi(\theta)$ supported within the posterior:

$$\frac{1}{z} = \mathbb{E}_{p(\theta \mid y)}\!\left[\frac{\varphi(\theta)}{\mathcal{L}(\theta)\,\pi(\theta)}\right].$$

The empirical LHME estimator is

$$\hat{\rho} = \frac{1}{N} \sum_{i=1}^{N} \frac{\varphi(\theta_i)}{\mathcal{L}(\theta_i)\,\pi(\theta_i)}, \qquad \theta_i \sim p(\theta \mid y),$$

with evidence estimate $\hat{z} = 1/\hat{\rho}$.
The key insight is that the optimal choice $\varphi^{*}(\theta) = p(\theta \mid y) = \mathcal{L}(\theta)\,\pi(\theta)/z$ yields constant importance weights and zero variance, but it requires access to $z$, the very quantity being estimated. LHME circumvents this by learning a surrogate density $\varphi(\theta) \approx p(\theta \mid y)$ using machine learning techniques, such that $\varphi$ is contained within the posterior and has sufficiently light tails (Polanska et al., 2024, Polanska et al., 2023, McEwen et al., 2021, Hu et al., 21 Jan 2026).
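On the same toy conjugate Gaussian model used above (an assumed illustration, with a hand-chosen normalized Gaussian $\varphi$ narrower than the posterior rather than a learned one), the re-targeted estimator is stable because the weights $\varphi/(\mathcal{L}\pi)$ are bounded:

```python
# Re-targeted harmonic mean: 1/z = E_post[ φ(θ) / (L(θ) π(θ)) ] for any
# normalized φ contained in the posterior; here φ is a Gaussian with half
# the posterior variance, giving bounded weights and a stable estimate.
import numpy as np

rng = np.random.default_rng(1)
y, tau = 1.0, 10.0
s2_post = 1.0 / (1.0 + 1.0 / tau**2)
mu_post = s2_post * y
z_true = np.exp(-0.5 * y**2 / (1 + tau**2)) / np.sqrt(2 * np.pi * (1 + tau**2))

def gauss(x, mu, s2):
    return np.exp(-0.5 * (x - mu) ** 2 / s2) / np.sqrt(2 * np.pi * s2)

theta = rng.normal(mu_post, np.sqrt(s2_post), size=200_000)  # posterior draws
like = gauss(y, theta, 1.0)                  # likelihood L(θ)
prior = gauss(theta, 0.0, tau**2)            # prior π(θ)
phi = gauss(theta, mu_post, 0.5 * s2_post)   # normalized target inside posterior

rho = np.mean(phi / (like * prior))          # estimates 1/z
z_hat = 1.0 / rho
print(z_hat, z_true)  # close agreement
```

Here the weight ratio $\varphi/p(\theta \mid y)$ is bounded above, so the variance is finite; a $\varphi$ with heavier tails than the posterior would reintroduce the classical pathology.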
3. LHME with Normalizing Flows: Training and Tail Control
Normalizing flows parameterize $\varphi(\theta)$ using invertible, differentiable transformations composed of simple layers, e.g. Real NVP or rational-quadratic spline blocks. The flow is trained on posterior samples by minimizing the forward KL divergence

$$D_{\mathrm{KL}}\big(p(\theta \mid y)\,\|\,\varphi(\theta)\big) = -\,\mathbb{E}_{p(\theta \mid y)}\big[\log \varphi(\theta)\big] + \mathrm{const},$$

i.e. maximum likelihood estimation is performed via stochastic gradient descent (Adam optimizer). After initial fitting, $\varphi$ is "concentrated" by reducing the temperature of the base distribution: for a flow mapping $\theta = f(u)$ with Gaussian base $q(u) = \mathcal{N}(u; 0, I)$, the base is replaced by $q_T(u) = \mathcal{N}(u; 0, T^2 I)$, and the induced density is

$$\varphi_T(\theta) = q_T\big(f^{-1}(\theta)\big)\left|\det \frac{\partial f^{-1}(\theta)}{\partial \theta}\right|.$$

Lowering $T \in (0,1)$ decreases tail thickness, ensuring all probability mass remains inside the posterior and delivering finite-variance importance weights (Polanska et al., 2024, Polanska et al., 2023).
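The effect of the temperature on the induced density can be checked on the simplest possible flow (an assumed one-layer affine map $\theta = \mu + c\,u$ standing in for a deep flow): shrinking the base standard deviation to $T < 1$ thins the tails while keeping the density normalized.

```python
# Temperature scaling of a toy affine "flow" θ = f(u) = μ + c·u with a
# standard normal base: the change-of-variables density stays normalized
# for every T, while its tail log-density drops as T decreases.
import numpy as np

mu, c = 2.0, 1.5  # flow shift and scale (a one-layer affine flow)

def flow_log_density(theta, T):
    # φ_T(θ) = q_T(f⁻¹(θ)) · |df⁻¹/dθ|, with base q_T = N(0, T²)
    u = (theta - mu) / c
    log_base = -0.5 * (u / T) ** 2 - np.log(T * np.sqrt(2 * np.pi))
    return log_base - np.log(c)  # log |df⁻¹/dθ| = -log c

theta = np.linspace(mu - 10, mu + 10, 400_001)
for T in (1.0, 0.9, 0.8):
    dens = np.exp(flow_log_density(theta, T))
    mass = dens.sum() * (theta[1] - theta[0])     # ≈ 1: still normalized
    print(T, mass, flow_log_density(mu + 5.0, T))  # tail density falls with T
```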
4. Algorithmic Implementation
The LHME pipeline proceeds as follows:
- Posterior Sampling: Obtain samples via any MCMC or variational method.
- Data Splitting: Divide the samples into a training set (used to fit the flow) and an inference set (used to estimate the evidence).
- Flow Training: Initialize the flow parameters; train by minimizing the Monte Carlo estimate of the negative log-likelihood over the training set.
- Tail Concentration: Select a temperature $T < 1$ (e.g., $T = 0.9$) and define $\varphi_T$ via temperature scaling.
- Evidence Estimation: Compute weights $w_i = \varphi_T(\theta_i) / \big(\mathcal{L}(\theta_i)\,\pi(\theta_i)\big)$ on the inference set; estimate $\hat{\rho} = \frac{1}{N}\sum_i w_i$, then $\hat{z} = 1/\hat{\rho}$.
- Error Quantification: The sample variance of the $w_i$ yields an uncertainty estimate for $\hat{\rho}$; propagate to $\hat{z}$ via $\sigma_{\hat{z}} \approx \sigma_{\hat{\rho}}/\hat{\rho}^2$.
This procedure is implemented in the open-source Python package harmonic (Polanska et al., 2024, Polanska et al., 2023, McEwen et al., 2021).
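The pipeline above can be sketched end to end in plain NumPy. This is an illustrative stand-alone implementation, not the `harmonic` package API: a fitted Gaussian with temperature scaling stands in for the normalizing flow, on an assumed 2D conjugate Gaussian model whose evidence is analytic.

```python
# End-to-end LHME sketch: sample posterior → split → fit surrogate →
# temperature-concentrate → importance weights → evidence + error bar.
import numpy as np

rng = np.random.default_rng(2)
d, tau, T = 2, 5.0, 0.9                  # dims, prior std, temperature
y = np.array([1.0, -0.5])                # "observed data"
s2 = 1.0 / (1.0 + 1.0 / tau**2)          # posterior variance (per dim)
mu_post = s2 * y                         # posterior mean

def log_gauss(x, mu, s2):
    x = np.atleast_2d(x)
    return -0.5 * np.sum((x - mu) ** 2, axis=1) / s2 - 0.5 * d * np.log(2 * np.pi * s2)

z_true = np.exp(log_gauss(y, np.zeros(d), 1 + tau**2))[0]  # analytic evidence

# 1) posterior samples (stand-in for a saved MCMC chain)
samples = rng.normal(mu_post, np.sqrt(s2), size=(100_000, d))
train, infer = samples[:50_000], samples[50_000:]          # 2) split

# 3) "flow" training: fit a Gaussian surrogate by maximum likelihood
mu_fit = train.mean(axis=0)
s2_fit = train.var(axis=0).mean()

# 4) tail concentration: shrink the surrogate with temperature T < 1
s2_T = (T**2) * s2_fit

# 5) evidence estimation on the held-out set
log_w = (log_gauss(infer, mu_fit, s2_T)
         - log_gauss(infer, y, 1.0)              # log L(θ) = log N(y; θ, I)
         - log_gauss(infer, np.zeros(d), tau**2))  # log π(θ)
w = np.exp(log_w)                        # w_i = φ_T(θ_i) / (L(θ_i) π(θ_i))
rho = w.mean()                           # estimates 1/z
z_hat = 1.0 / rho

# 6) error bar: σ(ρ̂) propagated through z = 1/ρ
rho_std = w.std(ddof=1) / np.sqrt(len(w))
z_std = rho_std / rho**2
print(z_hat, z_true, z_std)
```

In practice the Gaussian surrogate is replaced by a trained flow, but the splitting, temperature scaling, weight computation, and error propagation are exactly the steps the `harmonic` pipeline automates.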
5. Theoretical Properties and Diagnostics
LHME exhibits desirable theoretical properties:
- Unbiasedness: $\mathbb{E}[\hat{\rho}] = 1/z$ for any normalized $\varphi$ supported inside the posterior.
- Variance Control: The variance of $\hat{\rho}$ is finite if $\varphi$ (equivalently, $\varphi_T$) has support strictly inside the posterior and lighter tails.
- Consistency: As $N \to \infty$, the estimator converges in probability to the true evidence under standard Monte Carlo arguments (Polanska et al., 2024, Polanska et al., 2023, Hu et al., 21 Jan 2026).
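The unbiasedness property follows in one line from the definition of the posterior, since $\varphi$ integrates to one:

```latex
\mathbb{E}_{p(\theta \mid y)}\!\left[\frac{\varphi(\theta)}{\mathcal{L}(\theta)\,\pi(\theta)}\right]
= \int \frac{\varphi(\theta)}{\mathcal{L}(\theta)\,\pi(\theta)}
  \cdot \frac{\mathcal{L}(\theta)\,\pi(\theta)}{z}\,\mathrm{d}\theta
= \frac{1}{z}\int \varphi(\theta)\,\mathrm{d}\theta
= \frac{1}{z}.
```

Note the expectation is unbiased for $1/z$; the reciprocal $\hat{z} = 1/\hat{\rho}$ inherits consistency rather than exact finite-sample unbiasedness.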
Diagnostic strategies monitor tail behavior: empirical Pareto-$\hat{k}$ statistics on the importance weights, kurtosis of chain-wise evidence estimates, and sensitivity to train/test splits. Infinite-variance warning thresholds (e.g., Pareto $\hat{k} > 0.7$) indicate unreliable outputs.
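A minimal version of the Pareto-$\hat{k}$ check can be built with SciPy's generalized Pareto fit. This is an illustrative sketch in the spirit of the PSIS diagnostic, with the conventional 0.7 warning threshold from that literature; the function name `pareto_khat` and the 20% tail fraction are assumptions of this example, not part of LHME itself.

```python
# Pareto-k̂ tail diagnostic: fit a generalized Pareto distribution to the
# largest importance weights and flag the run when the fitted shape
# exceeds ~0.7, signalling a heavy (variance-threatening) weight tail.
import numpy as np
from scipy.stats import genpareto

def pareto_khat(weights, tail_frac=0.2):
    w = np.sort(np.asarray(weights))
    tail = w[int((1 - tail_frac) * len(w)):]  # largest 20% of weights
    u = tail[0]                               # tail threshold
    khat, _, _ = genpareto.fit(tail - u, floc=0.0)  # GPD shape, loc fixed at 0
    return khat

rng = np.random.default_rng(3)
light = rng.lognormal(0.0, 0.5, 20_000)   # well-behaved weights (all moments)
heavy = rng.pareto(0.9, 20_000) + 1.0     # infinite-variance weights

for name, w in [("light", light), ("heavy", heavy)]:
    k = pareto_khat(w)
    print(name, round(k, 2), "WARN" if k > 0.7 else "ok")
```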
6. Empirical Performance and Applicability
Numerical experiments demonstrate LHME's empirical accuracy and robustness:
| Problem | LHME Configuration | Accuracy vs. Benchmark |
|---|---|---|
| 2D Rosenbrock | Real NVP flow | Ground truth recovered; classical HM diverges |
| Normal–Gamma model | Spline/flow, variable temperature | Tracks analytic evidence; robust to choice of $T$ |
| Pima Indian logistic regression | Real NVP flow | $\log z$ matches RJMCMC |
| Radiata pine linear regressions | Spline-flow LHME | Recovers analytic evidence |
| 21D Gaussian | RQ-spline flow | Unbiased over 100 repeats |
| 10D Rosenbrock | Several flows, $T$ up to $0.8$ | Competitive with nested sampling, faster |
| DES Y1 cosmology (20–21D) | Metropolis + LHME, GPU/CPU | Log-unit-level agreement; minutes vs. hours for PolyChord |
In all cases, LHME with flows remains unbiased, yields correct error bars, and does not require fine-tuning of $T$. Benchmarks against nested sampling (PolyChord), RJMCMC, or ground-truth integration verify consistency and efficiency (Polanska et al., 2024, Polanska et al., 2023, McEwen et al., 2021, Hu et al., 21 Jan 2026).
7. Advantages, Limitations, and Potential Extensions
Advantages:
- Sampler-agnostic: Requires only posterior samples, compatible with MCMC or variational inference.
- Flexibility: Normalizing flows model complex, multimodal, highly correlated posteriors.
- Scalability: Empirical accuracy up to 21 dimensions; performance demonstrated for cosmological applications and high-dimensional Gaussians.
- Reusability: Saved posterior chains can be directly used for evidence estimation.
Limitations and Open Questions:
- Temperature Selection ($T$): Although performance is robust over a range of temperatures, automated selection could further improve ease of use.
- Expressivity vs. Overfitting: High-dimensional posteriors may challenge flow expressivity and tail control.
- Computational Cost: Large flows require significant training effort if posterior sample counts are high.
Potential Extensions:
- Exploiting more advanced flow architectures (e.g., continuous-time/residual flows) for improved tail management in very high dimensions.
- Automated joint tuning of flow parameters and temperature to minimize estimator variance.
- Integration into sequential Monte Carlo or variational inference pipelines for online model comparison.
- Application to likelihood-free inference settings via ratio estimation.
LHME fundamentally decouples evidence estimation from posterior sampling, facilitating computationally efficient prior sensitivity analyses and model selection across scientific domains (Polanska et al., 2024, Polanska et al., 2023, McEwen et al., 2021, Hu et al., 21 Jan 2026).