- The paper introduces a learnt harmonic mean estimator that minimizes variance when computing marginal likelihoods in Bayesian model selection.
- It interprets the estimator as importance sampling and uses machine learning to learn a new target density that approximates the optimal one.
- Empirical tests on benchmark problems show accuracy consistent with nested sampling, at lower computational cost.
Machine Learning Assisted Bayesian Model Comparison: Learnt Harmonic Mean Estimator
The paper "Machine learning assisted Bayesian model comparison: learnt harmonic mean estimator" by McEwen et al. addresses the long-standing problems of the harmonic mean estimator in Bayesian model selection. The harmonic mean estimator, originally introduced by Newton and Raftery in 1994, quickly became infamous for its large and often unmanageable variance when estimating the marginal likelihood, also known as the Bayesian evidence. The work presents a new variant, the learnt harmonic mean estimator, which uses machine learning techniques to overcome this variance problem.
Summary of Contributions
The principal contribution of this work is the learnt harmonic mean estimator, a robust method for calculating the marginal likelihood required for Bayesian model comparison. The original harmonic mean estimator can be interpreted as importance sampling in which the posterior plays the role of the sampling density and the prior plays the role of the target. Because the prior typically has heavier tails than the posterior, the importance weights can have very large (possibly infinite) variance, which makes the estimate unstable. The paper keeps the posterior as the sampling density but introduces a new target distribution, and adds a learning phase in which that target is fitted to approximate the optimal but inaccessible target, the normalized posterior.
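The importance sampling view can be written out explicitly. The notation below is an assumption chosen for illustration: z is the marginal likelihood, L the likelihood, π the prior, p(θ | y) the posterior, and φ any normalized target density.

```latex
% z: marginal likelihood; \mathcal{L}: likelihood; \pi: prior; \varphi: normalized target
\begin{align}
z &= \int \mathcal{L}(\theta)\,\pi(\theta)\,\mathrm{d}\theta,
\qquad
p(\theta \mid y) = \frac{\mathcal{L}(\theta)\,\pi(\theta)}{z}, \\
\rho &= \mathbb{E}_{p(\theta \mid y)}\!\left[\frac{\varphi(\theta)}{\mathcal{L}(\theta)\,\pi(\theta)}\right]
 = \int \frac{\varphi(\theta)}{\mathcal{L}(\theta)\,\pi(\theta)}\,
        \frac{\mathcal{L}(\theta)\,\pi(\theta)}{z}\,\mathrm{d}\theta
 = \frac{1}{z}\int \varphi(\theta)\,\mathrm{d}\theta
 = \frac{1}{z}, \\
\hat{\rho} &= \frac{1}{N}\sum_{i=1}^{N}
   \frac{\varphi(\theta_i)}{\mathcal{L}(\theta_i)\,\pi(\theta_i)},
\qquad \theta_i \sim p(\theta \mid y).
\end{align}
```

Setting φ = π recovers the original estimator, the average of 1/L(θ_i) over posterior samples. Setting φ equal to the normalized posterior would make every term equal to 1/z, giving zero variance, but that choice requires z itself. This is why the target has to be approximated from samples rather than written down directly.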
By using machine learning to fit an appropriate target distribution, the authors aim to minimize the variance of the estimator. Posterior samples are split into training and evaluation sets: the training set is used to fit a model of the posterior, constrained so that its tails are narrower than those of the posterior, and the evaluation set is used to compute the estimate. Several models, including modified Gaussian mixture models and kernel density estimation, are considered for approximating the target distribution.
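The train-then-evaluate procedure can be sketched on a toy problem where the evidence is known in closed form. This is a minimal illustration under stated assumptions, not the authors' implementation and not the API of the harmonic package: the target is a single Gaussian whose covariance is shrunk by a hypothetical factor `shrink` to keep its tails inside the posterior, which stands in for the models named above. The function names and the conjugate Gaussian test problem are likewise invented for this example.

```python
import numpy as np


def log_gauss_iso(x, mean, var):
    """Log density of an isotropic Gaussian N(mean, var * I), evaluated row-wise."""
    d = x.shape[-1]
    sq = np.sum((x - mean) ** 2, axis=-1)
    return -0.5 * d * np.log(2 * np.pi * var) - 0.5 * sq / var


def log_gauss_full(x, mean, cov):
    """Log density of a Gaussian with full covariance, evaluated row-wise."""
    d = x.shape[-1]
    diff = x - mean
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=-1)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def retargeted_log_evidence(samples, log_like, log_prior, shrink=0.5, seed=0):
    """Estimate log z from posterior samples with a fitted Gaussian target.

    `shrink` < 1 narrows the fitted covariance (an assumed, hand-picked value).
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    half = len(samples) // 2
    train, held_out = samples[idx[:half]], samples[idx[half:]]

    # "Learning" phase: fit the target density on the training samples only.
    mean = train.mean(axis=0)
    cov = shrink * np.cov(train, rowvar=False)

    # Evaluation phase: average target / (likelihood * prior) on held-out samples.
    log_ratio = (log_gauss_full(held_out, mean, cov)
                 - log_like(held_out) - log_prior(held_out))
    peak = log_ratio.max()  # log-sum-exp for numerical stability
    log_rho = peak + np.log(np.mean(np.exp(log_ratio - peak)))
    return -log_rho  # rho estimates 1/z, so log z = -log rho


# Toy conjugate problem: theta ~ N(0, prior_var I), y | theta ~ N(theta, noise_var I).
d, prior_var, noise_var = 2, 4.0, 1.0
y = np.array([1.0, -0.5])
post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
post_mean = post_var * y / noise_var

# Exact posterior draws stand in for the output of any sampler (e.g. MCMC).
rng = np.random.default_rng(1)
samples = post_mean + np.sqrt(post_var) * rng.standard_normal((20000, d))

log_z_est = retargeted_log_evidence(
    samples,
    log_like=lambda t: log_gauss_iso(y, t, noise_var),
    log_prior=lambda t: log_gauss_iso(t, 0.0, prior_var),
)
log_z_true = log_gauss_iso(y, 0.0, prior_var + noise_var)  # analytic evidence
print(f"estimated log z = {log_z_est:.4f}, analytic log z = {log_z_true:.4f}")
```

The estimator only ever sees posterior samples together with likelihood and prior values at those samples, so the same function would work unchanged with samples from any sampler. Splitting the samples keeps the fitted target independent of the points at which the ratio is averaged.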
Key Results and Validation
The proposed estimator is validated empirically on a series of benchmark problems, several of which are known to be pathological for the original harmonic mean estimator. These include the Rosenbrock and Rastrigin functions, the Normal-Gamma model, logistic regression models for the Pima Indians dataset, and non-nested linear regression models for the Radiata pine data. The experiments illustrate that the learnt harmonic mean estimator is both accurate and robust across a diverse set of problems on which the original estimator struggles. For instance, the paper reports a marked improvement in the accuracy of Bayes factor computation over the original harmonic mean estimator, and it also considers a cosmological application. The estimator is shown to compute marginal likelihoods and Bayes factors consistent with those obtained by nested sampling methods, at lower computational cost.
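Once log evidences are available for competing models, Bayes factors and posterior model probabilities follow from simple arithmetic. The sketch below shows that step; the helper names are hypothetical and the two log-evidence values are made-up placeholders, not results from the paper.

```python
import numpy as np


def log_bayes_factor(log_z_a, log_z_b):
    """Log Bayes factor of model A over model B from their log evidences."""
    return log_z_a - log_z_b


def model_posterior_probabilities(log_evidences, log_model_priors=None):
    """Posterior model probabilities; equal model priors are assumed by default."""
    log_z = np.asarray(log_evidences, dtype=float)
    if log_model_priors is None:
        log_model_priors = np.zeros_like(log_z)
    log_w = log_z + np.asarray(log_model_priors, dtype=float)
    log_w = log_w - log_w.max()  # subtract the maximum before exponentiating
    w = np.exp(log_w)
    return w / w.sum()


# Placeholder log evidences for two models (illustrative values only).
log_z = [-310.2, -312.5]
log_bf = log_bayes_factor(log_z[0], log_z[1])
probs = model_posterior_probabilities(log_z)
print(f"log Bayes factor = {log_bf:.2f}, model probabilities = {probs}")
```

Working with log evidences throughout avoids underflow, since evidences for realistic datasets are often far too small to represent directly as floating-point numbers.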
Implications and Future Prospects
The implications of these findings are notable for Bayesian inference and broader scientific domains employing model comparisons. The decoupling of the sampling strategy from the marginal likelihood computation offered by the learnt harmonic mean estimator is particularly advantageous, allowing the use of posterior samples produced via any advanced sampling technique without necessitating changes to the estimator itself. This flexibility permits rapid adaptability across varying domains with complex model requirements, simplifying implementations for simulation-based inferences.
Future work may extend the method to higher dimensions and more intricate models, broadening the estimator's applicability. Improved training strategies within the machine learning component could further reduce estimation variance. Alternative models for the target distribution could also improve performance for posteriors with complex landscapes and intricate degeneracies.
Overall, the learnt harmonic mean estimator represents a significant advancement in Bayesian model selection, providing an efficient and reliable method for estimating the marginal likelihood in a way that is scalable to practical applications in high-dimensional settings. The publicly available harmonic software package ensures reproducibility and encourages further development and adaptation in various research endeavors.