Are you using test log-likelihood correctly? (2212.00219v4)
Abstract: Test log-likelihood is commonly used to compare different models of the same data or different approximate inference algorithms for fitting the same probabilistic model. We present simple examples demonstrating how comparisons based on test log-likelihood can contradict comparisons according to other objectives. Specifically, our examples show that (i) approximate Bayesian inference algorithms that attain higher test log-likelihoods need not also yield more accurate posterior approximations and (ii) conclusions about forecast accuracy based on test log-likelihood comparisons may not agree with conclusions based on root mean squared error.
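For concreteness, the sketch below (not taken from the paper) shows how the two metrics compared in the abstract are typically computed for a regression model, assuming a Gaussian observation model with known noise scale and posterior samples of the predictive mean; the function and variable names are illustrative assumptions.

```python
# Minimal sketch (illustrative, not from the paper): Monte Carlo estimates of
# test log-likelihood and RMSE, given S posterior samples of the predictive
# mean and a fixed Gaussian noise scale.
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def test_log_likelihood(y_test, mu_samples, sigma):
    """Average log posterior-predictive density over held-out points.

    y_test     : (N,) held-out responses
    mu_samples : (S, N) predictive means, one row per posterior sample
    sigma      : observation noise standard deviation (assumed known here)
    """
    S = mu_samples.shape[0]
    # log p(y_n | data) ~= log( (1/S) * sum_s N(y_n | mu_{s,n}, sigma^2) )
    log_dens = norm.logpdf(y_test[None, :], loc=mu_samples, scale=sigma)  # (S, N)
    per_point = logsumexp(log_dens, axis=0) - np.log(S)                   # (N,)
    return float(per_point.mean())

def rmse(y_test, mu_samples):
    """Root mean squared error of the posterior-mean point prediction."""
    y_hat = mu_samples.mean(axis=0)
    return float(np.sqrt(np.mean((y_test - y_hat) ** 2)))
```

Test log-likelihood scores the entire posterior predictive density, including its spread, whereas RMSE scores only a point prediction (here, the posterior-mean forecast); this difference in what each metric rewards is one reason the two criteria can rank models or inference algorithms differently, as claim (ii) in the abstract describes.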