Are you using test log-likelihood correctly? (2212.00219v4)

Published 1 Dec 2022 in stat.ML, cs.LG, and stat.OT

Abstract: Test log-likelihood is commonly used to compare different models of the same data or different approximate inference algorithms for fitting the same probabilistic model. We present simple examples demonstrating how comparisons based on test log-likelihood can contradict comparisons according to other objectives. Specifically, our examples show that (i) approximate Bayesian inference algorithms that attain higher test log-likelihoods need not also yield more accurate posterior approximations and (ii) conclusions about forecast accuracy based on test log-likelihood comparisons may not agree with conclusions based on root mean squared error.
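As an illustration of the quantities the abstract compares (not taken from the paper), the sketch below shows one common way to estimate test log-likelihood from posterior samples, and a toy setting in which ranking two predictive distributions by test log-likelihood disagrees with ranking them by root mean squared error, echoing point (ii). The data, models, and parameter values here are hypothetical.

```python
# Hedged illustrative sketch (not from the paper): estimating test
# log-likelihood from posterior samples, and comparing two candidate
# predictive distributions by test log-likelihood versus RMSE.
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

rng = np.random.default_rng(0)

# Toy held-out data: y = x + Gaussian noise with standard deviation 1.
x_test = np.linspace(-2, 2, 200)
y_test = x_test + rng.normal(scale=1.0, size=x_test.shape)

def test_log_likelihood(y, pred_means, pred_sigma):
    """Monte Carlo estimate of the average test log-likelihood.

    pred_means: (S, N) predictive means, one row per posterior sample.
    pred_sigma: predictive noise standard deviation (assumed shared here).
    """
    # log p(y_n) ~= log( (1/S) * sum_s N(y_n | mean_{s,n}, sigma^2) )
    log_p = norm.logpdf(y[None, :], loc=pred_means, scale=pred_sigma)  # (S, N)
    log_pred = logsumexp(log_p, axis=0) - np.log(log_p.shape[0])       # (N,)
    return log_pred.mean()

def rmse(y, pred_means):
    """RMSE of the posterior predictive mean."""
    return np.sqrt(np.mean((y - pred_means.mean(axis=0)) ** 2))

S = 1000
# Model A: accurate predictive mean, but an overconfident noise estimate.
slopes_A = rng.normal(1.0, 0.01, size=S)
means_A = slopes_A[:, None] * x_test[None, :]
sigma_A = 0.3   # much smaller than the true noise scale of 1.0

# Model B: slightly biased predictive mean, well-matched noise estimate.
slopes_B = rng.normal(0.9, 0.01, size=S)
means_B = slopes_B[:, None] * x_test[None, :]
sigma_B = 1.0

print("A: LL =", test_log_likelihood(y_test, means_A, sigma_A),
      "RMSE =", rmse(y_test, means_A))
print("B: LL =", test_log_likelihood(y_test, means_B, sigma_B),
      "RMSE =", rmse(y_test, means_B))
```

With these toy settings, model A typically attains the lower RMSE (its predictive mean is closer to the truth) while model B attains the higher test log-likelihood (its predictive uncertainty is better calibrated), so the two criteria rank the models differently.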
