A Deep Learning Method for Comparing Bayesian Hierarchical Models (2301.11873v4)

Published 27 Jan 2023 in stat.ML, cs.LG, and stat.ME

Abstract: Bayesian model comparison (BMC) offers a principled approach for assessing the relative merits of competing computational models and propagating uncertainty into model selection decisions. However, BMC is often intractable for the popular class of hierarchical models due to their high-dimensional nested parameter structure. To address this intractability, we propose a deep learning method for performing BMC on any set of hierarchical models which can be instantiated as probabilistic programs. Since our method enables amortized inference, it allows efficient re-estimation of posterior model probabilities and fast performance validation prior to any real-data application. In a series of extensive validation studies, we benchmark the performance of our method against the state-of-the-art bridge sampling method and demonstrate excellent amortized inference across all BMC settings. We then showcase our method by comparing four hierarchical evidence accumulation models that have previously been deemed intractable for BMC due to partly implicit likelihoods. Additionally, we demonstrate how transfer learning can be leveraged to enhance training efficiency. We provide reproducible code for all analyses and an open-source implementation of our method.
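
The abstract describes the core recipe: simulate datasets from each candidate hierarchical model, train a permutation-invariant neural classifier on those simulations, and read off approximate posterior model probabilities from a single forward pass on observed data. The sketch below illustrates that recipe under stated assumptions; the toy two-level normal simulators, network sizes, and training budget are illustrative choices, not the authors' exact architecture or their open-source implementation.

```python
# Minimal sketch of amortized Bayesian model comparison for hierarchical data:
# a hierarchical deep set pools over exchangeable observations and groups, and
# its softmax output approximates posterior model probabilities. All simulator
# details and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

def simulate(model_idx, n_groups=5, n_obs=20):
    """Draw one dataset from a toy two-level normal model.
    Model 0: small between-group spread; Model 1: large between-group spread."""
    tau = 0.5 if model_idx == 0 else 2.0            # group-level std (models differ here)
    mu = torch.randn(1)                             # hyper-mean
    group_means = mu + tau * torch.randn(n_groups)  # group-level parameters
    y = group_means[:, None] + torch.randn(n_groups, n_obs)  # observations
    return y  # shape: (n_groups, n_obs)

class DeepSetClassifier(nn.Module):
    """Pool over observations within groups, then over groups, so the learned
    summary is invariant to exchangeable orderings at both levels."""
    def __init__(self, hidden=64, n_models=2):
        super().__init__()
        self.obs_net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.group_net = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, n_models))

    def forward(self, y):                   # y: (n_groups, n_obs)
        h = self.obs_net(y.unsqueeze(-1))   # embed each observation
        g = self.group_net(h.mean(dim=1))   # pool within groups -> group summaries
        return self.head(g.mean(dim=0))     # pool over groups -> model logits

net = DeepSetClassifier()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):                    # simulation-based training loop
    m = torch.randint(0, 2, (1,)).item()    # model index from a uniform model prior
    logits = net(simulate(m))
    loss = loss_fn(logits.unsqueeze(0), torch.tensor([m]))
    opt.zero_grad(); loss.backward(); opt.step()

# Amortized inference: one forward pass yields approximate posterior model
# probabilities for any new dataset of the same hierarchical structure.
with torch.no_grad():
    probs = torch.softmax(net(simulate(1)), dim=-1)
print(probs)  # should favor the true data-generating model, e.g. ~[0.1, 0.9]
```

With a uniform prior over models and a strictly proper loss such as cross-entropy, the softmax output converges toward the posterior model probability p(M | y). This is what makes the inference amortized: once trained, re-estimating model probabilities or validating calibration on thousands of fresh simulated datasets costs only forward passes, which is the performance-validation step the abstract highlights.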

Authors (4)
  1. Lasse Elsemüller (4 papers)
  2. Martin Schnuerch (2 papers)
  3. Paul-Christian Bürkner (58 papers)
  4. Stefan T. Radev (31 papers)
Citations (7)
