
Hamiltonian Monte Carlo Inference of Marginalized Linear Mixed-Effects Models (2410.24079v3)

Published 31 Oct 2024 in cs.LG and stat.ML

Abstract: Bayesian reasoning in linear mixed-effects models (LMMs) is challenging and often requires advanced sampling techniques like Markov chain Monte Carlo (MCMC). A common approach is to write the model in a probabilistic programming language and then sample via Hamiltonian Monte Carlo (HMC). However, there are many ways a user can transform a model that make inference more or less efficient. In particular, marginalizing some variables can greatly improve inference but is difficult for users to do manually. We develop an algorithm to easily marginalize random effects in LMMs. A naive approach introduces cubic time operations within an inference algorithm like HMC, but we reduce the running time to linear using fast linear algebra techniques. We show that marginalization is always beneficial when applicable and highlight improvements in various models, especially ones from cognitive sciences.

Summary

  • The paper introduces an algorithm that analytically marginalizes random effects in LMMs, reducing the cost of evaluating the marginalized model from cubic to linear time in the number of observations.
  • Marginalization removes the funnel-like pathological geometries that hamper standard HMC parameterizations, while fast linear algebra keeps each density evaluation cheap.
  • The method consistently improves effective sample size per unit of computation and is implemented within probabilistic programming (NumPyro) for routine Bayesian inference.

An Analysis of Hamiltonian Monte Carlo for Marginalized Linear Mixed-Effects Models

The paper "Hamiltonian Monte Carlo Inference of Marginalized Linear Mixed-Effects Models" by Lai, Sheldon, and Domke presents an innovative approach to efficiently perform Bayesian inference in Linear Mixed-Effects Models (LMMs) using Hamiltonian Monte Carlo (HMC). LMMs are hierarchical models widely deployed across disciplines such as ecology, medicine, psychology, and neuroscience to account for complex relationships in data by incorporating both fixed and random effects. Traditional HMC implementations often face efficiency challenges due to pathological geometries, such as the funnel shape created by correlation between variance parameters and fixed/random effect parameters. The key contribution of this research is an algorithm that utilizes fast linear algebra techniques to marginalize random effects analytically, reducing computational complexity from cubic to linear time—a significant advancement for practical inference in LMMs.

The authors demonstrate that marginalization mitigates the pathologies that hinder efficient sampling and consistently improves effective sample size (ESS) per unit of computation. The benefits are particularly pronounced for models from the cognitive sciences, which often involve crossed random effects: each observation belongs to multiple grouping factors (e.g., subject and item), and marginalizing some of these groups improves efficiency without degrading sampling quality.

Moreover, the paper directly addresses the main obstacle to naive marginalization: the marginal covariance of the observations is a dense n-by-n matrix, so evaluating its density inside every HMC step naively costs cubic time. By applying the matrix inversion (Woodbury) and matrix determinant lemmas and exploiting the block-diagonal structure of the random-effects covariance, the authors keep each evaluation linear in the number of observations even for moderately high-dimensional random effects, with further gains in the common case of scaled-identity covariance structure.
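The following NumPy sketch illustrates the linear-algebra idea under the notation above; it is a minimal simplification under assumed dense inputs, not the authors' implementation, which additionally exploits sparsity in Z and structure in D:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def marginal_lmm_loglik(y, X, beta, Z, D, sigma2):
    """log N(y; X beta, sigma2*I + Z D Z^T) without forming the n x n covariance.

    Uses the Woodbury identity and the matrix determinant lemma, so the cost is
    O(n q^2 + q^3) for n observations and q random-effect coefficients instead
    of the O(n^3) of a dense Cholesky of the marginal covariance.
    """
    n, q = Z.shape
    r = y - X @ beta                          # residual after fixed effects
    Zt_r = Z.T @ r                            # q-vector
    # "Capacitance" matrix from the Woodbury identity: D^{-1} + Z^T Z / sigma2
    C = np.linalg.inv(D) + (Z.T @ Z) / sigma2
    C_cf = cho_factor(C)
    # Quadratic form r^T V^{-1} r via the Woodbury identity
    quad = (r @ r) / sigma2 - (Zt_r @ cho_solve(C_cf, Zt_r)) / sigma2**2
    # log det V via the matrix determinant lemma:
    # log det(sigma2*I + Z D Z^T) = log det(C) + log det(D) + n log sigma2
    _, logdet_C = np.linalg.slogdet(C)
    _, logdet_D = np.linalg.slogdet(D)
    logdet_V = logdet_C + logdet_D + n * np.log(sigma2)
    return -0.5 * (n * np.log(2 * np.pi) + logdet_V + quad)
```

Only the products Z.T @ Z and Z.T @ r touch all n observations, so the evaluation is linear in n; exploiting the sparsity of the indicator matrix Z and the block-diagonal structure of D, as the paper does, tightens the dependence on the number of random effects as well.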

In the Bayesian framework, accurate posterior inference is crucial, and the authors' method plugs directly into common probabilistic programming languages, automating a marginalization step that is non-trivial for typical users to perform by hand. The implementation builds on NumPyro, letting practitioners express complex hierarchical models succinctly and rely on the backend for inference efficiency.
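For context, a crossed random-effects LMM of the kind discussed above is typically written in NumPyro roughly as follows. The model, priors, and synthetic data are illustrative, and this is the un-marginalized form that the paper's transformation would rewrite before running NUTS:

```python
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def crossed_lmm(x, subj, item, n_subj, n_item, y=None):
    # Fixed effect and variance components (priors are illustrative).
    beta = numpyro.sample("beta", dist.Normal(0.0, 10.0))
    sigma_u = numpyro.sample("sigma_u", dist.HalfNormal(1.0))   # subject sd
    sigma_w = numpyro.sample("sigma_w", dist.HalfNormal(1.0))   # item sd
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))       # residual sd
    # Crossed random intercepts: each observation has both a subject and an item.
    with numpyro.plate("subjects", n_subj):
        u = numpyro.sample("u", dist.Normal(0.0, sigma_u))
    with numpyro.plate("items", n_item):
        w = numpyro.sample("w", dist.Normal(0.0, sigma_w))
    mu = beta * x + u[subj] + w[item]
    numpyro.sample("y", dist.Normal(mu, sigma), obs=y)

# Synthetic data just to make the sketch runnable.
n, n_subj, n_item = 200, 20, 10
x = random.normal(random.PRNGKey(0), (n,))
subj = jnp.arange(n) % n_subj
item = jnp.arange(n) % n_item
y = 0.5 * x + random.normal(random.PRNGKey(1), (n,))

# Standard NUTS on the un-marginalized model; the paper's method would instead
# sample a version of this model with u and w integrated out analytically.
mcmc = MCMC(NUTS(crossed_lmm), num_warmup=500, num_samples=500)
mcmc.run(random.PRNGKey(2), x, subj, item, n_subj, n_item, y=y)
```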

The technique applies to normal and log-normal likelihoods, and potentially to other continuous settings, underscoring its adaptability to broader analytical contexts. The authors identify extending it to other response types, such as those arising in probit regression or classification tasks, without incurring prohibitive computational cost, as a natural direction for future work.

This research also points toward making marginalization an automatic transformation inside probabilistic programming workflows: a user would specify an LMM at a high level, and the system would detect the marginalizable structure, apply the transformation, and vectorize the resulting computation. Future development along these lines could deliver the demonstrated gains without any manual model rewriting.

Overall, the paper offers a methodological improvement to computational Bayesian inference that combines theoretical clarity with practical utility, giving researchers working with LMMs more capable tools and extending what is routinely feasible in Bayesian hierarchical modeling.
