Hamiltonian Monte Carlo Inference of Marginalized Linear Mixed-Effects Models (2410.24079v3)
Abstract: Bayesian reasoning in linear mixed-effects models (LMMs) is challenging and often requires advanced sampling techniques like Markov chain Monte Carlo (MCMC). A common approach is to write the model in a probabilistic programming language and then sample via Hamiltonian Monte Carlo (HMC). However, there are many ways a user can transform a model that make inference more or less efficient. In particular, marginalizing some variables can greatly improve inference but is difficult for users to do manually. We develop an algorithm to easily marginalize random effects in LMMs. A naive approach introduces cubic-time operations within an inference algorithm like HMC, but we reduce the running time to linear using fast linear algebra techniques. We show that marginalization is always beneficial when applicable and highlight improvements in various models, especially those from the cognitive sciences.
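To make the cubic-versus-linear contrast concrete, the following is a minimal sketch for the simplest special case: a single grouping factor with random intercepts. It is not the paper's algorithm, which targets general LMMs; the model form, the function names (`marginal_loglik_fast`, `marginal_loglik_dense`), and the toy data are assumptions chosen for illustration. With the intercepts integrated out, each group has covariance σ²I + τ²11ᵀ, so the Sherman-Morrison formula and the matrix determinant lemma give the marginal log-density in one pass over the data, whereas a dense Cholesky factorization of the full covariance costs cubic time.

```python
import math


def marginal_loglik_fast(y, mean, groups, sigma, tau):
    """Marginal log-density of y with random intercepts integrated out.

    Assumed model (illustration only):
        y_i = mean_i + u_{groups[i]} + eps_i,
        u_g ~ N(0, tau^2), eps_i ~ N(0, sigma^2).
    Runs in time linear in len(y).
    """
    s2, t2 = sigma ** 2, tau ** 2
    # Per-group sufficient statistics, accumulated in a single pass.
    count, sum_r, sum_r2 = {}, {}, {}
    for yi, mi, g in zip(y, mean, groups):
        r = yi - mi
        count[g] = count.get(g, 0) + 1
        sum_r[g] = sum_r.get(g, 0.0) + r
        sum_r2[g] = sum_r2.get(g, 0.0) + r * r
    total = 0.0
    for g, n in count.items():
        d = s2 + n * t2
        # Matrix determinant lemma: log|s2*I + t2*11^T|.
        logdet = (n - 1) * math.log(s2) + math.log(d)
        # Sherman-Morrison: r^T (s2*I + t2*11^T)^{-1} r.
        quad = (sum_r2[g] - t2 * sum_r[g] ** 2 / d) / s2
        total += -0.5 * (n * math.log(2 * math.pi) + logdet + quad)
    return total


def marginal_loglik_dense(y, mean, groups, sigma, tau):
    """Same quantity via a dense Cholesky factorization (cubic time)."""
    n = len(y)
    cov = [[(tau ** 2 if groups[i] == groups[j] else 0.0)
            + (sigma ** 2 if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    # Cholesky factor: cov = chol * chol^T, chol lower triangular.
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    # Forward substitution: solve chol * z = residual.
    resid = [yi - mi for yi, mi in zip(y, mean)]
    z = []
    for i in range(n):
        acc = sum(chol[i][k] * z[k] for k in range(i))
        z.append((resid[i] - acc) / chol[i][i])
    logdet = 2.0 * sum(math.log(chol[i][i]) for i in range(n))
    quad = sum(v * v for v in z)
    return -0.5 * (n * math.log(2 * math.pi) + logdet + quad)


# Toy data (made up for illustration): six observations in three groups.
y = [1.2, 0.7, -0.3, 2.1, 1.9, 0.4]
mean = [1.0, 1.0, 0.0, 2.0, 2.0, 0.5]
groups = [0, 0, 1, 2, 2, 2]
fast = marginal_loglik_fast(y, mean, groups, sigma=0.8, tau=1.5)
dense = marginal_loglik_dense(y, mean, groups, sigma=0.8, tau=1.5)
print(abs(fast - dense) < 1e-9)
```

Inside an HMC sampler, a function like `marginal_loglik_fast` would stand in for the joint density over the data and the random effects, so the sampler explores only the remaining parameters. Crossed or correlated random effects need the more general techniques the abstract refers to.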