A Scalable Variational Bayes Approach to Fit High-dimensional Spatial Generalized Linear Mixed Models (2402.15705v3)
Abstract: Gaussian and discrete non-Gaussian spatial datasets are common across fields like public health, ecology, geosciences, and social sciences. Bayesian spatial generalized linear mixed models (SGLMMs) are a flexible class of models for analyzing such data, but they struggle to scale to large datasets. Many scalable Bayesian methods, built upon basis representations or sparse covariance matrices, still rely on posterior sampling via Markov chain Monte Carlo (MCMC). Variational Bayes (VB) methods have been applied to SGLMMs, but only for small areal datasets. We propose two computationally efficient VB approaches for analyzing moderately sized and massive (millions of locations) Gaussian and discrete non-Gaussian spatial data in the continuous spatial domain. Our methods leverage semi-parametric approximations of latent spatial processes and parallel computing to ensure computational efficiency. The proposed methods deliver inferential and predictive performance comparable to gold-standard MCMC methods while achieving computational speedups of up to 3600 times. In most cases, our VB approaches outperform state-of-the-art alternatives such as INLA and Hamiltonian Monte Carlo. We validate our methods through a comparative numerical study and applications to real-world datasets. These VB approaches can enable practitioners to model millions of discrete non-Gaussian spatial observations on standard laptops, significantly expanding access to advanced spatial modeling tools.