Bayesian Generalised Linear Models
- Bayesian Generalised Linear Models are statistical frameworks that extend traditional regression to handle binary, count, and bounded responses through link functions and prior distributions.
- They employ objective priors—including g-priors and their mixtures—to enable adaptive shrinkage, uncertainty quantification, and efficient model selection.
- Advanced inference techniques, such as Laplace approximations, expectation propagation, and INLA, facilitate scalable, high-dimensional Bayesian analyses in diverse applications.
Bayesian Generalised Linear Models (GLMs) are a family of statistical models that extend linear regression to accommodate diverse exponential-family data types—such as binary, count, and bounded continuous responses—while providing comprehensive probabilistic inference through the Bayesian paradigm. In these models, a link function relates the expected value of the response variable to a linear predictor involving unknown regression coefficients. Bayesian formulations augment this structure by placing prior distributions on the unknown parameters, facilitating uncertainty quantification, model selection, predictive inference, and the integration of external information within a coherent probabilistic framework.
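Concretely, the three ingredients can be written compactly (a standard formulation, with $h$ denoting the link function and $\pi$ the prior):

$$y_i \mid \theta_i \;\sim\; \mathrm{ExpFam}(\theta_i), \qquad h\big(\mathbb{E}[y_i \mid x_i]\big) \;=\; x_i^\top \beta, \qquad \beta \;\sim\; \pi(\beta),$$

so that, for example, logistic regression takes $h(\mu) = \log\{\mu/(1-\mu)\}$ and Poisson regression $h(\mu) = \log \mu$.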
1. Priors and Objective Prior Formulation
In Bayesian GLMs, a principal challenge is the specification of prior distributions for the regression coefficients, especially when variable selection or model averaging is of interest. One prominent class is the g-prior and its generalisations. For classical linear models, Zellner's g-prior imposes

$$\beta \mid g, \sigma^2 \;\sim\; \mathcal{N}\!\left(0,\; g\,\sigma^2 (X^\top X)^{-1}\right),$$

ensuring automatic shrinkage and closed-form marginal likelihoods. For generalised linear models, the generalised g-prior extends this construction to non-Gaussian likelihoods by matching the prior precision to the expected Fisher information at the null:

$$\beta_j \mid g \;\sim\; \mathcal{N}\!\left(0,\; g\,\mathcal{J}_j(\hat\beta_0)^{-1}\right),$$

where $\mathcal{J}_j(\hat\beta_0)$ is the block of the expected Fisher information corresponding to the coefficients of model $j$, evaluated at the null (intercept-only) model. This formulation allows prior specification that adapts to the scale and geometry of each model, providing an objective starting point for Bayesian model selection and inference (1308.6780, 1503.06913).
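As an illustration, the prior covariance $g\,\mathcal{J}(\hat\beta_0)^{-1}$ can be assembled directly from the expected Fisher information at the intercept-only fit. The sketch below assumes a logistic likelihood and centred covariates (the intercept receiving a separate flat prior); all names are illustrative, not from the cited papers:

```python
import numpy as np

def generalised_g_prior_cov(X, y, g):
    """Prior covariance g * J0^{-1} for a logistic-regression g-prior,
    with the expected Fisher information J0 evaluated at the null
    (intercept-only) model. X excludes the intercept column."""
    p0 = y.mean()                    # null-model fitted probability
    w0 = p0 * (1.0 - p0)             # IRLS weight at the null fit
    Xc = X - X.mean(axis=0)          # centred covariates; intercept handled separately
    J0 = w0 * (Xc.T @ Xc)            # expected Fisher information block
    return g * np.linalg.inv(J0)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = rng.binomial(1, 0.4, size=200)
Sigma = generalised_g_prior_cov(X, y, g=len(y))   # unit-information choice g = n
```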
Mixtures of g-priors, where g is assigned its own prior, are a standard device for robust variable selection. These mixtures can be unified through the use of the truncated Compound Confluent Hypergeometric (tCCH) distribution on the shrinkage factor $g/(1+g)$, subsuming commonly used priors such as the hyper-g, Beta-prime, incomplete inverse-Gamma, intrinsic, and robust priors (1503.06913). By tuning the tCCH hyperparameters, practitioners can enforce consistency, invariance, and other desiderata.
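For example, the hyper-g prior corresponds to a Beta distribution on the shrinkage factor, one of the special cases recovered by the tCCH family:

$$p(g) \;=\; \frac{a-2}{2}\,(1+g)^{-a/2}, \quad g > 0,\; a > 2, \qquad \text{equivalently} \qquad \frac{g}{1+g} \;\sim\; \mathrm{Beta}\!\left(1,\; \frac{a}{2}-1\right).$$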
2. Bayesian Model Selection, Marginal Likelihoods, and Bayes Factors
A central feature of Bayesian GLMs is the capacity for principled model selection and averaging via marginal likelihoods and Bayes factors. For Gaussian linear models, these are often available in closed form. However, in GLMs, computation is more complex. Several tractable approximations are available:
- Integrated Laplace Approximations: The marginal likelihood of a candidate model is approximated by expanding the log-likelihood at the maximum likelihood estimator (MLE) and using the prior covariance (typically the generalised g-prior). With mixture priors on g, the integrals can often be evaluated analytically, facilitating high-throughput model comparison (1308.6780, 1503.06913).
- Test-based Bayes Factors (TBFs): By appealing to the asymptotic $\chi^2$ distribution of the deviance statistic, the marginal density under the alternative model can be expressed analytically, allowing for an approximate Bayes factor against the null:

$$\mathrm{TBF}_j \;=\; (1+g)^{-d_j/2} \exp\!\left\{\frac{g}{2(1+g)}\, z_j\right\},$$

where $d_j$ is the model dimension and $z_j$ is the deviance statistic, directly available from standard GLM routines (1308.6780); a minimal computation is sketched below.
These strategies allow computationally efficient screening of very large model spaces, supporting variable selection, model averaging, and Bayesian decision-making. When the deviance and Wald statistic are asymptotically equivalent, TBF and data-based Bayes factors are nearly identical (1503.06913).
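For concreteness, a minimal TBF computation for a logistic model against the intercept-only null might look as follows (a sketch using statsmodels with the unit-information choice g = n; not the reference implementation of 1308.6780):

```python
import numpy as np
import statsmodels.api as sm

def test_based_bf(X, y, g):
    """Approximate (test-based) Bayes factor of the model with design X
    against the intercept-only null, following the TBF formula above."""
    fit = sm.GLM(y, sm.add_constant(X), family=sm.families.Binomial()).fit()
    z = fit.null_deviance - fit.deviance   # deviance statistic z_j
    d = X.shape[1]                         # model dimension (non-intercept terms)
    log_tbf = -0.5 * d * np.log1p(g) + (g / (2.0 * (1.0 + g))) * z
    return np.exp(log_tbf)

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = rng.binomial(1, 1 / (1 + np.exp(-0.8 * X[:, 0])))
print(test_based_bf(X, y, g=len(y)))
```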
3. Inference and Posterior Estimation
Posterior inference for GLM coefficients typically exploits either the local normal approximation at the MLE or deterministic approximate inference schemes. Given a fixed g (or after integrating g out), the posterior for the parameters of model $j$ takes the approximate form

$$\beta_j \mid y, g \;\approx\; \mathcal{N}\!\left(t\,\hat\beta_j,\; t\,\mathcal{J}_j(\hat\beta_j)^{-1}\right), \qquad t = \frac{g}{1+g},$$

where $t$ is the shrinkage factor and $\hat\beta_j$ denotes the MLE. This construction produces shrunken estimates and reduced posterior variance for the regression coefficients, with the intercept typically left unshrunk (1308.6780). A fully Bayesian treatment involves integrating over g, either numerically or via sampling, yielding a mixture of these normal approximations.
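As a minimal illustration of this shrinkage construction (a sketch using statsmodels on a logistic model, following the convention that the intercept is left at its MLE; not the implementation of the cited papers):

```python
import numpy as np
import statsmodels.api as sm

def shrunken_posterior(X, y, g):
    """Normal approximation N(t * mle, t * J^{-1}) to the posterior of the
    non-intercept coefficients given g, with t = g / (1 + g)."""
    fit = sm.GLM(y, sm.add_constant(X), family=sm.families.Binomial()).fit()
    t = g / (1.0 + g)                      # shrinkage factor
    beta_hat = fit.params[1:]              # MLE, excluding the intercept
    cov_hat = fit.cov_params()[1:, 1:]     # inverse Fisher information at the MLE
    return t * beta_hat, t * cov_hat

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2))
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
mean, cov = shrunken_posterior(X, y, g=len(y))
```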
Expectation propagation (EP) has emerged as an accurate and computationally efficient alternative to MCMC in GLMs. Recent work has reduced the per-iteration cost of EP from quadratic to linear in the number of covariates by exploiting the fact that each likelihood contribution depends on the parameter only through the inner product $x_i^\top \beta$, allowing high-dimensional Bayesian GLMs to be fit in seconds (2407.02128).
Other approximate inference techniques include the use of the Integrated Nested Laplace Approximation (INLA) for latent Gaussian models, which offers deterministic approximations for posterior marginals (1401.2957), and high-dimensional acceleration via low-rank projections of the predictor matrix (LR-GLM), permitting well-characterized trade-offs between computational efficiency and statistical accuracy (1905.07499).
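The low-rank device in LR-GLM can be caricatured in a few lines (a sketch assuming a truncated SVD projection of the design matrix before a standard Bayesian fit; see 1905.07499 for the actual estimator and its error bounds):

```python
import numpy as np

def low_rank_project(X, k):
    """Project an n x p design matrix onto its top-k right singular
    vectors, reducing the inference dimension from p to k."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    V_k = Vt[:k].T                 # p x k projection basis
    return X @ V_k, V_k            # n x k reduced design, basis for mapping back

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 500))
Z, V_k = low_rank_project(X, k=20)
# Fit any Bayesian GLM on the k-dimensional Z, then map posterior summaries
# of the reduced coefficients gamma back to the original scale via V_k @ gamma.
```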
4. Extensions: Mixed Models, Dispersion, and Constrained Parameters
Bayesian GLMs have been extended in several directions:
- Random Effects and GLMMs: Hierarchical and mixed models are widely used for grouped or longitudinal data. For bounded responses (e.g., proportions), beta mixed models can be accommodated via INLA, which allows efficient inference and model comparison even in elaborate scenarios (1401.2957, 1506.04792).
- Handling Overdispersion and Underdispersion: Classical Poisson or binomial models are restricted in their mean-variance relationship. Bayesian GLMs accommodate general dispersion via models such as the negative binomial (Poisson–gamma mixture for overdispersed counts) and the mean-parametrized Conway–Maxwell–Poisson (CMP) for both over- and underdispersion. Straightforward Metropolis–Hastings schemes can be used for posterior sampling (1506.04792, 1910.06008); a minimal sampler is sketched after this list.
- Linear Inequality Constraints: In problems with shape constraints (e.g., monotonicity, convexity), Bayesian GLMs can impose such restrictions using truncations of multivariate normal priors, efficiently sampled via product slice samplers with proven geometric ergodicity, giving valid uncertainty quantification and typically improved efficiency over classical methods (2108.10472).
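As an illustration of the Metropolis–Hastings route for dispersed counts, the following sketch samples the coefficients of a negative binomial GLM with log link, assuming a known dispersion r and a flat prior on the coefficients (in practice the dispersion would be sampled as well; all names are illustrative):

```python
import numpy as np
from scipy import stats

def nb_mh_sampler(X, y, r, n_iter=5000, step=0.05, seed=0):
    """Random-walk Metropolis for a negative binomial GLM with log link
    and known dispersion r, under a flat prior on beta.
    Mean parametrisation: E[y] = mu, Var[y] = mu + mu^2 / r."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])

    def log_post(b):
        mu = np.exp(X @ b)
        # scipy's nbinom uses (n, p) = (r, r / (r + mu))
        return stats.nbinom.logpmf(y, r, r / (r + mu)).sum()

    lp = log_post(beta)
    draws = np.empty((n_iter, X.shape[1]))
    for i in range(n_iter):
        prop = beta + step * rng.normal(size=beta.size)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # MH accept/reject
            beta, lp = prop, lp_prop
        draws[i] = beta
    return draws

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(300), rng.normal(size=300)])
mu = np.exp(0.5 + 0.8 * X[:, 1])
y = rng.negative_binomial(5, 5 / (5 + mu))   # overdispersed counts, r = 5
draws = nb_mh_sampler(X, y, r=5.0)
```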
5. Robustness, Prior Elicitation, and Misspecification
Robust Bayesian GLMs often use quasi-posteriors based on quasi-likelihoods, which require correct specification of only the first two moments. This grants resilience to model misspecification (such as overdispersion or heteroscedasticity) and yields asymptotically normal posterior distributions with correct frequentist coverage. The loss-scale (or dispersion) parameter can be estimated by the method of moments, ensuring that credible intervals are appropriately calibrated (2311.00820).
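In outline, with standard quasi-likelihood notation (the precise construction in 2311.00820 may differ in details), the quasi-posterior exponentiates the quasi-likelihood built from the first two moments:

$$Q(\mu_i; y_i) \;=\; \int_{y_i}^{\mu_i} \frac{y_i - t}{\sigma^2\, V(t)}\, dt, \qquad \pi_Q(\beta \mid y) \;\propto\; \pi(\beta)\, \exp\!\left\{\sum_{i=1}^{n} Q\big(\mu_i(\beta);\, y_i\big)\right\},$$

so that only the mean $\mu_i(\beta) = h^{-1}(x_i^\top \beta)$ and the variance function $V(\cdot)$ (with dispersion $\sigma^2$) need to be specified correctly.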
The elicitation of informative, interpretable priors remains a practical concern. Structured frameworks translate expert judgments on observable quantities (e.g., scenario-based credible intervals) into multivariate or normal-inverse-gamma priors on GLM coefficients, often with dependencies modelled via canonical vine copulas. Such approaches flexibly accommodate a wide class of GLMs, including those with overdispersion, zero-inflation, or bounded support, and are implemented in applied software (2303.15069).
Recent large-scale simulations have investigated the impact of model misspecification in Bayesian GLMs for bounded or lower-bounded data, showing that standard likelihood-link combinations (e.g. beta with logit link or gamma with log link) are generally robust and that even standard normal-based models can achieve reasonable calibration, though care is necessary in settings where support mismatch is a concern (2311.09081).
6. High-Dimensional and Computational Advances
Modern data scenarios frequently involve high-dimensional covariate spaces, necessitating scalable inference and model selection strategies. Recent advances include:
- Efficient Model Space Exploration: The use of Laplace-based marginal likelihoods and stochastic gradient algorithms allows scalable, efficient Bayesian variable selection over very large model spaces. Subsampling techniques can further accelerate computation while guaranteeing theoretical consistency when combined with MCMC (2201.13198).
- Consistency in High Dimensions: Theoretical work leveraging nonasymptotic quadratic approximations of GLM log-likelihoods has produced near-optimal bounds for maximum likelihood estimation and marginal likelihood accuracy, weakening previously restrictive conditions (such as so-called beta-min conditions) and covering models (e.g. Poisson regression) where classical concentration assumptions do not hold (2408.04359).
- Software Implementations: Several open-source packages (e.g., BAS, eglm, kDGLM, and INLA) implement the methodologies discussed, including high-dimensional model selection and inference routines (1503.06913, 2303.15069, 2403.13069, 1401.2957).
The integration of deep-learning feature extractors (e.g., convolutional neural networks with MC dropout) into Bayesian GLMs further enables modeling of high-dimensional, structured data (such as images or spatially correlated fields) while retaining interpretability and uncertainty quantification (2210.09560).
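A minimal sketch of this pattern (assuming PyTorch; the toy architecture and all names below are illustrative, not those of 2210.09560): dropout is kept active at prediction time, so repeated forward passes yield stochastic feature draws on which a downstream Bayesian GLM can be fit.

```python
import torch
import torch.nn as nn

class FeatureCNN(nn.Module):
    """Toy convolutional feature extractor with MC dropout."""
    def __init__(self, n_features=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Dropout(0.2), nn.Linear(8 * 4 * 4, n_features),
        )

    def forward(self, x):
        return self.net(x)

extractor = FeatureCNN().train()        # train() keeps dropout stochastic (MC dropout)
images = torch.randn(32, 1, 28, 28)     # a batch of toy single-channel images
with torch.no_grad():
    # Each forward pass is one stochastic feature draw; a Bayesian GLM can be
    # fit per draw (propagating feature uncertainty) or on their average.
    draws = torch.stack([extractor(images) for _ in range(10)])  # (10, 32, 16)
```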
7. Practical Applications and Real-World Analyses
Bayesian GLMs underpin a broad range of applications in the biomedical sciences, epidemiology, astronomy, and industrial quality control, among others. For example:
- Clinical Prediction Models: Efficient Bayesian model selection using generalised g-priors and test-based Bayes factors has been demonstrated in logistic regression settings for survival analysis, as in the GUSTO-I clinical trial (1308.6780).
- Beta Mixed Models in Quality-of-Life Studies: Hierarchical Bayesian inference for bounded indices in work-life analysis, with sensitivity analyses and INLA computation, provides robust, interpretable inference even under complex random-effects structures (1401.2957).
- Count Data in Astronomy: Bayesian negative binomial GLMs elucidate the dependence of globular cluster populations on galactic properties, incorporating observational errors, heteroscedasticity, and random effects (1506.04792).
- Design of Experiments: Bayesian information capacity optimal designs enable robust variable screening with careful trade-offs in experimental resource allocation across diverse GLM contexts (1601.08088).
- Time Series and Dynamic Modeling: The kDGLM framework delivers fast Bayesian state-space modeling for time series data in epidemiological surveillance and environmental analysis (2403.13069).
These developments, coupled with advances in approximate inference, robust and informative prior specification, and high-dimensional computational methods, have established Bayesian GLMs as a mature, adaptable, and practically valuable class of statistical models for modern applied sciences.