Generalized Linear Models (GLMs)
- Generalized Linear Models are a unifying framework for analyzing diverse data types by linking exponential family distributions to a linear predictor via a link function.
- They rely on methods such as maximum likelihood, iteratively reweighted least squares (IRLS), and robust extensions such as the density power divergence (DPD) to achieve efficient estimation and mitigate outlier effects.
- GLMs are widely applied in fields from neuroscience to actuarial science, offering scalable solutions for non-normal, multivariate, and correlated data.
Generalized Linear Models (GLMs) constitute a broad, unifying framework for modeling the relationship between a set of explanatory variables and a response that need not be Gaussian. GLMs extend classical linear regression by permitting the response variable to be modeled via an exponential family distribution and introducing a link function to connect the mean response to a linear predictor. Their modular construction accommodates diverse data types (binary, count, continuous, categorical), supports efficient likelihood-based estimation, and underpins a wide spectrum of applications in statistics and machine learning. The GLM framework is foundational for modern regression analysis, with extensive developments in estimation, Bayesian modeling, robustness, multivariate extensions, model selection, latent structure, and computational scalability.
1. Mathematical Foundations and Canonical Structure
Let $Y$ denote a response variable and $x \in \mathbb{R}^p$ the associated covariate vector. A GLM is defined by three key components:
- Exponential-family Assumption: The conditional distribution of $Y$ (given $x$) has the form
$$f(y \mid \theta, \phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\},$$
where $\theta$ is the canonical parameter, $\phi$ is the dispersion parameter, $b(\cdot)$ is the log-partition function, and $a(\cdot)$, $c(\cdot)$ ensure normalization.
- Linear Predictor: Predictors enter via $\eta = x^\top \beta$, with $\beta \in \mathbb{R}^p$ the coefficients.
- Link Function: A monotonic, differentiable function $g$ relates the mean $\mu = E[Y \mid x]$ to the linear predictor: $g(\mu) = \eta$. The canonical link corresponds to $\theta = \eta$.
These choices yield a log-likelihood (up to additive constants):
$$\ell(\beta, \phi) = \sum_{i=1}^{n} \frac{y_i \theta_i - b(\theta_i)}{a(\phi)},$$
and enable the use of likelihood- or quasi-likelihood-based estimation (Shlens, 2014).
Key identities:
- $E[Y \mid x] = \mu = b'(\theta)$,
- $\operatorname{Var}(Y \mid x) = a(\phi)\, b''(\theta) = a(\phi)\, V(\mu)$, with $V(\mu)$ the variance function (e.g., $V(\mu) = \mu$ for Poisson, $V(\mu) = \mu(1-\mu)$ for binomial, $V(\mu) = \mu^{p}$ for the Tweedie family with power parameter $p$) (Bonat et al., 2015).
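As a concrete instance, the Poisson model with canonical log link illustrates both identities:

```latex
% Poisson: f(y;\theta) = \exp\{ y\theta - e^{\theta} - \log y! \}, so b(\theta) = e^{\theta}.
\mu = b'(\theta) = e^{\theta}, \qquad
\operatorname{Var}(Y) = b''(\theta) = e^{\theta} = \mu
\;\Rightarrow\; V(\mu) = \mu,
\qquad
g(\mu) = \log\mu \ \text{(canonical, since } \theta = \log\mu = \eta\text{)}.
```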
2. Classical Estimation, Inference, and Robust Alternatives
2.1 Maximum Likelihood and Iteratively Reweighted Least Squares (IRLS)
Parameter estimation in GLMs is traditionally performed via maximum likelihood. The score equations for $\beta$ are:
$$\sum_{i=1}^{n} \frac{y_i - \mu_i}{V(\mu_i)\, g'(\mu_i)}\, x_i = 0,$$
with $\mu_i = g^{-1}(x_i^\top \beta)$. Fisher scoring or Newton–Raphson methods yield the IRLS procedure, which iterates:
$$\beta^{(t+1)} = (X^\top W X)^{-1} X^\top W z,$$
where $W = \operatorname{diag}(w_i)$ with $w_i = 1/\{V(\mu_i)\, g'(\mu_i)^2\}$, and $z_i = \eta_i + g'(\mu_i)(y_i - \mu_i)$ is the working response (Adam et al., 2020, Shlens, 2014). For canonical links the negative log-likelihood is convex in $\beta$, and IRLS is globally convergent.
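The update above can be sketched for a Poisson GLM with canonical log link (a minimal illustration; the function name and toy data are invented for this sketch):

```python
import numpy as np

def irls_poisson(X, y, n_iter=50, tol=1e-10):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares.

    For the canonical log link: mu = exp(eta), w_i = mu_i, and the
    working response is z = eta + (y - mu) / mu.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        W = mu                       # w_i = 1 / (V(mu_i) g'(mu_i)^2) = mu_i
        z = eta + (y - mu) / mu      # working response
        # Weighted least squares step: beta = (X^T W X)^{-1} X^T W z
        beta_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta

# Tiny deterministic example: counts growing roughly exponentially in x
X = np.column_stack([np.ones(6), np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 1.5])])
y = np.array([1.0, 2.0, 2.0, 4.0, 7.0, 11.0])
beta_hat = irls_poisson(X, y)
score = X.T @ (y - np.exp(X @ beta_hat))  # should vanish at the MLE
```

At convergence the score equations are satisfied, which is a convenient correctness check for any IRLS implementation.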
2.2 Robust and Quasi-likelihood Extensions
Classical GLM estimators can be susceptible to outliers and model misspecification. Several robustifications exist:
- Density Power Divergence (DPD): Minimum DPD estimation introduces a tuning parameter $\alpha \ge 0$; the estimating equation downweights outliers via model-powered weights $f(y_i; \theta_i)^{\alpha}$ and a modified IRLS (Ghosh et al., 2014). The MDPDE approaches the MLE as $\alpha \to 0$ and achieves bounded influence for $\alpha > 0$.
- Heavy-tailed GLMs: Replace the likelihood kernel with heavy-tailed distributions, e.g., a log-Pareto-tailed gamma, to obtain modified estimating equations with redescending influence and partial robustness (Gagnon et al., 2023).
- Quasi-likelihood and Quasi-posterior: Inference is based on the mean and variance structure only, rather than the full distribution. Bayesian quasi-posterior—using the quasi-likelihood as "loss" in a generalized Bayes update—preserves coverage properties, is robust to misspecified higher moments, and is computationally practical (Agnoletto et al., 2023).
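The DPD downweighting above can be sketched for a Poisson model: each observation's estimating-equation contribution is scaled by its model density raised to the power $\alpha$ (a minimal illustration of the weighting idea only; `dpd_weights` and the data are invented, and this is not the full MDPDE algorithm of Ghosh et al.):

```python
import numpy as np
from math import lgamma

def dpd_weights(y, mu, alpha):
    """Model-powered weights f(y; mu)^alpha used by minimum-DPD estimation.

    alpha = 0 recovers equal weights (the MLE); larger alpha downweights
    observations that are improbable under the fitted Poisson model.
    """
    log_f = y * np.log(mu) - mu - np.array([lgamma(yi + 1.0) for yi in y])
    return np.exp(alpha * log_f)

mu = np.full(5, 3.0)                       # fitted means
y = np.array([2.0, 3.0, 4.0, 3.0, 40.0])   # last point is a gross outlier
w = dpd_weights(y, mu, alpha=0.5)
```

The outlying observation receives a weight many orders of magnitude smaller than the well-fitting ones, which is the source of the bounded-influence property.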
2.3 Variational Inequality Estimation
An alternative to direct likelihood optimization is to frame estimation as a variational inequality (VI) problem: find $\hat\beta \in \Theta$ such that
$$\langle F(\hat\beta),\, \beta - \hat\beta \rangle \ge 0 \quad \text{for all } \beta \in \Theta,$$
with estimating operator $F(\beta) = \frac{1}{n} \sum_{i=1}^{n} \bigl(g^{-1}(x_i^\top \beta) - y_i\bigr)\, x_i$. VI estimation covers canonical and non-canonical, non-smooth, or even non-monotone link functions, providing stable and broadly convergent optimization properties with theoretically established error bounds (Zhu et al., 5 Nov 2025).
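A minimal projected-iteration sketch of this VI formulation, assuming the estimating operator $F(\beta) = \frac{1}{n} X^\top (g^{-1}(X\beta) - y)$ and a Euclidean-ball constraint set (the step size, radius, and data are illustrative, not the algorithm of Zhu et al.):

```python
import numpy as np

def vi_estimate(X, y, inv_link, radius=10.0, step=0.1, n_iter=2000):
    """Projected iteration for the VI problem <F(b*), b - b*> >= 0 on a ball.

    F(b) = (1/n) X^T (g^{-1}(X b) - y); the link need not be canonical
    for this iteration to apply.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        F = X.T @ (inv_link(X @ beta) - y) / n
        beta = beta - step * F
        norm = np.linalg.norm(beta)
        if norm > radius:              # Euclidean projection onto the ball
            beta *= radius / norm
    return beta

# Logistic inverse link on a tiny non-separable dataset
X = np.column_stack([np.ones(4), np.array([-2.0, 1.0, -1.0, 2.0])])
y = np.array([0.0, 0.0, 1.0, 1.0])
beta_hat = vi_estimate(X, y, inv_link=lambda t: 1.0 / (1.0 + np.exp(-t)))
```

When the solution lies in the interior of the constraint set, the VI condition reduces to $F(\hat\beta) = 0$, i.e., the usual estimating equations.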
3. Bayesian Approaches and Model Selection
GLM Bayesian inference involves placing priors over $\beta$ and (optionally) over dispersion or hyperparameters:
- Mixtures of g-priors: For variable selection/model averaging, mixtures of Zellner's $g$-priors are extended to GLMs by placing a truncated Compound Confluent Hypergeometric (tCCH) distribution on $g$, encompassing prior families such as the hyper-$g$, Beta-prime, and intrinsic priors. Analytical closed-form marginal likelihoods (the CHIC criterion) enable scalable Bayesian computation and consistent model selection as $n \to \infty$ (Li et al., 2015).
- Expectation Propagation (EP): For large-scale Bayesian inference in GLMs with high-dimensional covariates, scalable EP approximates the posterior with Gaussian site approximations. Recent advances substantially reduce the per-iteration cost while maintaining predictive and marginal-likelihood accuracy for binary, Poisson, and Gamma GLMs (Anceschi et al., 2024).
- Unified Bayesian Inference Frameworks: Algorithms derived from standard linear model (SLM) inference (e.g., AMP, VAMP, SBL) are extended to GLMs via message passing/“turbo”-like structures, iteratively alternating linear and nonlinear updates. This modularity clarifies relationships between GAMP, Gr-VAMP, Gr-SBL, and enables new Bayesian GLM algorithms (Meng et al., 2017).
4. Structured, Multivariate, and Latent-Process GLMs
4.1 Multivariate and Correlated-Data Extensions
- Multivariate Covariance GLMs (McGLMs): These models generalize GLMs to handle
- Multivariate responses, including mixed data types.
- Arbitrary second-moment correlation structures: temporal, spatial, spatio-temporal, and repeated measures.
- Covariance is modeled via a covariance link function $h$ and a matrix linear predictor $h(\Sigma) = \sum_j \tau_j Z_j$, with known matrices $Z_j$ encoding the dependence structure and $\tau_j$ the dispersion parameters. Estimation proceeds by Newton scoring via quasi-score and Pearson estimating functions, relying only on second moments and avoiding high-dimensional likelihoods (Bonat et al., 2015).
| Aspect | Univariate GLM | McGLM Generalization |
|---|---|---|
| Mean structure | $g(\mu) = X\beta$ | Margins: $g_r(\mu_r) = X_r \beta_r$ |
| Covariance | $\operatorname{Var}(Y) = a(\phi)\, V(\mu)$ | Covariance link: $h(\Sigma) = \sum_j \tau_j Z_j$ |
| Multivariate | Not supported | Multiple response margins, cross-covariance via generalized Kronecker product (GKP) |
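The matrix linear predictor with identity covariance link can be illustrated for an exchangeable repeated-measures structure (the $Z_j$ matrices and $\tau$ values here are invented for illustration):

```python
import numpy as np

# Matrix linear predictor h(Sigma) = tau0 * Z0 + tau1 * Z1 with identity link.
# Z0 = I captures independent noise; Z1 = all-ones block captures
# exchangeable (compound-symmetry) dependence within one subject's measures.
m = 4                        # repeated measures per subject
Z0 = np.eye(m)
Z1 = np.ones((m, m))
tau = np.array([1.0, 0.5])   # illustrative dispersion parameters
Sigma = tau[0] * Z0 + tau[1] * Z1
```

Estimating the $\tau_j$ from second moments, rather than parameterizing $\Sigma$ directly, is what keeps McGLM estimation free of a full multivariate likelihood.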
- Latent Process GLMs: For time-series or spatial data, latent processes multiplicatively augment the mean ($\mu_t = g^{-1}(x_t^\top \beta)\, Z_t$, with $Z_t$ a positive latent process), accommodating over-dispersion and serial dependence. Estimation via a "naïve" GLM pseudo-likelihood yields consistent estimators, but standard errors and prediction are corrected using the true (latent-augmented) information matrix and method-of-moments estimation for the dispersion and latent-process parameters (Barreto-Souza et al., 18 Feb 2026).
4.2 Additive, Nonlinear, and Interaction Extensions
- Generalized Unrestricted Models (GUMs): Extending GLMs, GUMs allow arbitrary linear or nonlinear functions of each regressor and permit multilinear interactions. Bayesian GUMs employ Gaussian Process priors for the nonlinear function components, with Laplace or sparse variational inference for scalability. GUMs capture nonlinear and combinatorial structure in data not accommodated by standard GLMs (Adam et al., 2020).
- Sparse High-dimensional GLMs: In binary classification with high-dimensional, sparse true parameter vectors, binary iterative hard thresholding (BIHT) can efficiently recover $\beta$ without knowledge of the link function and achieves statistically optimal sample complexity (up to logarithmic factors) in both logistic and probit regimes. BIHT is robust, scalable, and universal for a broad class of GLMs with monotonic links (Matsumoto et al., 25 Feb 2025).
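BIHT can be sketched for a noiseless one-bit model $y = \operatorname{sign}(X\beta)$: a gradient-like step on the one-bit residual followed by hard thresholding to the sparsity level $s$ (the step size, iteration count, and synthetic data are illustrative, not the tuned procedure of Matsumoto et al.):

```python
import numpy as np

def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-s:]
    out[idx] = v[idx]
    return out

def biht(X, y, s, step=1.0, n_iter=100):
    """Binary iterative hard thresholding for one-bit observations y = sign(X beta)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (y - np.sign(X @ beta)) / n
        beta = hard_threshold(beta + step * grad, s)
    norm = np.linalg.norm(beta)
    # Only the direction of beta is identifiable from one-bit data
    return beta / norm if norm > 0 else beta

rng = np.random.default_rng(0)
n, p, s = 400, 20, 3
beta_true = np.zeros(p)
beta_true[:s] = [1.0, -1.0, 0.5]
beta_true /= np.linalg.norm(beta_true)
X = rng.standard_normal((n, p))
y = np.sign(X @ beta_true)
beta_hat = biht(X, y, s)
```

Note that one-bit measurements destroy scale information, so recovery is evaluated on the unit sphere (direction only).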
5. Applications, Goodness-of-fit, and Practical Diagnostics
GLMs are widely used for neural spike train modeling (Poisson or Bernoulli responses) in neuroscience (Shlens, 2014), actuarial science (Gamma and heavy-tailed variants) (Gagnon et al., 2023), behavioral science, and generalized regression problems (Adam et al., 2020). Key practical aspects include:
- Goodness-of-fit Procedures: Standard and surrogate-based time-rescaling, thinning, and complementing tests enable comprehensive model adequacy checks for point-process and discrete-time GLMs, each probing distinct model features and deficiencies (Gerhard et al., 2010).
- Robustness: Outlier-resistant estimators (e.g., minimum DPD, heavy-tailed GLMs) yield bounded influence and enhanced protection against contamination, outperforming classical GLM estimators in both frequentist efficiency and Bayesian calibration (Ghosh et al., 2014, Gagnon et al., 2023).
- Forecasting with Latent Structures: Multiplicative process GLMs provide principled predictions by incorporating the conditional expectation of the latent effect, yielding improved RMSE and better calibrated uncertainty (Barreto-Souza et al., 18 Feb 2026).
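The time-rescaling check mentioned above can be sketched as follows: event times are mapped through the model's cumulative intensity, and under a correct model the rescaled inter-event intervals are i.i.d. Exp(1), so $u = 1 - e^{-\tau}$ should be uniform on $(0,1)$ (a minimal numpy illustration with a constant-rate model; the function name and data are invented):

```python
import numpy as np

def rescale_spike_times(spike_times, rate_fn, t_grid):
    """Time-rescaling: map event times through the cumulative intensity.

    If the modeled rate is correct, the rescaled inter-event intervals tau
    are i.i.d. Exp(1), so u = 1 - exp(-tau) is Uniform(0, 1).
    """
    lam = rate_fn(t_grid)
    # Cumulative intensity Lambda(t) via a left-Riemann sum on the grid
    Lam = np.concatenate([[0.0], np.cumsum(lam[:-1] * np.diff(t_grid))])
    Lam_at_spikes = np.interp(spike_times, t_grid, Lam)
    tau = np.diff(np.concatenate([[0.0], Lam_at_spikes]))
    return 1.0 - np.exp(-tau)

# Sanity check: homogeneous Poisson events against the true constant rate
rng = np.random.default_rng(1)
rate = 5.0
gaps = rng.exponential(1.0 / rate, size=2000)
spikes = np.cumsum(gaps)
t_grid = np.linspace(0.0, spikes[-1] + 1.0, 200000)
u = rescale_spike_times(spikes, lambda t: np.full_like(t, rate), t_grid)
# u should be approximately Uniform(0,1): Kolmogorov-Smirnov-style distance
ks_stat = np.max(np.abs(np.sort(u) - np.arange(1, u.size + 1) / u.size))
```

A misspecified rate function would inflate `ks_stat`, which is exactly the deficiency these goodness-of-fit procedures are designed to expose.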
6. Challenges, Extensions, and Ongoing Research Directions
- Model Selection and Regularization: Advances in mixture $g$-priors and closed-form marginal likelihood computations (CHIC) enable fully Bayesian model comparison and selection scalable to high dimensions (Li et al., 2015).
- Scalable Bayesian Computation: Deterministic inference (EP, fast variational Bayes) and message passing algorithms scale Bayesian GLMs to tens of thousands of predictors or samples without sacrificing predictive accuracy (Anceschi et al., 2024, Meng et al., 2017).
- Non-standard Link Functions and Optimization: Variational inequality (VI) frameworks provide global convergence and statistical guarantees for GLMs with non-canonical, non-smooth, or non-monotone links, extending the applicability of GLMs to more complex or engineered relationships (Zhu et al., 5 Nov 2025).
- Multivariate, Mixed, and Dependent Data: Structured covariance modeling, latent processes, and McGLMs enable statistically coherent analysis of non-normal, multivariate, temporally/spatially correlated data across scientific domains (Bonat et al., 2015, Barreto-Souza et al., 18 Feb 2026).
Ongoing research aims to further develop state-evolution theory for high-dimensional Bayesian GLM algorithms, extend robust Bayesian and frequentist GLM theory, optimize computational methods for massive and structured data, and generalize the GLM framework to accommodate modern challenges in multi-response, temporal, spatial, and over-dispersed data (Meng et al., 2017, Anceschi et al., 2024, Adam et al., 2020).