Mixture-of-GAMs: Interpretable Local Regression
- The Mixture-of-GAMs framework is a regression model that combines local generalized additive models with kernel approximations and clustering for enhanced interpretability and prediction accuracy.
- The method employs random Fourier features and PCA to approximate kernels and reduce dimensionality, enabling stable Gaussian mixture model clustering and local regime identification.
- Empirical results show significant RMSE improvements over global GAMs on benchmarks like California Housing and NASA Airfoil, demonstrating practical gains in both accuracy and interpretability.
A mixture-of-GAMs framework integrates locally adaptive model structure and kernel-inspired expressivity into inherently interpretable regression. The central construction leverages random Fourier feature (RFF) based embeddings to approximate kernel methods, principal component analysis (PCA) for dimensionality reduction, and Gaussian mixture models (GMM) to discover latent local regimes within data. Within each identified regime, a generalized additive model (GAM) is trained, preserving transparency through univariate spline components. The final regression function combines these local GAMs via soft cluster weights, achieving competitive prediction accuracy and interpretability. This strategy addresses the long-standing challenge of reconciling black-box predictive power with explainable modeling (Huang et al., 22 Dec 2025).
1. Fundamental Structure of the Mixture-of-GAMs Framework
The mixture-of-GAMs estimator is designed for regression scenarios where local data heterogeneity is pronounced, yet model transparency is imperative. The method comprises the following sequence:
- Random Fourier Features: Construct a mapping $\phi: \mathbb{R}^d \to \mathbb{R}^D$, with inner products approximating a shift-invariant kernel $k(x, x')$. The RFF embedding is determined by sampling frequencies $\omega_j \sim p(\omega)$ (the spectral density of $k$) and phases $b_j \sim \mathrm{Unif}[0, 2\pi]$, using
$$\phi(x) = \sqrt{\tfrac{2}{D}}\,\bigl[\cos(\omega_1^\top x + b_1), \ldots, \cos(\omega_D^\top x + b_D)\bigr]^\top.$$
- Principal Component Analysis: The RFF features $\phi(x_i)$ are compressed to dimension $r \ll D$ to stabilize clustering and mitigate the curse of dimensionality. Given centered activations $\tilde{\phi}(x_i) = \phi(x_i) - \bar{\phi}$, the principal directions $V_r \in \mathbb{R}^{D \times r}$ yield low-dimensional representations $z_i = V_r^\top \tilde{\phi}(x_i) \in \mathbb{R}^r$.
- Gaussian Mixture Model Clustering: The $z_i$ are clustered via a GMM with $K$ components, parameterized by weights $\pi_k$, means $\mu_k$, and covariances $\Sigma_k$, yielding soft responsibilities $\gamma_k(z_i)$ (posterior probabilities).
- Cluster-wise GAMs: For each cluster $k$, fit
$$f_k(x) = \beta_{k0} + \sum_{j=1}^{d} f_{kj}(x_j),$$
where each $f_{kj}$ is a smooth univariate spline with roughness penalization.
- Final Prediction: For input $x$, compute the low-dimensional representation $z(x)$ and cluster posteriors $\gamma_k(z(x))$, and predict via
$$\hat{f}(x) = \sum_{k=1}^{K} \gamma_k(z(x))\, f_k(x).$$
This framework achieves near-kernel regression accuracy with interpretable additive decomposition within each local regime (Huang et al., 22 Dec 2025).
2. Random Fourier Feature Embedding
Random Fourier features approximate continuous, shift-invariant, positive-definite kernels via explicit feature maps. For such a kernel $k(x, x') = k(x - x')$, Bochner's theorem gives
$$k(x - x') = \int_{\mathbb{R}^d} p(\omega)\, e^{i \omega^\top (x - x')}\, d\omega,$$
where $p(\omega)$ is the spectral density of $k$. Sampling $\omega_j \sim p(\omega)$ and $b_j \sim \mathrm{Unif}[0, 2\pi]$ for $j = 1, \ldots, D$ yields a feature mapping $\phi(x) = \sqrt{2/D}\,\bigl[\cos(\omega_j^\top x + b_j)\bigr]_{j=1}^{D}$. The kernel is approximated as
$$k(x, x') \approx \phi(x)^\top \phi(x').$$
This allows scalable kernel ridge regression by operating in the $D$-dimensional feature space, avoiding the $O(n^2)$ Gram-matrix storage and $O(n^3)$ solve cost of standard kernel methods.
After fitting the RFF-ridge model $\hat{y}(x) = w^\top \phi(x)$, extract the activations $\phi(x_i)$ and prepare for dimensionality reduction via PCA.
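A minimal sketch of the RFF embedding and the ridge fit, assuming an RBF kernel $k(x, x') = \exp(-\gamma \|x - x'\|^2)$ and using NumPy and scikit-learn; the helper names (`sample_rff_params`, `rff_map`) and all hyperparameters are illustrative rather than taken from the reference implementation:

```python
import numpy as np
from sklearn.linear_model import Ridge

def sample_rff_params(d, D=300, gamma=0.5, seed=0):
    """Draw RFF frequencies from the RBF kernel's spectral density
    (Gaussian with covariance 2*gamma*I) and phases from Unif[0, 2*pi)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return W, b

def rff_map(X, W, b):
    """Feature map phi(x) = sqrt(2/D) * cos(W^T x + b), so that
    phi(x)^T phi(x') approximates the RBF kernel."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Illustrative usage on synthetic data.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=500)

W, b = sample_rff_params(d=X.shape[1])
Phi = rff_map(X, W, b)                    # (n, D) activations
rff_ridge = Ridge(alpha=1.0).fit(Phi, y)  # RFF ridge regression
# Phi is reused below for PCA-based compression.
```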
3. Dimensionality Reduction via PCA
Given the potentially high-dimensional RFF map, clustering directly in $D$ dimensions is unstable. PCA is performed on the centered activations $\tilde{\phi}(x_i) = \phi(x_i) - \bar{\phi}$ to compute the $r$ leading directions $V_r = [v_1, \ldots, v_r]$:
$$z_i = V_r^\top \tilde{\phi}(x_i) \in \mathbb{R}^r.$$
The compressed representations $z_i$ minimize reconstruction error among rank-$r$ linear projections. Each $z_i$ encodes the location of $x_i$ in the compressed RFF latent space. This step is essential for robust downstream clustering.
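A compact sketch of the compression step with scikit-learn's PCA, assuming `Phi` holds the RFF activations from the sketch above; the target dimension $r = 10$ is an illustrative choice:

```python
from sklearn.decomposition import PCA

r = 10                        # target dimension r << D (illustrative)
pca = PCA(n_components=r)
Z = pca.fit_transform(Phi)    # centers Phi and projects onto the top-r directions; shape (n, r)
# pca.explained_variance_ratio_ can guide the choice of r in practice.
```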
4. Gaussian Mixture Model Clustering and Cluster Assignment
A Gaussian mixture model with $K$ components is fit to $\{z_i\}_{i=1}^{n}$:
$$p(z) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(z \mid \mu_k, \Sigma_k).$$
Model parameters are estimated using the expectation-maximization (EM) algorithm. Soft cluster assignments (responsibilities) are given by
$$\gamma_k(z_i) = \frac{\pi_k\, \mathcal{N}(z_i \mid \mu_k, \Sigma_k)}{\sum_{k'=1}^{K} \pi_{k'}\, \mathcal{N}(z_i \mid \mu_{k'}, \Sigma_{k'})}.$$
These quantify the affinity of data point $x_i$ to each cluster, facilitating localized modeling in the next stage.
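The clustering step maps directly onto scikit-learn's GaussianMixture, which runs EM and exposes the responsibilities through `predict_proba`; the number of components $K = 4$ and the full covariance type are illustrative assumptions:

```python
from sklearn.mixture import GaussianMixture

K = 4                                     # number of local regimes (illustrative)
gmm = GaussianMixture(n_components=K, covariance_type="full", random_state=0)
gmm.fit(Z)                                # EM on the compressed representations z_i
R = gmm.predict_proba(Z)                  # responsibilities gamma_k(z_i), shape (n, K)
```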
5. Construction of the Mixture-of-GAMs Predictor
Each cluster $k$ specifies a GAM:
$$f_k(x) = \beta_{k0} + \sum_{j=1}^{d} f_{kj}(x_j),$$
where $f_{kj}$ is a univariate function represented in a spline basis:
$$f_{kj}(x_j) = \sum_{m=1}^{M} \beta_{kjm}\, B_{jm}(x_j).$$
Smoothness is enforced by penalizing the integrated squared second derivative,
$$\lambda \sum_{j=1}^{d} \int \bigl(f_{kj}''(t)\bigr)^2\, dt \;\approx\; \lambda \sum_{j=1}^{d} \beta_{kj}^\top P\, \beta_{kj},$$
with $P$ a finite-difference penalty matrix acting on the spline coefficients $\beta_{kj} = (\beta_{kj1}, \ldots, \beta_{kjM})^\top$. The final mixture output is
$$\hat{f}(x) = \sum_{k=1}^{K} \gamma_k(z(x))\, f_k(x),$$
where $\gamma_k(z(x))$ is the soft cluster assignment for input $x$'s latent representation $z(x)$.
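A minimal sketch of the cluster-wise additive models and the soft mixture prediction. Instead of a full penalized GAM backfitting routine, it uses scikit-learn's SplineTransformer followed by Ridge as an additive-spline surrogate (the ridge penalty stands in for the roughness penalty) and weights each training point by its responsibility; the helper names and hyperparameters are illustrative:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import Ridge

def fit_cluster_gams(X, y, R, n_knots=8, alpha=1.0):
    """Fit one additive spline model per cluster, weighting every sample
    by its responsibility R[:, k] (soft assignment)."""
    models = []
    for k in range(R.shape[1]):
        gam_k = make_pipeline(
            SplineTransformer(n_knots=n_knots, degree=3),  # per-feature B-spline basis
            Ridge(alpha=alpha),                            # proxy for the roughness penalty
        )
        gam_k.fit(X, y, ridge__sample_weight=R[:, k])
        models.append(gam_k)
    return models

def predict_mixture(models, X_new, R_new):
    """Soft-weighted combination f_hat(x) = sum_k gamma_k(z(x)) * f_k(x)."""
    preds = np.column_stack([m.predict(X_new) for m in models])  # (n, K)
    return np.sum(R_new * preds, axis=1)

# Usage with the objects from the previous sketches.
gams = fit_cluster_gams(X, y, R)
y_hat = predict_mixture(gams, X, R)
```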
6. Training Objectives and Optimization Pipeline
The method is trained via a staged optimization pipeline rather than full joint training:
- RFF Ridge Regression: Minimize
$$\min_{w} \; \sum_{i=1}^{n} \bigl(y_i - w^\top \phi(x_i)\bigr)^2 + \lambda \|w\|_2^2.$$
- GMM Fitting: Maximize the log-likelihood on $\{z_i\}_{i=1}^{n}$,
$$\max_{\{\pi_k, \mu_k, \Sigma_k\}} \; \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k\, \mathcal{N}(z_i \mid \mu_k, \Sigma_k).$$
- GAM Fitting: For each cluster $k$, assign training points to cluster $k$ (via the responsibilities $\gamma_k(z_i)$) and fit $f_k$ by minimizing
$$\sum_{i \in \mathcal{C}_k} \bigl(y_i - f_k(x_i)\bigr)^2 + \lambda \sum_{j=1}^{d} \beta_{kj}^\top P\, \beta_{kj},$$
where $\mathcal{C}_k$ denotes the points assigned to cluster $k$.
- Final Prediction Formation: Combine cluster predictions via soft weights to yield $\hat{f}(x) = \sum_{k=1}^{K} \gamma_k(z(x))\, f_k(x)$. Optional iterative refinement (e.g., updating the RFF or GMM on residuals) is possible but not central to the primary study.
A summary pseudocode for the pipeline is provided in the primary reference (Huang et al., 22 Dec 2025).
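For orientation, a minimal end-to-end sketch of the staged pipeline (not the paper's pseudocode), chaining the helper functions defined in the sketches above; the train/test split and all hyperparameters ($D$, $r$, $K$, regularization strengths) are illustrative assumptions:

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Stage 1: RFF embedding and global ridge fit (kernel-level expressivity).
W, b = sample_rff_params(d=X_tr.shape[1], D=300, gamma=0.5)
Phi_tr, Phi_te = rff_map(X_tr, W, b), rff_map(X_te, W, b)
rff_ridge = Ridge(alpha=1.0).fit(Phi_tr, y_tr)

# Stage 2: PCA compression of the RFF activations.
pca = PCA(n_components=10).fit(Phi_tr)
Z_tr, Z_te = pca.transform(Phi_tr), pca.transform(Phi_te)

# Stage 3: GMM clustering in the latent space, giving soft responsibilities.
gmm = GaussianMixture(n_components=4, random_state=0).fit(Z_tr)
R_tr, R_te = gmm.predict_proba(Z_tr), gmm.predict_proba(Z_te)

# Stage 4: cluster-wise GAMs, then soft-weighted prediction on held-out data.
gams = fit_cluster_gams(X_tr, y_tr, R_tr)
y_pred = predict_mixture(gams, X_te, R_te)
```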
7. Empirical Performance and Applications
Performance was assessed on regression benchmarks:
- California Housing (N ≈ 20,640, p = 8)
- NASA Airfoil Self-Noise (N = 1,503, p = 5)
- Bike Sharing (N ≈ 17,379, p ≈ 12)
Root-mean-squared error (RMSE) was the primary metric, with the following comparative results:
| Dataset | Global GAM RMSE | Mixture-of-GAMs RMSE | RFF RMSE | MLM (mixture of linear models) RMSE | Notable Findings |
|---|---|---|---|---|---|
| California Housing | ≈ 0.567 | ≈ 0.501 | ≈ 0.44 | ≈ 0.57 | Mixture-of-GAMs outperforms all interpretable baselines |
| NASA Airfoil | ≈ 4.51 dB | ≈ 2.22 dB | ≈ 1.08 dB | - | Substantial (>2×) error reduction over global GAM |
| Bike Sharing | ≈ 88.8 | ≈ 58.2 | - | ≈ 60.9 | Mixture-of-GAMs comparable to mixture-of-linear-models (MLM-cell) |
These results demonstrate that the RFF-driven mixture-of-GAMs framework identifies meaningful local regimes in the data and achieves substantially improved prediction accuracy over classical additive models while remaining interpretable (Huang et al., 22 Dec 2025). The construction is applicable to real-world regression problems requiring both predictive strength and local interpretability.