GAMformer: In-Context GAM Estimation

Updated 23 May 2026
  • GAMformer is a transformer-based framework that performs rapid, single-pass estimation of generalized additive models (GAMs) via in-context learning, enabling direct recovery of interpretable shape functions.
  • The method encodes tabular features into quantile-based one-hot tokens and processes them with a dual-module self-attention architecture, bypassing iterative fitting.
  • Empirical evaluations show that GAMformer achieves competitive accuracy against XGBoost and EBMs on synthetic and real-world datasets while maintaining clear visual interpretability.

GAMformer is a transformer-based modeling approach for the rapid estimation of Generalized Additive Models (GAMs) via in-context learning. Eschewing the traditional requirement for iterative model fitting, GAMformer enables single-pass, nonparametric recovery of feature shape functions for tabular data, with design choices that support interpretability and empirical competitiveness on classification benchmarks. The model is trained exclusively on synthetic data yet achieves strong generalization to real-world tabular datasets (Mueller et al., 2024).

1. Generalized Additive Model Framework

A Generalized Additive Model expresses the target as an additive sum of univariate, potentially nonlinear effects:

$$f(x) = \sum_{j=1}^d g_j(x_j)$$

where $x = (x_1, \dots, x_d)$ denotes the $d$-dimensional input vector, and each $g_j: \mathbb{R} \to \mathbb{R}$ is a shape function representing the partial effect of feature $j$. In supervised learning, the model predicts either a real or categorical response $y$ through a link function $g$:

$$g\bigl(\mathbb{E}[y \mid x]\bigr) = \beta + \sum_{j=1}^d g_j(x_j)$$

where $\beta$ is an intercept. The resulting structure ensures direct interpretability, as plotting $g_j$ versus $x_j$ visualizes the effect of an individual feature.
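As a minimal sketch of the additive structure, the snippet below evaluates a binary GAM with a logit link. The function name `gam_predict_proba`, the two shape functions, and the intercept value are illustrative assumptions, not taken from the paper.

```python
import math

def gam_predict_proba(x, shape_functions, intercept=0.0):
    """Binary GAM with a logit link: P(y=1|x) = sigmoid(beta + sum_j g_j(x_j))."""
    logit = intercept + sum(g(xj) for g, xj in zip(shape_functions, x))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical shape functions, chosen only for illustration.
shape_functions = [
    lambda v: 0.5 * v,          # linear partial effect of feature 1
    lambda v: (v - 1.0) ** 2,   # U-shaped partial effect of feature 2
]

x = [2.0, 1.0]
# Per-feature contributions are what a shape plot shows at these inputs.
contributions = [g(xj) for g, xj in zip(shape_functions, x)]
p = gam_predict_proba(x, shape_functions, intercept=-1.0)
print(contributions, p)
```

Because the prediction is a sum of per-feature terms, each contribution can be read off and plotted independently of the others.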

2. In-Context Learning Paradigm

GAMformer reframes shape function estimation as an in-context learning problem, using the transformer's capacity to jointly process labeled training examples and unlabeled test instances in a single sequence. At inference, a small context set $D_{\text{train}} = \{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$ and one or more test points $x^{\text{test}}$ are provided. Feature values are discretized into $B$ quantile-based bins, with each continuous or categorical feature mapped to a one-hot encoding. Each token pairs a feature bin with the associated label embedding, and the tokens form an $n \times d$ grid, which is then augmented by test rows with dummy label tokens. The transformer outputs a tensor $\hat{G} \in \mathbb{R}^{d \times B \times C}$ encoding the shape tables, where $C$ is the number of output classes. Predictions are computed by bin-index lookup and summation:

$$\hat{p}(y = c \mid x) = \operatorname{softmax}_c\Bigl(\sum_{j=1}^{d} \hat{G}_{j,\, b_j(x_j),\, c}\Bigr)$$

where $b_j(x_j)$ denotes the bin index of $x_j$ for feature $j$. This design enables immediate, one-shot estimation of main-effect functions from the observed context.
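The lookup-and-sum prediction step can be sketched as follows. This is an illustration under stated assumptions: the helper names (`quantile_edges`, `bin_index`, `predict_logits`) are hypothetical, the quantile rule is one simple choice among several, and the shape table is written by hand here, whereas in GAMformer it would be produced by the transformer.

```python
from bisect import bisect_right

def quantile_edges(values, n_bins):
    """Interior bin edges taken at empirical quantiles of the context values."""
    s = sorted(values)
    return [s[min(len(s) - 1, (k * len(s)) // n_bins)] for k in range(1, n_bins)]

def bin_index(value, edges):
    """Map a raw value to a bin index in [0, len(edges)]."""
    return bisect_right(edges, value)

def predict_logits(x, edges_per_feature, shape_table):
    """Sum shape_table[j][bin][c] over features j to get per-class logits."""
    n_classes = len(shape_table[0][0])
    logits = [0.0] * n_classes
    for j, xj in enumerate(x):
        b = bin_index(xj, edges_per_feature[j])
        for c in range(n_classes):
            logits[c] += shape_table[j][b][c]
    return logits

# Context values for two features, binned into 4 quantile bins each.
edges = [
    quantile_edges([1, 2, 3, 4, 5, 6, 7, 8], n_bins=4),
    quantile_edges([10, 20, 30, 40], n_bins=4),
]

# Hand-written (hypothetical) shape table indexed as [feature][bin][class].
table = [
    [[0.0, 0.0], [0.5, -0.5], [1.0, -1.0], [1.5, -1.5]],
    [[-1.0, 1.0], [0.0, 0.0], [1.0, -1.0], [2.0, -2.0]],
]

logits = predict_logits([4.5, 25.0], edges, table)
print(edges, logits)
```

Applying a softmax to the summed logits gives class probabilities; each feature's contribution remains a single table entry that can be inspected directly.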

3. Model Architecture

GAMformer’s architecture comprises three principal components:

  • Embedding Layer: All features (continuous and categorical) are mapped to 64 bins, each represented by a 64-dimensional one-hot vector. These are processed through a small MLP to yield token embeddings. A class-specific label embedding is added to each training token.
  • Transformer Encoder: The core consists of 12 layers, each with dual-module self-attention: one module attends across features within each row (example-wise), while the other attends across examples within each feature column. This permutation-equivariant scheme sidesteps the need for positional encodings. The encoder contains approximately 50.5M parameters.
  • Shape-Function Decoder: For every feature-class pair, embeddings across training examples are aggregated (mean over examples sharing the class label), and a shared MLP decodes these representations into a 64-bin shape vector for each class, yielding the discrete nonparametric estimate $\hat{g}_j$.

Together, these components estimate a table of main-effect shape functions in a single pass and yield an interpretable prediction for each test instance.
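The alternating attention pattern of the encoder can be illustrated with a small NumPy sketch. It is a simplification under stated assumptions: a single head, no learned projections, MLPs, or normalization, and a plain residual connection; the names `self_attention` and `dual_axis_block` are hypothetical. Its purpose is only to show why attending along each axis of the example-by-feature grid is permutation-equivariant without positional encodings.

```python
import numpy as np

def self_attention(tokens):
    """Scaled dot-product self-attention over axis -2 (no learned weights)."""
    scale = np.sqrt(tokens.shape[-1])
    scores = tokens @ np.swapaxes(tokens, -1, -2) / scale
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens

def dual_axis_block(grid):
    """grid has shape (n_examples, n_features, emb_dim)."""
    # Module 1: attend across features within each example (row).
    grid = grid + self_attention(grid)
    # Module 2: attend across examples within each feature column.
    cols = np.swapaxes(grid, 0, 1)          # (n_features, n_examples, emb_dim)
    cols = cols + self_attention(cols)
    return np.swapaxes(cols, 0, 1)          # back to (n_examples, n_features, emb_dim)

rng = np.random.default_rng(0)
grid = rng.normal(size=(5, 3, 4))
out = dual_axis_block(grid)

# Reordering the examples reorders the output in the same way.
perm = rng.permutation(5)
equivariant = np.allclose(dual_axis_block(grid[perm]), out[perm])
print(out.shape, equivariant)
```

The same check holds when the feature axis is permuted, which is the property that lets the model treat rows and columns as unordered sets.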

4. Training Methodology

GAMformer is trained solely on synthetic tabular datasets generated from two distinct priors:

  1. Structural Causal Models (SCMs): Random graph structures with stochastic edge functions.
  2. Gaussian Process (GP) Priors: Random function draws with varying kernels.

Each synthetic data batch is split into context and test subsets. The training objective is the cross-entropy loss over the held-out test instances $D_{\text{test}}$:

$$\mathcal{L} = -\frac{1}{|D_{\text{test}}|} \sum_{(x, y) \in D_{\text{test}}} \log \hat{p}(y \mid x)$$

No curvature regularization is used; smoothness of the estimated shape functions $\hat{g}_j$ is induced by pretraining on smoothly varying synthetic priors. Optimization is performed using SGD or Adam. No per-dataset fitting or hyperparameter tuning is performed at inference.
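The context/test split and the held-out cross-entropy objective can be sketched in a few lines. This is a sketch under assumptions: `split_context_test` and `cross_entropy` are hypothetical helper names, the loss is averaged over test rows, and the constant logits below stand in for the output of a model that is not implemented here.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits_per_row, labels):
    """Mean negative log-likelihood of the true labels."""
    losses = [-math.log(softmax(z)[y]) for z, y in zip(logits_per_row, labels)]
    return sum(losses) / len(losses)

def split_context_test(rows, labels, n_context, seed=0):
    """Randomly split one synthetic dataset into context and held-out test rows."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    ctx, tst = idx[:n_context], idx[n_context:]
    return ([rows[i] for i in ctx], [labels[i] for i in ctx],
            [rows[i] for i in tst], [labels[i] for i in tst])

# Toy stand-in for one synthetic dataset (10 rows, 1 feature, 2 classes).
rows = [[float(i)] for i in range(10)]
labels = [i % 2 for i in range(10)]
ctx_x, ctx_y, test_x, test_y = split_context_test(rows, labels, n_context=7)

# Placeholder predictions: a model that outputs equal logits for both classes.
uniform_logits = [[0.0, 0.0] for _ in test_x]
loss = cross_entropy(uniform_logits, test_y)
print(len(ctx_x), len(test_x), loss)
```

Only the test rows contribute to the loss; the context rows serve as the in-context examples from which predictions would be formed.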

5. Empirical Evaluation and Results

GAMformer is evaluated on a battery of synthetic and real-world tabular tasks:

  • Toy Examples: On linearly separable and polynomial datasets, GAMformer accurately recovers the underlying shape functions. Its estimates are smoother than those of EBMs.
  • Synthetic and OpenML Benchmarks: Performance matches XGBoost and EBMs on ~30 classification datasets (up to 2,000 rows, 10 features). Second-order interactions close gaps on "XOR"-like patterns. Critical-difference plots confirm statistical parity with EBMs and XGBoost; with second-order terms, parity or slight outperformance is observed.
  • MIMIC-II (ICU Mortality): GAMformer shape functions for features such as Age and PaO$_2$/FiO$_2$ replicate clinical U-shapes and highlight an imputation artifact around a PF ratio of 325. Patients with missing values are isolated more sharply than with EBMs.
  • Ablations: Robustness holds for context sizes $n$ up to 2,000 and feature counts $d$ up to 10; main effects show superior sample efficiency at small $n$. As is typical for transformers, performance degrades if the context size $n$ greatly exceeds the $n$ seen in pretraining.

6. Interpretability, Advantages, and Limitations

GAMformer's direct output of discrete, binned shape tables preserves GAM interpretability: each $g_j$ has an immediate, visual partial-effects plot. No iterative optimization or dataset-specific hyperparameter search is necessary; model inference is strictly a single forward pass. Empirical accuracy matches or exceeds tree and neural boosted baselines, and binned representations naturally accommodate discontinuities, feature artifacts, and missing values.

The approach has several limitations. Inference cost is quadratic in the number of context examples $n$ and the number of features $d$, which limits practical application to small and mid-size tabular datasets. Length extrapolation is constrained: performance plateaus or degrades when the model is presented with contexts much longer than those seen during training. Only main effects and greedily selected pairwise feature terms are directly modeled; higher-order interactions incur exponential feature blowup.
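A rough back-of-the-envelope sketch makes these scaling limits concrete. It assumes that cost is dominated by attention score pairs, with one module comparing all feature pairs within each row and the other comparing all example pairs within each column; the function names are hypothetical, and real runtimes depend on implementation details not covered here.

```python
import math

def attention_pair_counts(n_examples, n_features):
    """Token pairs scored per layer by the two attention modules."""
    across_features = n_examples * n_features ** 2   # within each row
    across_examples = n_features * n_examples ** 2   # within each column
    return across_features, across_examples

def interaction_term_count(n_features, order):
    """Number of distinct feature subsets of a given interaction order."""
    return math.comb(n_features, order)

base = attention_pair_counts(2000, 10)
doubled = attention_pair_counts(4000, 10)
pairwise_terms = interaction_term_count(10, 2)
triple_terms = interaction_term_count(10, 3)
print(base, doubled, pairwise_terms, triple_terms)
```

Under these assumptions, doubling the context size quadruples the across-example term, and moving from pairwise to third-order interactions on 10 features raises the number of candidate terms from 45 to 120.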

A plausible implication is that GAMformer's paradigm opens the possibility for amortized, domain-agnostic tabular estimation, provided computational and interaction modeling constraints are managed.

7. Context and Significance

GAMformer introduces a new direction in interpretable tabular modeling by combining transformer-based in-context learning with the long-standing line of additive, inherently interpretable regression and classification models. By training purely on synthetic data and using an architecture tailored for permutation-equivariant, context-driven inference, GAMformer eliminates traditional fitting loops, achieves strong empirical accuracy, and preserves the core interpretive benefits of GAMs (Mueller et al., 2024).
