SVGP KANs: Scalable, Uncertainty-Aware Models

Updated 8 December 2025
  • SVGP KANs are scalable and interpretable machine learning models that blend the additive structure of Kolmogorov-Arnold Networks with Gaussian Process probabilistic inference.
  • They leverage sparse variational methods and analytic moment matching to efficiently propagate uncertainty and reduce computational complexity in large-scale function regression.
  • Their edge-wise functional mapping enables post-hoc structure discovery and precise feature importance analysis, facilitating rigorous model interpretability.

Sparse Variational Gaussian Process Kolmogorov-Arnold Networks (SVGP KANs) are a class of scalable, uncertainty-aware, and interpretable machine learning models that synthesize the additive structure of Kolmogorov-Arnold Networks (KANs) with the probabilistic inference of Gaussian Processes (GPs) realized via sparse variational methods. This architecture is designed for applications that demand both interpretability and rigorous uncertainty quantification, particularly in scientific discovery and large-scale function regression (Ju, 29 Nov 2025, Ju, 4 Dec 2025).

1. Architectural Foundations

SVGP KANs are constructed on the Kolmogorov-Arnold network formalism, which leverages the Kolmogorov–Arnold representation theorem to express any continuous multivariate function as finite compositions and summations of univariate functions. Each KAN layer maps an input vector $x \in \mathbb{R}^{P_\mathrm{in}}$ to $y \in \mathbb{R}^{P_\mathrm{out}}$ by placing independent, learnable univariate functions $\phi_{ji}(\cdot)$ on every directed edge from input coordinate $i$ to output coordinate $j$:

$$y_j = \sum_{i=1}^{P_\mathrm{in}} \phi_{ji}(x_i)$$

This edge-wise additive decomposition ensures that every $\phi_{ji}$ is a single-input, single-output mapping, yielding direct interpretability by associating each edge with a unique univariate transformation (Ju, 29 Nov 2025).
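As a minimal illustration of this decomposition (not the papers' implementation; the edge functions here are hand-picked callables standing in for learned GP edges), a KAN layer forward pass can be written as:

```python
import numpy as np

def kan_layer(x, edge_functions):
    """Edge-wise additive KAN layer: y_j = sum_i phi_ji(x_i).

    x: (P_in,) input vector; edge_functions[j][i] is the univariate callable
    placed on the edge from input coordinate i to output coordinate j.
    """
    P_out = len(edge_functions)
    y = np.zeros(P_out)
    for j in range(P_out):
        y[j] = sum(edge_functions[j][i](x[i]) for i in range(len(x)))
    return y

# Example with 2 inputs and 1 output: phi_11 = sin, phi_12 = square.
phi = [[np.sin, lambda t: t**2]]
print(kan_layer(np.array([0.5, 2.0]), phi))  # sin(0.5) + 2.0**2
```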

Probabilistic inference is incorporated by endowing each edge function $\phi_{ji}$ with a zero-mean GP prior:

$$\phi_{ji}(\cdot) \sim \mathcal{GP}(0, k_{ji}(\cdot, \cdot))$$

Typically, $k_{ji}$ is an RBF kernel with signal variance $\sigma_f^2$ and length-scale $\ell_{ji}$. The network assumes mean-field independence of edge functions, resulting in a tractable factorized structure for Bayesian inference (Ju, 4 Dec 2025).
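A short NumPy sketch of this prior (the kernel hyperparameters and evaluation grid are illustrative assumptions) draws one sample function for a single edge; each edge carries its own independent prior:

```python
import numpy as np

def rbf_kernel(a, b, sf2=1.0, ell=0.5):
    """RBF kernel k(a, b) = sf2 * exp(-(a - b)^2 / (2 * ell^2)) on 1-D inputs."""
    return sf2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

x_grid = np.linspace(-2.0, 2.0, 200)
K = rbf_kernel(x_grid, x_grid) + 1e-8 * np.eye(200)   # jitter for numerical stability
# One draw of phi_ji(.) evaluated on the grid, from the zero-mean GP prior.
phi_sample = np.random.default_rng(0).multivariate_normal(np.zeros(200), K)
```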

2. Sparse Variational Gaussian Process Formulation

Traditional GP-KANs are limited by the computational cost of exact GP inference, which scales as $\mathcal{O}(N^3)$ per edge, making them infeasible for large datasets. SVGP KANs circumvent this bottleneck by introducing $M$ inducing points per edge and employing sparse variational inference. For each edge $(j,i)$:

  • Inducing inputs $Z_{ji} = \{z_{ji,m}\}_{m=1}^M$
  • Inducing values $u_{ji} = [\phi_{ji}(z_{ji,1}), \ldots, \phi_{ji}(z_{ji,M})]^T$
  • Variational posterior $q(u_{ji}) = \mathcal{N}(m_{ji}, S_{ji})$

The variational distribution over all functions $\{\phi_{ji}\}$ factorizes as:

$$q(\{\phi_{ji}\}) = \prod_{j,i} \int p(\phi_{ji} \mid u_{ji})\, q(u_{ji})\, du_{ji}$$

Training maximizes the evidence lower bound (ELBO):

$$\mathcal{L} = \sum_{n=1}^N \mathbb{E}_{q(f(x_n))}\left[\log p(y_n \mid f(x_n))\right] - \sum_{j,i} \mathrm{KL}\left[q(u_{ji}) \parallel p(u_{ji})\right]$$

where each KL-divergence term admits a closed form because both the prior and the variational posterior over the inducing points are Gaussian (Ju, 29 Nov 2025, Ju, 4 Dec 2025).
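Concretely, with $p(u_{ji}) = \mathcal{N}(0, K_{Z_{ji} Z_{ji}})$ and $q(u_{ji}) = \mathcal{N}(m_{ji}, S_{ji})$, the standard multivariate-Gaussian KL divergence gives

$$\mathrm{KL}\left[q(u_{ji}) \parallel p(u_{ji})\right] = \tfrac{1}{2}\left[\mathrm{tr}\!\left(K_{Z_{ji} Z_{ji}}^{-1} S_{ji}\right) + m_{ji}^T K_{Z_{ji} Z_{ji}}^{-1} m_{ji} - M + \log\det K_{Z_{ji} Z_{ji}} - \log\det S_{ji}\right]$$

which is evaluated per edge at each optimization step.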

3. Analytic Moment Matching and Uncertainty Propagation

A distinctive feature of SVGP KANs is analytic moment matching for propagating uncertainty through deep additive structures. When the input to a univariate GP edge is itself Gaussian distributed, as arises from aggregating uncertainty upstream, the predictive mean and variance can be computed in closed form for the RBF kernel. For $x \sim \mathcal{N}(\mu_x, s_x^2)$ and inducing point $z_m$:

  • $\mathbb{E}[k(x, z_m)] = \dfrac{\sigma_f^2\,\ell}{\sqrt{\ell^2 + s_x^2}} \exp\!\left(-\dfrac{(z_m-\mu_x)^2}{2(\ell^2 + s_x^2)}\right)$ (a code sketch of this expectation follows the list).
  • Higher-order moments can be similarly derived, enabling efficient and exact marginalization over input uncertainties (Ju, 29 Nov 2025, Ju, 4 Dec 2025).
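The first-moment formula above can be checked directly; the following NumPy sketch (with illustrative values, not taken from the papers) compares the closed form against a Monte Carlo estimate:

```python
import numpy as np

def expected_rbf(mu_x, s2_x, z, ell, sf2):
    """E[k(x, z)] for k(x, z) = sf2 * exp(-(x - z)^2 / (2 ell^2)) with x ~ N(mu_x, s2_x)."""
    return sf2 * ell / np.sqrt(ell**2 + s2_x) * np.exp(-0.5 * (z - mu_x) ** 2 / (ell**2 + s2_x))

mu_x, s2_x, z, ell, sf2 = 0.3, 0.25, 1.0, 0.8, 1.5
x_samples = np.random.default_rng(0).normal(mu_x, np.sqrt(s2_x), 200_000)
mc_estimate = np.mean(sf2 * np.exp(-0.5 * (x_samples - z) ** 2 / ell**2))
print(expected_rbf(mu_x, s2_x, z, ell, sf2), mc_estimate)  # the two values agree closely
```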

This mechanism supports rigorous propagation of both epistemic (model) and aleatoric (data) uncertainty, differentiating SVGP KANs from deterministic KANs and standard neural architectures.

4. Computational Complexity and Scalability

By using sparse variational inference and mini-batching, SVGP KANs achieve per-epoch computational complexity of $\mathcal{O}(N M^2)$, where $M$ is the number of inducing points per edge and $N$ is the number of training samples. If $M$ and mini-batch size $B$ are held fixed, the dependence on $N$ is linear, a substantial improvement over the cubic scaling seen in exact GP-KANs. For $P_\mathrm{in} \times P_\mathrm{out}$ edges, the total per-batch computational cost is $\mathcal{O}(P_\mathrm{in} P_\mathrm{out} B M^2)$. Storage complexity per edge is $\mathcal{O}(M^2)$ for covariance and $\mathcal{O}(M)$ for inducing locations and means (Ju, 29 Nov 2025, Ju, 4 Dec 2025).
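As an illustrative calculation with assumed values ($N = 10^5$ training samples, $M = 32$ inducing points, a single edge), exact GP inference would require on the order of $N^3 = 10^{15}$ operations, whereas the sparse variational treatment requires on the order of $N M^2 \approx 10^8$ operations per epoch, roughly a seven-order-of-magnitude reduction.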

5. Training Procedure

SVGP KANs are trained by stochastic optimization of the ELBO using analytic gradients. Key training steps:

  1. Initialization: For each edge, inducing inputs $Z_{ji}$ are selected (e.g., via K-means or deterministic grid), and $m_{ji}$, $S_{ji}$ are initialized.
  2. Mini-batch processing: For each mini-batch, compute all kernel matrices, aggregate predictive means/variances, and compute the batch ELBO.
  3. Backpropagation: Optimize all parameters, including inducing locations and kernel hyperparameters, using a gradient-based optimizer such as Adam.
  4. Batched computation: Utilize batched linear algebra to accelerate operations across all edges and exploit GPU parallelism. A single-edge sketch of this training loop follows the list.
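The following PyTorch sketch makes these steps concrete for a single edge (a one-dimensional SVGP regression); the toy data, initialization values, and optimizer settings are illustrative assumptions and do not reproduce the cited implementation:

```python
import math
import torch

torch.manual_seed(0)

# Toy 1-D regression data standing in for one edge's input/output (assumption).
N = 2000
X = torch.rand(N, 1) * 6.0 - 3.0
Y = torch.sin(X) + 0.1 * torch.randn(N, 1)

M = 16                                                        # inducing points per edge
Z = torch.linspace(-3.0, 3.0, M).unsqueeze(-1).clone().requires_grad_(True)  # step 1: grid init
m = torch.zeros(M, 1, requires_grad=True)                     # variational mean m_ji
L_raw = torch.eye(M).clone().requires_grad_(True)             # Cholesky factor of S_ji
log_ell = torch.zeros(1, requires_grad=True)                  # RBF length-scale (log)
log_sf2 = torch.zeros(1, requires_grad=True)                  # RBF signal variance (log)
log_sn2 = torch.tensor([-2.0], requires_grad=True)            # Gaussian noise variance (log)

def rbf(a, b):
    """RBF kernel between 1-D input columns a (n, 1) and b (m, 1)."""
    return torch.exp(log_sf2) * torch.exp(-0.5 * (a - b.T) ** 2 / torch.exp(2 * log_ell))

def elbo(xb, yb):
    """Single-edge SVGP evidence lower bound on a mini-batch (xb, yb)."""
    B = xb.shape[0]
    S_L = torch.tril(L_raw)                       # Cholesky factor of S
    S = S_L @ S_L.T
    Kzz = rbf(Z, Z) + 1e-5 * torch.eye(M)         # jitter for numerical stability
    Kxz = rbf(xb, Z)
    Lz = torch.linalg.cholesky(Kzz)
    A = torch.cholesky_solve(Kxz.T, Lz).T         # A = Kxz Kzz^{-1}
    mean_f = A @ m
    var_f = (torch.exp(log_sf2) * torch.ones(B, 1)
             - (A * Kxz).sum(-1, keepdim=True)
             + ((A @ S) * A).sum(-1, keepdim=True))
    sn2 = torch.exp(log_sn2)
    # Expected Gaussian log-likelihood, rescaled from the mini-batch to all N points.
    exp_ll = -0.5 * (math.log(2 * math.pi) + torch.log(sn2)
                     + ((yb - mean_f) ** 2 + var_f) / sn2).sum() * (N / B)
    # Closed-form KL[ N(m, S) || N(0, Kzz) ] (cf. Section 2).
    alpha = torch.cholesky_solve(m, Lz)           # Kzz^{-1} m
    kl = 0.5 * (torch.cholesky_solve(S, Lz).diagonal().sum()
                + (m * alpha).sum() - M
                + 2 * torch.log(torch.diagonal(Lz)).sum()
                - 2 * torch.log(torch.diagonal(S_L).abs()).sum())
    return exp_ll - kl

optimizer = torch.optim.Adam([Z, m, L_raw, log_ell, log_sf2, log_sn2], lr=1e-2)  # step 3
for step in range(500):
    idx = torch.randint(0, N, (256,))             # step 2: mini-batch
    loss = -elbo(X[idx], Y[idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```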

Analytic moment matching obviates the need for Monte Carlo sampling in forward uncertainty propagation, further enhancing scalability (Ju, 29 Nov 2025).

6. Structural Discovery and Model Interpretability

SVGP KANs possess inherent interpretability due to their additive structure and edge-wise functional mapping. They can perform post-hoc structure discovery via permutation-based variable importance:

  • Shuffle each input feature in a held-out dataset, measure the increase in test MSE, and define feature importance $I_d = \mathrm{MSE}_d - \mathrm{MSE}_{\mathrm{orig}}$ (a sketch of this procedure follows the list).
  • Edges with importance below a predefined threshold are deemed irrelevant.
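A minimal sketch of the permutation procedure (the `predict` callable stands in for any fitted model's mean predictor; names and the seed are illustrative):

```python
import numpy as np

def permutation_importance(predict, X_val, y_val, seed=0):
    """Return I_d = MSE_d - MSE_orig for each input feature d on a held-out set."""
    rng = np.random.default_rng(seed)
    base_mse = np.mean((predict(X_val) - y_val) ** 2)
    importances = np.zeros(X_val.shape[1])
    for d in range(X_val.shape[1]):
        X_perm = X_val.copy()
        X_perm[:, d] = rng.permutation(X_perm[:, d])   # shuffle feature d only
        importances[d] = np.mean((predict(X_perm) - y_val) ** 2) - base_mse
    return importances
```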

Functional relationship classification is facilitated by inspecting the learned kernel length-scales:

  • Large $\ell_{j,d}$ relative to data range signifies linear or constant behavior.
  • Small $\ell_{j,d}$ indicates high-frequency or nonlinear relationships.

Visualization of learned edge functions permits categorization into polynomials, periodicities, and other nonlinear behaviors (Ju, 29 Nov 2025).

7. Empirical Validation and Practical Applications

SVGP KANs have been validated across synthetic and real scientific machine learning tasks:

  • Basic synthetic regression: Precisely recovers additive structure and prunes irrelevant features.
  • 2D surface reconstruction: Demonstrates calibrated epistemic uncertainty outside the training domain.
  • Friedman #1 benchmark: Achieves strong test RMSE and correctly identifies informative features, suppressing spurious signals.
  • Heteroscedastic fluid flow reconstruction: Accurately infers spatially varying aleatoric noise fields and achieves coverage aligned with nominal error rates.
  • Multi-step PDE forecasting: Predictive spread increases with compounded epistemic uncertainty, correlating with physical intuition.
  • OOD detection in convolutional autoencoders: Predictive variance sharply distinguishes in-distribution from anomalous data with ROC–AUC ~0.8–0.9 (Ju, 29 Nov 2025, Ju, 4 Dec 2025).

SVGP KANs also support separation and quantification of aleatoric vs. epistemic uncertainty. In settings with measurement noise, distinct GPs are used for the latent predictive mean and input-dependent noise variance, ensuring principled uncertainty calibration.


SVGP KANs enable interpretable scientific modeling at scale, blending universal function approximation, Bayesian inference, analytic uncertainty propagation, and variable discovery. Their analytical tractability and computational efficiency position them as a robust alternative to traditional deep learning and GP-based models for scientific machine learning (Ju, 29 Nov 2025, Ju, 4 Dec 2025).
