Kolmogorov–Arnold Networks for Graph Learning
- KAGIN (Kolmogorov–Arnold Graph Isomorphism Networks) is a family of neural architectures extending Kolmogorov–Arnold Networks to graph learning by parameterizing edge and node transformations with learnable univariate functions.
- They employ B-spline expansions to implement univariate approximations, overcoming expressivity and scalability limits inherent in traditional MLP-based graph neural networks.
- Empirical studies highlight KAGIN’s improved performance in graph regression and node classification, despite increased training overhead and the need for careful hyperparameter tuning.
Kolmogorov–Arnold Graph Isomorphism Networks (KAGIN) are a family of neural architectures that generalize and extend Kolmogorov–Arnold Networks (KANs) to graph learning. KAGIN leverages learnable univariate function parameterizations for edge and node transformations, exploiting the Kolmogorov–Arnold superposition theorem, and aims to overcome expressivity and scalability limitations of standard multilayer perceptron (MLP)–based Graph Neural Networks (GNNs) (Kiamari et al., 10 Jun 2024, Bresson et al., 26 Jun 2024).
1. Theoretical Foundation
The Kolmogorov–Arnold superposition theorem states that any continuous multivariate function $f: [0,1]^n \to \mathbb{R}$ can be represented as a finite sum of univariate function superpositions:
$$f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),$$
where each $\Phi_q$ and $\phi_{q,p}$ is univariate and continuous (Bresson et al., 26 Jun 2024). This decomposition circumvents the exponential dependence on input dimensionality that typically afflicts classical universal approximators.
KANs instantiate this principle architecturally by replacing the fixed weight matrices found in MLPs with learnable univariate functions on each edge within a feed-forward network. For input dimension $n_l$ and output dimension $n_{l+1}$ at layer $l$, a KAN layer implements
$$x_j^{(l+1)} = \sum_{i=1}^{n_l} \phi_{l,j,i}\!\left(x_i^{(l)}\right), \qquad j = 1, \dots, n_{l+1},$$
where each $\phi_{l,j,i}$ is a univariate function parameterized, in practice, via B-spline or related expansions.
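A minimal PyTorch sketch of such a layer is given below, assuming a degree-1 (piecewise-linear) B-spline basis on a fixed uniform grid; the class name, grid range, and initialization scale are illustrative choices rather than details of the cited implementations.

```python
import torch
import torch.nn as nn

class KANLayer(nn.Module):
    """Sketch of a KAN layer: each input-output pair (i, j) carries its own learnable
    univariate function, expanded over a degree-1 (hat-function) B-spline basis."""

    def __init__(self, in_dim, out_dim, grid_size=8, x_min=-2.0, x_max=2.0):
        super().__init__()
        # Uniform knot grid shared by all univariate functions.
        self.register_buffer("grid", torch.linspace(x_min, x_max, grid_size))
        self.h = (x_max - x_min) / (grid_size - 1)
        # phi_{j,i}(x) = sum_g coef[j, i, g] * B_g(x): one coefficient vector per (i, j) pair.
        self.coef = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, grid_size))

    def forward(self, x):                                   # x: (batch, in_dim)
        # Piecewise-linear B-spline ("hat") basis evaluated at every input coordinate.
        dist = (x.unsqueeze(-1) - self.grid) / self.h       # (batch, in_dim, grid_size)
        basis = torch.clamp(1.0 - dist.abs(), min=0.0)
        # Outer sum over inputs i of phi_{j,i}(x_i), for every output j.
        return torch.einsum("big,oig->bo", basis, self.coef)
```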
KAGIN layers inject this univariate functional transform into the graph domain, analogously replacing MLPs in GNN message-passing blocks with KAN blocks (Bresson et al., 26 Jun 2024); this generates a suite of architectures termed KAGNNs (for GCN, GIN, and related variants), and most narrowly, KAGIN for the GIN-style sum-aggregation block.
2. KAGIN Network Architecture
KAGIN operationalizes the superposition theorem in message-passing GNNs by replacing the classical MLP feature transform with one or more KAN blocks. In the context of a Graph Isomorphism Network (GIN), the standard update
$$h_v^{(k)} = \mathrm{MLP}^{(k)}\!\left( \bigl(1 + \epsilon^{(k)}\bigr)\, h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)} \right)$$
is replaced by
$$h_v^{(k)} = \mathrm{KAN}^{(k)}\!\left( \bigl(1 + \epsilon^{(k)}\bigr)\, h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)} \right),$$
where each component of $\mathrm{KAN}^{(k)}$ involves a sum over univariate functions of the aggregated features, as in the prototype KAN formula. The aggregation function (sum or mean) and choice of structural coefficients (e.g., the $\epsilon^{(k)}$ in GIN) can be retained or adapted as required (Bresson et al., 26 Jun 2024).
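Under these assumptions, the replacement amounts to swapping the MLP of a GIN block for a stack of KAN layers. The sketch below reuses the `KANLayer` class sketched in Section 1 and takes `edge_index` in the usual (2, num_edges) COO format; it is illustrative, not the reference implementation of (Bresson et al., 26 Jun 2024).

```python
import torch
import torch.nn as nn

class KAGINLayer(nn.Module):
    """GIN-style update with a KAN block in place of the MLP (KANLayer is the
    sketch from Section 1). Sum aggregation and a learnable epsilon are kept."""

    def __init__(self, in_dim, out_dim, grid_size=8):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))
        # Two stacked KAN layers play the role of GIN's two-layer MLP.
        self.kan = nn.Sequential(KANLayer(in_dim, out_dim, grid_size),
                                 KANLayer(out_dim, out_dim, grid_size))

    def forward(self, h, edge_index):            # h: (num_nodes, in_dim)
        src, dst = edge_index                    # messages flow src -> dst
        agg = torch.zeros_like(h)
        agg.index_add_(0, dst, h[src])           # sum over the neighbours of each node
        return self.kan((1.0 + self.eps) * h + agg)
```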
Table: Principal architectural differences in GIN-like GNNs
| Layer type | Feature transform | Univariate parameterization |
|---|---|---|
| Classic GIN | MLP | None (matrix × nonlinear) |
| KAGIN | KAN block (superposition) | B-splines (main), RBF possible |
In KAGIN, each input–output feature pair $(i, j)$ (for input dimension $n_{\text{in}}$ and output dimension $n_{\text{out}}$) introduces a distinct univariate parameterization; in the spline case, this typically entails $G$ grid points per function and degree-$k$ B-splines. Outer sums then run over these learned univariate maps, with final aggregation at the block output.
3. Function Parameterization and Training
The dominant univariate basis family for KAGIN is B-splines, parameterized via a grid of $G$ knots and polynomial order $k$:
$$\phi(x) = \sum_{i} c_i\, B_i(x),$$
where the $B_i$ are B-spline basis functions and the $c_i$ are trainable coefficients. This choice offers local support, smoothness, and efficient evaluation; all control points are optimized during gradient-based training (Bresson et al., 26 Jun 2024). Radial basis function (RBF) parameterizations have also been considered, though empirical focus has largely remained on splines.
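For concreteness, the basis functions $B_i$ can be generated by the Cox–de Boor recursion. The function below is a self-contained sketch for a plain uniform knot grid; practical KAN implementations typically extend the grid at the boundaries and cache intermediate results.

```python
import torch

def bspline_basis(x, grid, k):
    """Degree-k B-spline basis functions on a uniform 1-D knot `grid`, evaluated at x.
    Returns a tensor of shape (*x.shape, grid.numel() - k - 1)."""
    x = x.unsqueeze(-1)
    # Degree 0: indicator of each knot interval [t_i, t_{i+1}).
    b = ((x >= grid[:-1]) & (x < grid[1:])).to(x.dtype)
    # Cox-de Boor recursion up to the requested degree.
    for d in range(1, k + 1):
        left = (x - grid[:-(d + 1)]) / (grid[d:-1] - grid[:-(d + 1)]) * b[..., :-1]
        right = (grid[d + 1:] - x) / (grid[d + 1:] - grid[1:-d]) * b[..., 1:]
        b = left + right
    return b

# Example: cubic (k = 3) basis on a 12-knot grid gives 12 - 3 - 1 = 8 basis functions.
grid = torch.linspace(-2.0, 2.0, 12)
print(bspline_basis(torch.randn(5), grid, k=3).shape)   # torch.Size([5, 8])
```

A learned univariate function $\phi$ is then the inner product of these basis values with its trainable coefficient vector.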
Training objectives in KAGIN adopt standard GNN loss formulations:
- Node classification: Cross-entropy loss over labeled nodes.
- Graph classification: Cross-entropy loss summing over graphs.
- Graph regression: Mean absolute error (MAE).
No specialized regularization is imposed, except occasionally L2 penalties on basis coefficients.
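These objectives map directly onto standard PyTorch losses. The snippet below is a hedged sketch in which `node_model`, `graph_model`, `data`, and `batch` are placeholder objects following common PyG-style attribute names, not an actual KAGIN API.

```python
import torch.nn.functional as F

# Node classification: cross-entropy over the labeled (masked) nodes.
node_logits = node_model(data.x, data.edge_index)
loss_node = F.cross_entropy(node_logits[data.train_mask], data.y[data.train_mask])

# Graph classification: cross-entropy over per-graph predictions in a mini-batch.
graph_logits = graph_model(batch.x, batch.edge_index, batch.batch)
loss_graph = F.cross_entropy(graph_logits, batch.y)

# Graph regression: mean absolute error on graph-level targets.
loss_reg = F.l1_loss(graph_model(batch.x, batch.edge_index, batch.batch), batch.y)
```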
4. Expressivity, Scalability, and Theoretical Guarantees
The compositional structure of KAGIN allows it to inherit the universal approximation properties of the Kolmogorov–Arnold theorem for multivariate continuous functions. A distinctive consequence is that the approximation error decays as
$$\bigl\| f - f_{\mathrm{KAN}} \bigr\| = O\!\left(G^{-(k+1)}\right),$$
with the constant independent of the input dimension $n$; thus, parameter growth is $O(nG)$ for $G$ grid points and $n$ input features, precluding exponential scaling with respect to dimension and mitigating the curse of dimensionality found in classical neural approximators (Basina et al., 15 Nov 2024).
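As a back-of-the-envelope illustration of this scaling (the one-coefficient-per-grid-point count is a simplifying assumption that ignores small additive terms from the spline degree):

```python
def kan_layer_param_count(n_in, n_out, grid_size):
    # One spline coefficient per grid point for each (input, output) pair.
    return n_in * n_out * grid_size

# A 64 -> 64 KAN layer with an 8-point grid: parameter count is linear in the
# grid size and polynomial (not exponential) in the feature dimensions.
print(kan_layer_param_count(64, 64, 8))   # 32768
```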
For graph learning, KAGIN's expressivity manifests in improved regression accuracy (e.g., on QM9 and ZINC-12K) and, with appropriate width and grid size, competitive or superior node/graph classification compared to MLP-based analogs.
5. Empirical Performance and Comparative Analysis
Empirical evaluations on standard benchmark suites highlight KAGIN's strengths primarily in graph regression and, in GIN-like architectures, in node/graph classification tasks (Bresson et al., 26 Jun 2024):
- Node classification (Cora, Citeseer, Ogbn-arxiv, etc.): KAGIN outperforms GIN on most datasets, particularly in smaller graphs.
- Graph classification (TU datasets): Marginal or comparable improvements over GIN; feature normalization can affect classification accuracy when node features are continuous-valued.
- Graph regression (QM9, ZINC-12K): KAGIN produces substantially lower MAE than GIN—25% to 36% relative improvement on standard benchmarks.
Per-epoch training is slower when using larger grids or higher spline orders, owing to the overhead of spline evaluation, but convergence behavior and model stability are comparable to MLP-based GNNs.
An independent paper (Kiamari et al., 10 Jun 2024) on Cora shows GKAN (notationally equivalent to KAGIN when the KAN block is used in GCN/GIN) achieving higher test accuracy than GCN for equal parameter budgets:
- With 100 features, GKAN achieves 61.76% (vs. GCN's 53.5%) and converges in fewer epochs.
- Similar findings are reported with 200 features and in ablation studies varying grid size, polynomial degree, and width.
6. Hyperparameter Sensitivity and Limitations
KAGIN models introduce additional hyperparameters beyond standard GNNs:
- Grid size (number of spline intervals)
- Spline degree
- Hidden width (number of output features per layer)
- Regularization strength (optional)
Empirical tuning is required: higher grid size and degree increase flexibility but risk overfitting and slower runtime; moderate grid sizes and piecewise-linear splines (degree 1) are often optimal for standard datasets (Kiamari et al., 10 Jun 2024). RBF-based KANs offer similar expressivity with only marginal increases in training time and parameter count. KAGIN is more sensitive to input normalization than its MLP-based counterparts, especially when node features are continuous-valued.
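A typical configuration in this spirit might look as follows; the keys and values are purely illustrative placeholders, not an actual library API or reported optima from the cited papers.

```python
kagin_config = {
    "grid_size": 8,        # number of spline intervals per univariate function
    "spline_degree": 1,    # piecewise-linear splines
    "hidden_width": 64,    # output features per KAN layer
    "num_layers": 4,
    "weight_decay": 5e-4,  # optional L2 penalty on spline coefficients
    "learning_rate": 1e-3,
}
```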
Implementation at scale can be hindered by the lack of mature, optimized libraries for univariate function evaluation on GPU/TPU hardware.
7. Interpretability and Future Directions
KAGIN provides enhanced model interpretability: each learned univariate spline or RBF can be directly visualized and analyzed, offering localized insight into feature transformations absent in dense MLP approaches. This “generalized additive model”–like transparency is a cited advantage in both regression and classification contexts (Bresson et al., 26 Jun 2024).
Current limitations include slower per-epoch compute, more intricate hyperparameter search, and challenges scaling to very large graphs. Future work points toward:
- Efficient implementation of spline/RBF kernels
- Automated hyperparameter selection
- Extensions to other GNN types (e.g., attention-based architectures)
- Integration of hierarchical or backpropagation-free variants (e.g., HKAN) for scalable training (Dudek et al., 30 Jan 2025)
- Broader benchmarks and industrial-grade graph learning applications
Kolmogorov–Arnold Graph Isomorphism Networks (KAGIN) thus represent a principled and empirically validated paradigm shift in GNN design, centered on univariate function superposition and realized via spline or RBF basis expansions. This design enhances both theoretical guarantees for high-dimensional approximation and practical expressivity for graph learning across diverse tasks (Kiamari et al., 10 Jun 2024, Bresson et al., 26 Jun 2024).