FlowKANet: Graph Neural Delay Predictor
- The paper introduces a framework that replaces traditional MLPs with spline-based KAN layers, achieving a 5× reduction in trainable parameters compared to baseline GNNs.
- FlowKANet employs iterative message passing with KAMP-Attn, using bidirectional edge processing and attention mechanisms to refine node representations in bipartite graphs.
- The approach enables the derivation of symbolic surrogate models, offering interpretable analytical predictors suitable for real-time, resource-constrained network delay forecasting.
FlowKANet is a graph neural network framework for delay prediction in communication networks that incorporates Kolmogorov-Arnold Networks (KAN) at all neural layers, enabling robust nonlinear modeling with substantial gains in parameter efficiency and interpretability. By replacing standard multi-layer perceptrons (MLPs) within heterogeneous message-passing blocks with spline-based KAN modules, FlowKANet maintains the graph-structured inductive bias and attention mechanisms of classical GNNs, while facilitating the derivation of symbolic surrogates for lightweight deployment and transparent inference (Marouani et al., 24 Dec 2025).
1. Architecture and Computational Graph
FlowKANet models communication networks as bipartite, heterogeneous graphs $G = (\mathcal{F}, \mathcal{L}, \mathcal{E})$, with $\mathcal{F}$ denoting flow nodes, $\mathcal{L}$ link nodes, and $\mathcal{E}$ the set of directed edges encoding the bidirectional flow–link relationships. The architecture performs $L$ rounds of message passing:
- Initial node features $x_f$ (flows) and $x_l$ (links) are encoded via KAN blocks into respective embeddings $h_f^{(0)}$ and $h_l^{(0)}$.
- At each layer $\ell$, directed edges utilize two shared KAN operators: a transformation $\psi^{(\ell)}$ (mapping the sender's embedding to a candidate message) and an attention operator $a^{(\ell)}$ (producing normalized attention weights).
- Flow→link and link→flow edges are processed iteratively, updating node representations via residual aggregation of attention-weighted messages.
- After all layers, each flow node aggregates its final link context and fuses it (via a KAN block) before passing through the KAN readout for delay prediction $\hat{d}_f$.
This design yields a 5× reduction in trainable parameters compared to baseline GNN architectures, primarily by substituting dense MLPs with parameter-efficient spline-based KAN layers (Marouani et al., 24 Dec 2025).
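The layered flow described above can be sketched end-to-end. Everything here (the `blocks` dict of callables, plain mean aggregation in place of KAMP-Attn, the toy dimensions) is an illustrative stand-in under stated assumptions, not the published implementation:

```python
import numpy as np

def forward(xf, xl, edges, blocks, L=3):
    """Structural sketch of a FlowKANet-style forward pass.
    `blocks` maps names to callables standing in for KAN blocks;
    `edges` lists (flow_idx, link_idx) pairs. Attention weighting is
    replaced by a plain mean to keep the sketch short."""
    hf = blocks['enc_flow'](xf)          # encode flow features
    hl = blocks['enc_link'](xl)          # encode link features
    for _ in range(L):                   # L rounds of message passing
        # flow -> link: each link aggregates messages from incident flows
        for l in range(hl.shape[0]):
            nbr = [f for f, ll in edges if ll == l]
            if nbr:
                hl[l] = hl[l] + np.mean([blocks['msg'](hf[f]) for f in nbr], axis=0)
        # link -> flow: each flow aggregates messages from its links
        for f in range(hf.shape[0]):
            nbr = [l for ff, l in edges if ff == f]
            if nbr:
                hf[f] = hf[f] + np.mean([blocks['msg'](hl[l]) for l in nbr], axis=0)
    # fuse each flow's final link context, then read out a per-flow delay
    ctx = np.stack([np.mean([hl[l] for ff, l in edges if ff == f], axis=0)
                    for f in range(hf.shape[0])])
    return blocks['readout'](blocks['fuse'](np.concatenate([hf, ctx], axis=1)))
```

The bipartite edge list drives both directions of message passing, so the graph inductive bias lives entirely in `edges`; swapping the placeholder callables for real KAN blocks recovers the described architecture.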
2. Kolmogorov-Arnold Network Layer Formalism
KAN layers leverage the Kolmogorov-Arnold superposition theorem, admitting universal approximation of all continuous $n$-variate functions:

$$f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)$$

Here, $\phi_{q,p}$ and $\Phi_q$ are learnable univariate splines discretized by their values at $G$ grid points, interpolated via B-splines of order $k$. For $m$-dimensional outputs, a KAN layer computes $y_j = \sum_{i=1}^{n} \phi_{j,i}(x_i)$ with learnable splines $\phi_{j,i}$, $j = 1, \dots, m$, and a linear mixing layer combining the spline outputs. This structure enforces global sparsity and compositionality while retaining full representational power (Marouani et al., 24 Dec 2025).
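A minimal numeric sketch of such a layer, using order-1 (piecewise-linear "hat") splines as a simplified stand-in for the higher-order B-splines described above; `KANLayer`, `hat_basis`, and the fixed grid range are assumptions for illustration:

```python
import numpy as np

def hat_basis(x, grid):
    """Order-1 B-spline (hat) basis on a uniform grid.
    Returns shape (len(x), len(grid)); inputs outside the grid get zero."""
    x = np.atleast_1d(x)
    h = grid[1] - grid[0]
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - grid[None, :]) / h)

class KANLayer:
    """Sketch of a KAN layer: each (output j, input i) edge carries a
    learnable univariate spline phi_{j,i}, and y_j = sum_i phi_{j,i}(x_i).
    The spline is parameterized by one coefficient per grid point."""
    def __init__(self, n_in, n_out, grid_size=5, rng=None):
        rng = np.random.default_rng(rng)
        self.grid = np.linspace(-1.0, 1.0, grid_size)
        self.coef = rng.normal(scale=0.1, size=(n_out, n_in, grid_size))

    def __call__(self, x):
        # x: (batch, n_in) -> y: (batch, n_out)
        B = hat_basis(x.ravel(), self.grid).reshape(x.shape[0], x.shape[1], -1)
        phi = np.einsum('big,jig->bji', B, self.coef)  # phi_{j,i}(x_i)
        return phi.sum(axis=2)                          # sum over inputs i
```

The parameter count is `n_in * n_out * grid_size`, which makes the linear dependence on grid size discussed later directly visible.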
3. Message Passing and Attention via KAMP-Attn
Central to FlowKANet is KAMP-Attn (Kolmogorov-Arnold Message Passing with Attention). For each edge $(u, v) \in \mathcal{E}$:
- The transformation $\psi^{(\ell)}$ yields message features $m_{uv}^{(\ell)} = \psi^{(\ell)}\big(h_u^{(\ell)}\big)$.
- The attention operator computes scalar scores after LeakyReLU activation, $e_{uv}^{(\ell)} = \mathrm{LeakyReLU}\big(a^{(\ell)}\big[h_u^{(\ell)} \,\|\, h_v^{(\ell)}\big]\big)$. Scores are softmax-normalized within the recipient's neighborhood: $\alpha_{uv}^{(\ell)} = \exp\big(e_{uv}^{(\ell)}\big) \big/ \sum_{u' \in \mathcal{N}(v)} \exp\big(e_{u'v}^{(\ell)}\big)$.
- Node $v$'s embedding is updated via residual aggregation: $h_v^{(\ell+1)} = h_v^{(\ell)} + \sum_{u \in \mathcal{N}(v)} \alpha_{uv}^{(\ell)}\, m_{uv}^{(\ell)}$.
All hyperparameters (grid size $G$, spline order $k$, spline scale) are optimized block-wise via Optuna, ensuring uniform spline expressivity across all message-passing layers (Marouani et al., 24 Dec 2025).
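One KAMP-Attn round can be sketched as follows, with `msg_fn` and `attn_fn` as arbitrary callables standing in for the shared KAN transformation and attention operators:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def kamp_attn_round(h_src, h_dst, edges, msg_fn, attn_fn):
    """One round of attention-weighted message passing (sketch).
    `edges` is a list of (src, dst) index pairs; msg_fn maps a sender
    embedding to a message, attn_fn maps [h_u || h_v] to a raw score."""
    msgs, scores = {}, {}
    for (u, v) in edges:
        msgs[(u, v)] = msg_fn(h_src[u])
        scores[(u, v)] = leaky_relu(attn_fn(np.concatenate([h_src[u], h_dst[v]])))
    h_new = h_dst.copy()
    for v in range(h_dst.shape[0]):
        nbr = [u for (u, w) in edges if w == v]
        if not nbr:
            continue
        e = np.array([scores[(u, v)] for u in nbr])
        alpha = np.exp(e - e.max()); alpha /= alpha.sum()  # softmax over N(v)
        agg = sum(a * msgs[(u, v)] for a, u in zip(alpha, nbr))
        h_new[v] = h_dst[v] + agg                          # residual update
    return h_new
```

In the full model this round is applied alternately to flow→link and link→flow edge sets, reusing the same two shared operators per layer.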
4. Model Complexity and Scalability Analysis
FlowKANet demonstrates substantial parameter compression and computational gains:
| Model | Trainable Parameters | Per-forward Complexity |
|---|---|---|
| Baseline GNN | 98,210 | quadratic in hidden dimension $d$ |
| FlowKANet | 20,094 | linear in KAN grid size $G$ |
| Symbolic Surrogate | 267 constants | fixed arithmetic operations |

Here $L$ denotes the number of message-passing layers, $d$ the hidden dimension (flows: 8, links: 2), and $G$ the KAN grid size (up to 10). KAN layers incur only a linear cost in grid size $G$, while MLP-based GNNs scale quadratically in $d$. Symbolic surrogates, distilled from trained FlowKANet weights, reduce inference to a minimal set of arithmetic operations (Marouani et al., 24 Dec 2025).
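A back-of-the-envelope parameter count shows why spline blocks compress: a dense MLP block stores full weight matrices, while a KAN block (in the simplified order-1 form) stores one spline coefficient per grid point per input-output pair. The dimensions below are hypothetical, not the paper's:

```python
def mlp_params(d_in, hidden, d_out):
    """Dense two-layer MLP block: weights plus biases."""
    return d_in * hidden + hidden + hidden * d_out + d_out

def kan_params(d_in, d_out, grid_size):
    """KAN block: one coefficient per grid point per (input, output) edge."""
    return d_in * d_out * grid_size

# Illustrative comparison (hypothetical dimensions):
mlp = mlp_params(8, 64, 8)   # dense block with a wide hidden layer
kan = kan_params(8, 8, 8)    # spline block, grid size G = 8
```

Because the KAN count grows linearly in `grid_size` and never pays for a wide hidden layer, the compression compounds across every message-passing block in the network.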
5. Training Procedures and Hyperparameter Optimization
Experiments utilize the GNNet Challenge dataset: 4,389 graphs (train: 3,511; test: 878).
- Loss function: mean squared error (MSE), $\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \big(\hat{d}_i - d_i\big)^2$.
- Optimizer: Adam; dropout $0.1$ between KAN blocks; no weight decay.
- Early stopping: based on validation MSE (10% split), with a fixed patience and maximum-epoch budget.
- Optuna (TPE sampler): hyperparameter sweep over hidden dimensions, number of layers, KAN grid size, spline order, spline scale, and activation placements.
Convergence typically occurs in 60–80 epochs for both baseline and FlowKANet. Symbolic surrogate distillation, via block-wise regression (250 trials per block), completes within ~2 hours (Marouani et al., 24 Dec 2025).
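The early-stopping rule described above can be sketched as a plain loop; the `max_epochs` and `patience` defaults are placeholders, since the paper's exact values are not recoverable from this summary:

```python
def train_with_early_stopping(step, validate, max_epochs=200, patience=20):
    """Run training `step`s, stopping when validation MSE fails to improve
    for `patience` consecutive epochs. `step` and `validate` are
    caller-supplied callables; returns the best validation MSE seen."""
    best, best_epoch = float('inf'), 0
    for epoch in range(max_epochs):
        step()                       # one epoch of optimization
        mse = validate()             # validation MSE on the held-out split
        if mse < best:
            best, best_epoch = mse, epoch
        elif epoch - best_epoch >= patience:
            break                    # no improvement for `patience` epochs
    return best
```

The same loop serves both the baseline GNN and FlowKANet, consistent with both converging in a similar 60–80 epoch range.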
6. Empirical Performance and Predictive Accuracy
On the GNNet test set (13,704 flows):

| Model | MSE (↓) | RMSE (↓) | $R^2$ (↑) |
|---|---|---|---|
| Baseline GNN | 38.6358 | 6.214 | 0.8113 |
| FlowKANet | 40.8094 | 6.388 | 0.8727 |
| Symbolic Surrogate | 54.8562 | 7.407 | 0.8290 |

FlowKANet preserves explained variance while incurring only minor RMSE degradation compared to standard GNNs. The surrogate model, with fully analytical expressions, sacrifices some predictive accuracy for maximal transparency (Marouani et al., 24 Dec 2025).
7. Symbolic Surrogate Distillation and Analytical Predictors
FlowKANet models are distilled to symbolic surrogates using PySR via block-wise symbolic regression.
- Blocks are sequentially replaced with closed-form polynomial/rational functions, trained against frozen upstream activations.
- The final predictor maintains the graph-structured dependencies: the per-flow delay combines a quadratic function of flow features with a log-linear combination of flow features and aggregated link loads, parameterized by 267 discovered constants. Importantly, all GNN-style neighborhood summations are retained, embedding the original inductive bias within the analytical form.
The full surrogate reduces inference to a fixed set of arithmetic operations per flow, suitable for microsecond-scale execution on resource-constrained hardware (Marouani et al., 24 Dec 2025).
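To illustrate why inference is cheap, a predictor of the described shape (quadratic in flow features plus a log-linear term in aggregated link load) evaluates in a handful of arithmetic operations. The function below is a hypothetical stand-in with made-up coefficients, not the actual PySR-discovered expressions:

```python
import numpy as np

def surrogate_delay(flow_feats, link_loads, c):
    """Hypothetical distilled predictor shape. `c` stands in for a slice of
    the 267 discovered constants; the neighborhood sum over link loads
    preserves the GNN-style inductive bias in closed form."""
    x = np.asarray(flow_feats)
    rho = float(np.sum(link_loads))            # aggregated load over N(f)
    quad = c[0] + c[1] * x.sum() + c[2] * (x ** 2).sum()  # quadratic term
    return quad + c[3] * np.log1p(rho)         # log-linear load term
```

No matrix multiplies, splines, or iterative passes remain: each flow's delay is a short arithmetic expression over its own features and one neighborhood sum.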
8. Practical Considerations, Trade-offs, and Deployment
Three modeling tiers balance accuracy, efficiency, and interpretability:
- Baseline GNN: Maximizes accuracy (lowest MSE, 38.64), but high memory/compute requirements and opaque internals.
- FlowKANet: Retains competitive accuracy ($R^2 = 0.8727$), offers spline-based function transparency, and reduces parameter footprint by 5×.
- Symbolic Surrogate: Slightly degraded accuracy ($R^2 = 0.8290$), minimal storage (267 constants), fully interpretable and suitable for real-time inference.
Deployment options span real-time network management on edge devices (symbolic surrogate), adaptive traffic engineering at the edge (FlowKANet), and maximal-accuracy settings in data centers (baseline GNN). The unified graph-message-passing backbone ensures consistent graph inductive bias and seamless transition across tiers (Marouani et al., 24 Dec 2025).
FlowKANet, by synergizing KAN universal approximation with GNN message passing and by enabling symbolic regression, establishes an efficient, interpretable, and highly adaptive framework for network delay prediction. The approach connects modern graph learning to the classical theory of function superposition and demonstrates broad applicability across resource-constrained and critical control contexts.