
Kolmogorov–Arnold Networks (KANs)

Updated 9 November 2025
  • KANs are neural architectures based on the Kolmogorov–Arnold theorem that replace scalar weights with trainable univariate spline functions, enhancing expressivity.
  • They enable detailed symbolic interpretability by allowing direct analysis of each learned spline function, a key advantage in security-critical applications.
  • Empirical studies in IoT intrusion detection show that KANs deliver high accuracy on the full feature set while incurring significant computational overhead during training.

Kolmogorov–Arnold Networks (KANs) are a neural-architecture paradigm inspired by the Kolmogorov–Arnold representation theorem. Their defining feature is the replacement of scalar edge weights common to multilayer perceptrons (MLPs) with learnable univariate activation functions, typically parametrized as splines. This construction enables KANs to achieve a higher degree of expressive power and interpretability, at the cost of greater computational overhead. Recent empirical work demonstrates the practical utility of KANs in complex domains such as IoT intrusion detection, where they combine accuracy with symbolic transparency (Emelianova et al., 7 Aug 2025).

1. Mathematical and Theoretical Foundation

The architecture of KANs is grounded in the Kolmogorov–Arnold theorem, which states that any continuous multivariate function $f : [0,1]^n \to \mathbb{R}$ can be decomposed as

$$f(x_1,\dots,x_n) = \sum_{q=1}^{2n+1} \Phi_q\Bigl(\sum_{p=1}^{n} \phi_{q,p}(x_p)\Bigr)$$

with $\phi_{q,p}$ and $\Phi_q$ being univariate continuous functions. In a neural context, this suggests that complex high-dimensional mappings can be constructed via a sequence of additive and compositional operations on univariate nonlinear projections.

KANs instantiate this principle by associating with each directed edge from neuron $i$ in layer $\ell$ to neuron $j$ in layer $\ell+1$ a trainable univariate function $\sigma_{ij} : \mathbb{R} \to \mathbb{R}$. Practically, each such function is encoded as a B-spline of order $k$:

$$\sigma_{ij}(t) = \sum_{r=1}^{k} c_{ij,r}\, B_r(t),$$

where $\{B_r\}$ are fixed spline basis functions and the coefficients $c_{ij,r}$ are the trainable parameters. The standard neural weighting is recovered as a degenerate case when $\sigma_{ij}$ is linear.
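Concretely, this parametrization maps directly onto standard spline machinery. The following minimal sketch builds a single edge function $\sigma_{ij}$ from a coefficient vector using SciPy; the helper name, knot layout, and basis count are illustrative assumptions, not details from the cited study.

```python
# Minimal sketch: one spline-parameterized edge function sigma(t) = sum_r c_r B_r(t).
# Helper name, knot layout, and basis count are illustrative assumptions.
import numpy as np
from scipy.interpolate import BSpline

def make_edge_function(coeffs, lo=-1.0, hi=1.0, degree=3):
    """Build sigma(t) from trainable coefficients c_r over a clamped uniform knot vector."""
    n_basis = len(coeffs)
    inner = np.linspace(lo, hi, n_basis - degree + 1)
    knots = np.concatenate([[lo] * degree, inner, [hi] * degree])  # len = n_basis + degree + 1
    return BSpline(knots, np.asarray(coeffs, dtype=float), degree, extrapolate=True)

coeffs = np.random.default_rng(0).normal(size=8)  # the c_{ij,r}: what training adjusts
sigma = make_edge_function(coeffs)
print(sigma(0.3))  # sigma_ij evaluated at a single scalar input
```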

From an architectural standpoint, each layer performs an edge-wise transformation followed by summation at each node:

$$u_j^{(\ell+1)} = \sum_{i=1}^{d_\ell} \sigma_{ij}\bigl(x_i^{(\ell)}\bigr),$$

where $x^{(\ell)}$ is the input at layer $\ell$. The nonlinearity and "weighting" are therefore fused, enabling far richer local input transformations.
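The fused transform-then-sum step can be made concrete with a small forward-pass sketch. Everything below (class name, layer sizes, the per-edge loop) is an illustrative assumption built on SciPy's spline evaluation, not the paper's implementation.

```python
# Minimal sketch of one KAN layer: each directed edge (i, j) applies its own
# univariate spline, and node j sums the transformed inputs. Illustrative only.
import numpy as np
from scipy.interpolate import BSpline

def clamped_knots(n_basis, degree, lo=-1.0, hi=1.0):
    inner = np.linspace(lo, hi, n_basis - degree + 1)
    return np.concatenate([[lo] * degree, inner, [hi] * degree])

class KANLayer:
    def __init__(self, d_in, d_out, n_basis=8, degree=3, seed=0):
        rng = np.random.default_rng(seed)
        self.degree = degree
        self.knots = clamped_knots(n_basis, degree)
        # One coefficient vector per directed edge: shape (d_in, d_out, n_basis).
        self.coeffs = rng.normal(scale=0.1, size=(d_in, d_out, n_basis))

    def forward(self, x):  # x: (batch, d_in)
        batch, d_in = x.shape
        d_out = self.coeffs.shape[1]
        u = np.zeros((batch, d_out))
        for i in range(d_in):          # per-edge loop: clear, not fast
            for j in range(d_out):
                sigma_ij = BSpline(self.knots, self.coeffs[i, j], self.degree)
                u[:, j] += sigma_ij(x[:, i])   # u_j += sigma_ij(x_i)
        return u

x = np.random.default_rng(1).uniform(-1, 1, size=(5, 4))
print(KANLayer(d_in=4, d_out=3).forward(x).shape)  # (5, 3)
```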

2. Architecture and Implementation in IoT Threat Detection

The cited work implements a three-layer KAN ("MultiKAN") for intrusion detection in IoT networks:

  • Input layer: $d=47$ features (reduced to $d=10$ after feature selection)
  • Hidden layer 1: $16$ neurons, each receiving $d$ spline-parameterized edge functions plus bias
  • Hidden layer 2: $8$ neurons, same parametrization
  • Output layer: $2$ units (benign vs. malicious), with final softmax nonlinearity

Each edge carries a compact cubic B-spline (nominally order $k=4$, though the study does not state the spline order explicitly), with a fixed number of knots constituting the local basis for each univariate function. The entire inference chain composes into an analytic, interpretable function traversing the network.
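For orientation, a topology of this shape can be assembled in a few lines with the open-source `pykan` reference library. This is a sketch under the assumption that a pykan-style `KAN(width=...)` constructor is an acceptable stand-in; the study's own implementation, grid size, and spline order may differ.

```python
# Sketch of the 47 -> 16 -> 8 -> 2 MultiKAN topology using the open-source
# pykan library (github.com/KindXiaoming/pykan); grid and k are assumed values.
import torch
from kan import KAN

model = KAN(width=[47, 16, 8, 2],  # input, two hidden layers, two output units
            grid=5,                # knots per spline (assumption)
            k=3)                   # cubic splines (assumption)

x = torch.rand(32, 47)             # dummy batch of 47 standardized features
logits = model(x)                  # (32, 2): benign vs. malicious scores
print(logits.shape)
```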

Training procedure:

  • Features are standardized (zero mean, unit variance) per column
  • Feature selection optionally uses Random Forest importance, reducing $d$ to a top-10 set (the "T10" experiment)
  • The optimization minimizes cross-entropy loss with Adam (learning rate 0.001), batch size 128, over 20 epochs; the full training set comprises $N \approx 734{,}002$ samples, for $\approx 114{,}680$ total updates (a runnable sketch of this recipe follows below)
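The recipe above maps onto standard scikit-learn and PyTorch calls. The sketch below uses random placeholder data and a plain MLP standing in for the KAN; only the preprocessing, feature-selection, and optimization settings mirror the description.

```python
# Sketch of the described pipeline: standardize, optional RF-based "T10"
# selection, then Adam + cross-entropy. Data and the stand-in model are placeholders.
import numpy as np
import torch
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(1000, 47)               # placeholder feature matrix
y = np.random.randint(0, 2, size=1000)     # placeholder benign/malicious labels

X = StandardScaler().fit_transform(X)      # zero mean, unit variance per column

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
top10 = np.argsort(rf.feature_importances_)[-10:]   # "T10" subset (unused below)
X_t10 = X[:, top10]

model = torch.nn.Sequential(torch.nn.Linear(47, 16), torch.nn.SiLU(),
                            torch.nn.Linear(16, 2))  # stand-in for the KAN
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()
Xt = torch.tensor(X, dtype=torch.float32)
yt = torch.tensor(y, dtype=torch.long)

for epoch in range(20):                    # 20 epochs, batch size 128
    perm = torch.randperm(len(Xt))
    for start in range(0, len(Xt), 128):
        idx = perm[start:start + 128]
        opt.zero_grad()
        loss_fn(model(Xt[idx]), yt[idx]).backward()
        opt.step()
```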

3. Expressivity, Regularization, and Interpretability

Expressivity:

Standard MLPs rely on the composition of fixed nonlinearities (e.g., ReLU, tanh) and affine weights, with increased depth required to fit highly nonstandard or oscillatory target functions. KANs, by contrast, elevate each weight to a learnable nonlinear function, allowing for much finer adaptive modeling—especially of complex, locally structured data.

Interpretability:

A KAN model can be written entirely in terms of explicit closed-form piecewise polynomials. After training, every $\sigma_{ij}$ is an analytic expression, and the composite network function can be directly inspected:

  • Symbolic analysis is possible; in the cited study, collapsing the final classifier yields a two-term symbolic formula involving product, offset, and trigonometric structure (see equations (4.1), (4.2) in (Emelianova et al., 7 Aug 2025)).
  • Each spline can be visualized (and in the study, shown as edge thickness or opacity), providing transparency into how input features propagate and interact nonlinearly.
  • This analytic auditability is not present in tree ensembles (RF, XGBoost) or deep MLPs, which require external explainability tools (SHAP, LIME); a sketch of this piecewise-polynomial extraction follows this list.
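To make the closed-form claim tangible: any edge spline can be converted into explicit piecewise-polynomial coefficients and read off interval by interval. A minimal SciPy sketch, with random coefficients standing in for trained values:

```python
# Auditability in practice: convert a (here randomly initialized) edge spline
# into explicit piecewise polynomials and print each piece's coefficients.
import numpy as np
from scipy.interpolate import BSpline, PPoly

degree, n_basis = 3, 8
inner = np.linspace(-1.0, 1.0, n_basis - degree + 1)
knots = np.concatenate([[-1.0] * degree, inner, [1.0] * degree])
sigma = BSpline(knots, np.random.default_rng(0).normal(size=n_basis), degree)

piecewise = PPoly.from_spline(sigma)  # closed form, piece by piece
for left, c in zip(piecewise.x[:-1], piecewise.c.T):
    # c: polynomial coefficients (highest degree first) on [left, next breakpoint)
    print(f"on [{left:+.2f}, ...): {np.round(c, 3)}")
```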

4. Empirical Performance on IoT Intrusion Detection

The CIC-IoT-2023 dataset serves as the benchmark, evaluated with both the full original feature set (46 features) and the reduced ("T10") subset. Key findings:

| Model | Malicious F1 (Full) | Malicious F1 (T10) | Train time | Prediction time |
|---|---|---|---|---|
| KAN (Full) | 0.99 | 0.54 | ~7 h (26,720 s) | ~3.2 s |
| RF (Full) | 0.94 | 0.94 | ~24 s | <1 s |
| XGB (Full) | 0.93 | 0.94 | ~3 s | <1 s |
| KAN (T10) | 0.54 | 0.54 | ~14 min (864 s) | 0.73 s |

Observations:

  • For binary classification (malicious/benign), the KAN achieves a malicious-class (class 0) F1 of 0.99 on the full feature set, vastly outperforming standard MLP and logistic regression baselines.
  • On the reduced feature set (T10), the KAN's recall drops sharply (to 0.48), suggesting that feature selection degrades its sensitivity. RF and XGB maintain high F1 scores even with fewer features.
  • Computational cost is substantial: the KAN requires orders of magnitude more compute time (~7 h of CPU training) than the tree ensembles (24 s for RF, 3 s for XGB).
  • Prediction latency is moderately higher for KAN, but not prohibitive in batch settings.

5. Interpretability and Deployment Implications

KANs' core advantage is intrinsic interpretability suitable for security-critical or regulated IoT environments:

  • Every learned spline is directly analyzable and modifiable to reflect domain expertise (e.g., periodic patterns in network flows can be enforced or tuned).
  • Alerts or decisions can be traced to formula components, allowing for human auditing and forensic analysis.
  • Spline coefficients may be hand-tuned to incorporate prior knowledge and to reduce overfitting to spurious correlations in the data.

This inherent transparency distinguishes KANs from black-box MLPs and tree ensembles, providing a rigorous pathway for regulated AI deployments where auditability and compliance are non-negotiable.

6. Limitations and Directions for Hybrid and Efficient KAN Deployment

Despite accuracy and interpretability, KANs in this context trade off efficiency:

  • Training is slow due to the overhead of evaluating and differentiating numerous spline parameters; the cited study required ~7 hours on CPU versus seconds or minutes for the baselines.
  • Raw accuracy and F1 on the malicious class are competitive, but training efficiency and scalability to large datasets remain limiting factors.
  • Tree-based models (RF, XGB) are far faster overall and maintain high accuracy even under aggressive feature selection.

Suggested future directions include:

  • Hybrid pipelines: leveraging RF/XGB for initial filtering or fast inference, with KANs used for detailed, post-hoc explanation of selected instances (sketched after this list).
  • Hardware and algorithmic optimizations: GPU-accelerated spline evaluation and training, or more efficient spline parametrizations.
  • Research into domain-specific regularization and architectural modifications to further compress edge function representation without loss of interpretability.
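A minimal sketch of the first direction, assuming a trained RF screen and any trained KAN passed in as a callable; thresholds, names, and data are placeholders:

```python
# Hybrid-pipeline sketch: a fast tree ensemble screens all flows, and only
# flagged ones are re-examined by the slower, interpretable KAN. Illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(5000, 46)               # placeholder flow features
y = np.random.randint(0, 2, size=5000)     # placeholder labels
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def hybrid_detect(batch, kan_model, screen_threshold=0.3):
    """RF screens everything; the KAN gives an auditable second opinion on suspects."""
    p_malicious = rf.predict_proba(batch)[:, 1]
    suspicious = np.flatnonzero(p_malicious > screen_threshold)
    return {int(i): kan_model(batch[i:i + 1]) for i in suspicious}
```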

KANs thus offer a rigorously grounded, transparent modeling framework, excelling in domains where detailed functional understanding is prioritized over maximal raw throughput. As an alternative or complement to established models, their impact grows where the combination of symbolic auditability and competitive modeling fidelity is decisive (Emelianova et al., 7 Aug 2025).

References

  1. Emelianova et al., 7 Aug 2025.
