Kolmogorov-Arnold Representations
- Kolmogorov-Arnold representations are a framework that decomposes continuous multivariate functions into finite compositions of univariate functions, simplifying high-dimensional approximation.
- They underpin Kolmogorov-Arnold Networks (KANs) by using learnable univariate modules to achieve enhanced expressivity, parameter efficiency, and interpretability in complex tasks.
- Applications span system identification, symbolic regression, and neural scaling studies, demonstrating robust performance and favorable error scaling in various scientific and engineering fields.
Kolmogorov-Arnold Representations
The Kolmogorov-Arnold representation theorem establishes that any multivariate continuous function can be constructed as a finite superposition of univariate functions and summations, an insight that has had profound theoretical and practical impact on approximation theory and neural network design. This principle underlies a wide array of modern architectures—chiefly Kolmogorov-Arnold Networks (KANs)—which leverage learnable univariate function modules to achieve expressivity, parameter efficiency, and interpretability across diverse high-dimensional learning tasks.
1. Formal Statement of the Kolmogorov-Arnold Theorem
Let $f : [0,1]^n \to \mathbb{R}$ be an arbitrary continuous function. The Kolmogorov-Arnold theorem asserts the existence of continuous univariate “inner” functions $\phi_{q,p}$ and continuous “outer” functions $\Phi_q$, indexed $q = 0, \dots, 2n$ and $p = 1, \dots, n$, such that
$$f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right).$$
This decomposition simplifies multivariate function construction to a sum over $2n+1$ compositions of univariate maps. The classical proofs by Kolmogorov (1957) and Arnold (1958) rely on topological encoding and constructive digit-interleaving. Notably, the resulting univariate functions in the original construction may lack smoothness and can exhibit fractal or pathological structure, even when $f$ itself is highly regular (Liu et al., 2024, Schmidt-Hieber, 2020, Toscano et al., 2024).
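For concreteness, with $n = 2$ the theorem guarantees a width-$2n+1 = 5$ representation
$$f(x_1, x_2) = \sum_{q=0}^{4} \Phi_q\!\left(\phi_{q,1}(x_1) + \phi_{q,2}(x_2)\right),$$
so five outer functions and ten inner functions suffice for any continuous bivariate $f$ on the unit square.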
2. Structural Properties and Interpretation
The Kolmogorov-Arnold superposition is characterized by:
- Inner mappings: the $\phi_{q,p} : [0,1] \to \mathbb{R}$, continuous and typically not smooth in the classical construction, map each input coordinate independently.
- Outer mappings: the $\Phi_q : \mathbb{R} \to \mathbb{R}$, also only guaranteed to be continuous, recombine the summed inner outputs.
- Width and depth: The construction is fundamentally a width-$2n+1$, depth-2 architecture for scalar-valued functions; vector-valued outputs require independent applications per component (Liu et al., 2024, Basina et al., 2024).
The representation has a built-in permutation asymmetry: each summand can use a different set of inner and outer functions, and the order matters. For specific constructions, variants with more refined regularity (e.g., smoothness) for the $\phi_{q,p}$ and $\Phi_q$ are possible under additional hypotheses on $f$ (Basina et al., 2024).
3. Algorithmic Realization and Network Architectures
3.1. Basic Kolmogorov-Arnold Network (KAN)
A KAN explicitly operationalizes the theorem via learnable univariate activation functions on edges rather than fixed activations at nodes. In a canonical 2-layer KAN with $n$ inputs, $2n+1$ hidden units, and a scalar output, the computation is
$$\hat{f}(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right),$$
where the $\phi_{q,p}$ and $\Phi_q$ are parameterized as smooth (e.g., cubic B-spline) interpolants with a small set of coefficients per function (Liu et al., 2024, Somvanshi et al., 2024, Basina et al., 2024, Gashi et al., 12 Jun 2025). Each “edge” from input $p$ to hidden node $q$ carries a separate learnable nonlinearity; hidden nodes sum their edge activations, and the output aggregates these via the learned $\Phi_q$.
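The forward pass of such a 2-layer KAN can be sketched in a few lines. The snippet below is only an illustrative toy, not the implementation of the cited papers: the edge functions use a fixed Gaussian radial basis as a stand-in for B-splines, and all names, shapes, and initializations are assumptions.

```python
import numpy as np

def edge_fn(x, coeffs, centers, width=0.2):
    """Learnable univariate edge function: a sum of Gaussian bumps standing in for a B-spline."""
    x = np.atleast_1d(x)[..., None]                 # (..., 1)
    basis = np.exp(-((x - centers) / width) ** 2)   # (..., n_basis)
    return basis @ coeffs                           # (...,)

def kan_forward(x, inner_coeffs, outer_coeffs, centers):
    """Canonical 2-layer KAN: f_hat(x) = sum_q Phi_q( sum_p phi_{q,p}(x_p) ).
    x: (n,); inner_coeffs: (2n+1, n, n_basis); outer_coeffs: (2n+1, n_basis)."""
    n = x.shape[0]
    hidden = np.array([
        sum(edge_fn(x[p], inner_coeffs[q, p], centers) for p in range(n))
        for q in range(2 * n + 1)
    ]).squeeze()                                    # (2n+1,): summed inner activations
    out = sum(edge_fn(hidden[q], outer_coeffs[q], centers) for q in range(2 * n + 1))
    return out.item()

# Toy usage: a width-5 KAN on a 2-dimensional input with randomly initialized edges.
rng = np.random.default_rng(0)
n, n_basis = 2, 8
centers = np.linspace(-2.0, 2.0, n_basis)
inner = rng.normal(scale=0.1, size=(2 * n + 1, n, n_basis))
outer = rng.normal(scale=0.1, size=(2 * n + 1, n_basis))
print(kan_forward(np.array([0.3, -0.7]), inner, outer, centers))
```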
3.2. Deep, Variational, and Hybrid Architectures
KANs generalize to deep settings by stacking layers of univariate-function matrices, with each layer implementing a nonlinear edge-wise transformation followed by linear summation; a minimal sketch of such stacking appears after the list below. Parametric enhancements include:
- Variational KANs: The number of basis elements per univariate function is treated as a latent stochastic variable, optimized jointly with weights using variational inference for adaptive complexity (Alesiani et al., 3 Jul 2025).
- KKANs: Kurkova-Kolmogorov-Arnold Networks replace univariate edges by small MLPs and outer maps by linear combinations of basis functions, leveraging both universal approximation and geometric complexity regularization (Toscano et al., 2024).
- P-KANs and KAFs: Projective KANs use entropy-driven projection into sparse functional spaces (e.g., Fourier, Chebyshev), while Kolmogorov-Arnold-Fourier Networks tightly integrate random Fourier feature maps to optimize for high-frequency spectral representation and parameter efficiency (Poole et al., 24 Sep 2025, Zhang et al., 9 Feb 2025).
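As referenced above, the sketch below illustrates the stacking itself: every layer is a matrix of univariate edge functions applied element-wise, followed by summation over inputs. The Gaussian basis and the layer widths are illustrative assumptions, not any published architecture.

```python
import numpy as np

def kan_layer(x, coeffs, centers, width=0.2):
    """One KAN layer mapping R^{d_in} -> R^{d_out}.
    coeffs: (d_out, d_in, n_basis); each (j, i) edge carries its own univariate function,
    and output j is the sum over i of phi_{j,i}(x_i)."""
    basis = np.exp(-((x[:, None] - centers) / width) ** 2)  # (d_in, n_basis)
    edge_vals = np.einsum("oib,ib->oi", coeffs, basis)      # phi_{j,i}(x_i) for all j, i
    return edge_vals.sum(axis=1)                            # (d_out,)

def deep_kan(x, layer_coeffs, centers):
    """Deep KAN: a stack of edge-wise nonlinear transforms, each followed by summation."""
    h = x
    for coeffs in layer_coeffs:
        h = kan_layer(h, coeffs, centers)
    return h

# Toy usage: a 3-layer KAN with widths 4 -> 6 -> 6 -> 1.
rng = np.random.default_rng(1)
centers = np.linspace(-2.0, 2.0, 8)
widths = [4, 6, 6, 1]
layer_coeffs = [rng.normal(scale=0.1, size=(widths[i + 1], widths[i], 8))
                for i in range(len(widths) - 1)]
print(deep_kan(rng.normal(size=4), layer_coeffs, centers))
```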
4. Approximation Theory and Scaling Laws
The Kolmogorov-Arnold superposition breaks the curse of dimensionality (COD) for a broad range of function classes:
- Dimension-independent error rates: For smooth $f$ with a compositional KAN structure, the spline approximation error in the $C^0$ (sup) norm behaves as
$$\|f - \hat{f}_{\mathrm{KAN}}\|_{C^0} \le C\, G^{-(k+1)} = C\, G^{-4}$$
for cubic splines ($k = 3$), where $G$ is the spline grid resolution and $C$ depends on the regularity of $f$ but is independent of dimension (Liu et al., 2024, Basina et al., 2024).
- Neural scaling: Test RMSE scales as $N^{-\alpha}$ in the number of parameters $N$, with $\alpha \approx 4$ in parameter-rich KANs with cubic splines, outperforming ReLU MLPs that typically achieve much smaller, dimension-dependent exponents (Liu et al., 2024).
This capacity results from reducing a multivariate approximation problem to the superposition of iterated one-dimensional approximation subproblems, making network size and computational demand largely decoupled from input dimension.
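In practice, the scaling exponent $\alpha$ is estimated by a least-squares fit in log-log space over (parameter count, test RMSE) pairs; a minimal sketch with purely synthetic numbers (chosen to decay like $N^{-4}$ only for illustration):

```python
import numpy as np

def fit_scaling_exponent(num_params, test_rmse):
    """Fit RMSE ~ C * N^{-alpha} by linear regression of log(RMSE) on log(N).
    Returns (alpha, C)."""
    log_n = np.log(np.asarray(num_params, dtype=float))
    log_e = np.log(np.asarray(test_rmse, dtype=float))
    slope, intercept = np.polyfit(log_n, log_e, deg=1)
    return -slope, float(np.exp(intercept))

# Synthetic example: errors constructed to decay roughly like N^{-4}.
N = np.array([1e2, 1e3, 1e4, 1e5])
rmse = 3.0 * N ** -4.0
alpha, C = fit_scaling_exponent(N, rmse)
print(f"estimated alpha = {alpha:.2f}")  # ~4.00
```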
5. Regularization, Interpretability, and Symbolization
KANs natively afford sparse and interpretable function representations due to their edge-centric structure:
- Sparsity and entropy penalties: Regularization objectives include $L_1$-type sparsity penalties on spline coefficients and entropy-based penalties that favor flat or simple edge functions, improving robustness and human interpretability (Fronzi et al., 3 Oct 2025, Bodner et al., 2024).
- Symbolic extraction: After training, dense splines may be substituted with analytical forms (polynomials, trigonometric functions) by fitting against a dictionary of primitives and thresholding on the $R^2$ score, yielding explicit structure-property equations in scientific domains (Fronzi et al., 3 Oct 2025); a sketch of this penalty-and-snapping pipeline follows this list.
- Soft symbolification: S2KANs implement gate-based soft selection over a large dictionary of symbolic primitives, using a Minimum Description Length (MDL) regularization to balance interpretability and accuracy. This allows for graceful fallback to dense splines when symbolic forms are insufficient (Bagrow et al., 27 Nov 2025).
- Basis-adaptive and projective strategies: P-KANs use entropy-driven “gravitational” regularization to automatically project edge functions into compact basis spaces (Fourier, Chebyshev) and penalize redundancy, producing compact, interpretable, and robust models (Poole et al., 24 Sep 2025).
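To make the sparsity/entropy penalties and the $R^2$-thresholded symbolic substitution above concrete, the sketch below combines a simple $L_1$-plus-entropy regularizer over per-edge coefficients with a simplified dictionary fit (scale and offset only, a small hypothetical dictionary); it is an illustration of the idea, not the exact procedure of any cited paper.

```python
import numpy as np

def sparsity_entropy_penalty(edge_coeffs, lam_l1=1e-3, lam_ent=1e-3, eps=1e-12):
    """Regularizer over per-edge spline coefficients, shape (n_edges, n_basis).
    The L1 term shrinks individual coefficients; the entropy term over normalized
    per-edge magnitudes encourages concentrating mass on a few active edges."""
    l1_per_edge = np.abs(edge_coeffs).sum(axis=1)
    l1_total = l1_per_edge.sum()
    p = l1_per_edge / (l1_total + eps)
    entropy = -(p * np.log(p + eps)).sum()
    return lam_l1 * l1_total + lam_ent * entropy

def snap_to_symbolic(xs, ys, r2_threshold=0.99):
    """Try to replace a learned edge function (sampled as xs -> ys) by the best-fitting
    primitive a*g(x)+b from a small dictionary, accepted only if R^2 clears a threshold."""
    dictionary = {"x": lambda x: x, "x^2": np.square, "sin": np.sin,
                  "exp": np.exp, "tanh": np.tanh}
    best = (None, -np.inf, None)
    for name, g in dictionary.items():
        A = np.column_stack([g(xs), np.ones_like(xs)])        # least-squares fit of a, b
        (a, b), *_ = np.linalg.lstsq(A, ys, rcond=None)
        pred = a * g(xs) + b
        r2 = 1.0 - np.sum((ys - pred) ** 2) / np.sum((ys - ys.mean()) ** 2)
        if r2 > best[1]:
            best = (name, r2, (a, b))
    name, r2, fit = best
    return (name, fit, r2) if r2 >= r2_threshold else None    # None: keep the dense spline

# Toy usage: an edge that is (noisily) sinusoidal snaps to the "sin" primitive.
rng = np.random.default_rng(2)
xs = np.linspace(-3.0, 3.0, 200)
ys = 1.7 * np.sin(xs) + 0.3 + 0.01 * rng.normal(size=xs.shape)
print(snap_to_symbolic(xs, ys))
print(sparsity_entropy_penalty(rng.normal(size=(10, 8))))
```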
6. Applications and Empirical Results
KANs see broad applicability in regression, system identification, scientific machine learning, and operator learning:
- System identification: KANs deliver explicit state-space models in industrial control (e.g., buck converter), extracting the governing ODEs directly from data via sparse splines and symbolic regression (Gashi et al., 12 Jun 2025).
- Physics and materials science: Applications include interpretable surrogates in thermoelectric materials, unveiling symbolic relations between descriptors and properties (Seebeck coefficient, band gap), with sparse network structure and closed-form domain-specific expressions (Fronzi et al., 3 Oct 2025).
- Image and signal processing: Convolutional KANs replace convolutional kernels with spline-based nonlinear kernels (sketched after this list), achieving higher parameter efficiency and matching standard CNN performance on datasets such as Fashion-MNIST, with reduced model size (Bodner et al., 2024).
- Time series and zero-shot forecasting: Stacking KANs in a residual N-BEATS architecture enables cross-domain time series forecasting with strong generalization and compactness, directly applying the superposition principle for dynamical systems (Bhattacharya et al., 2024).
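The KAN convolution mentioned above can be sketched as follows: each position of the kernel applies its own univariate nonlinearity to the pixel beneath it, and the transformed patch is summed, replacing the usual weighted sum. Basis choice and shapes are illustrative assumptions, not the implementation of Bodner et al. (2024).

```python
import numpy as np

def kan_conv2d(image, kernel_coeffs, centers, width=0.2):
    """Valid-mode 2-D 'KAN convolution': kernel_coeffs has shape (kh, kw, n_basis), and
    each kernel position (u, v) carries its own univariate function phi_{u,v}; the output
    pixel is sum_{u,v} phi_{u,v}(x_{u,v}) over the patch, not a linear weighted sum."""
    kh, kw, _ = kernel_coeffs.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw, None]             # (kh, kw, 1)
            basis = np.exp(-((patch - centers) / width) ** 2)   # (kh, kw, n_basis)
            out[i, j] = (basis * kernel_coeffs).sum()
    return out

# Toy usage: a 3x3 KAN kernel on a random 8x8 'image' gives a 6x6 feature map.
rng = np.random.default_rng(3)
coeffs = rng.normal(scale=0.1, size=(3, 3, 8))
centers = np.linspace(-2.0, 2.0, 8)
print(kan_conv2d(rng.normal(size=(8, 8)), coeffs, centers).shape)  # (6, 6)
```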
Empirical studies consistently show that KANs and variants attain performance competitive with or superior to MLPs, with fewer parameters, higher interpretability, and robustness to noise (Fronzi et al., 3 Oct 2025, Poole et al., 24 Sep 2025, Zhang et al., 9 Feb 2025, Bagrow et al., 27 Nov 2025).
7. Extensions, Limitations, and $p$-adic Analogue
7.1. Extensions
Variants include:
- Variational KANs: Infinite-basis adaptation at training time (Alesiani et al., 3 Jul 2025).
- Projective, Fourier, and symbolic hybrids: Enhanced spectral and symbolic expressivity under strong regularization (Zhang et al., 9 Feb 2025, Poole et al., 24 Sep 2025, Bagrow et al., 27 Nov 2025).
- Operator learning and PINNs: Integration into operator regression frameworks with residual-based attention and geometric complexity control (Toscano et al., 2024).
7.2. Limitations and Open Challenges
KANs are limited by computational load in extreme high-dimensional, noisy, or discontinuous settings; redundancy in the spline parameterization may result in a high-dimensional “nuisance space” that impedes generalization (Poole et al., 24 Sep 2025). Further, the pathological nature of classical univariate components constrains practical trainability—hence the need for spline smoothing and regularizing variants (Schmidt-Hieber, 2020, Liu et al., 2024).
7.3. $p$-adic Analogue
In the non-Archimedean (ultrametric) $p$-adic setting, the superposition representation simplifies: any continuous function $f(x_1, \dots, x_n)$ of $p$-adic arguments can be written as $f(x_1, \dots, x_n) = \Phi\!\left(\sum_{j=1}^{n} \phi_j(x_j)\right)$, with a single continuous outer function $\Phi$ and continuous inner maps $\phi_j$. This simplification exploits the topological properties of the $p$-adic numbers and enables exceptionally compact representations (Zubarev, 11 Mar 2025).
References:
(Liu et al., 2024, Somvanshi et al., 2024, Basina et al., 2024, Gashi et al., 12 Jun 2025, Toscano et al., 2024, Poole et al., 24 Sep 2025, Fronzi et al., 3 Oct 2025, Zhang et al., 9 Feb 2025, Bhattacharya et al., 2024, Bagrow et al., 27 Nov 2025, Schmidt-Hieber, 2020, Alesiani et al., 3 Jul 2025, Bodner et al., 2024, Zubarev, 11 Mar 2025)