
Kolmogorov-Arnold Autoencoders

Updated 11 January 2026
  • KAN-AEs are a class of autoencoders that replace fixed nonlinearities with learnable univariate functions, leveraging the Kolmogorov-Arnold theorem for universal approximation.
  • They integrate basis functions—such as splines, polynomials, and Fourier series—to enhance interpretability and performance in domains like asset pricing, fault detection, and image representation.
  • Empirical results show that KAN-AEs achieve competitive reconstruction, denoising, and classification metrics while offering detailed insight through edge-wise function visualization.

Kolmogorov-Arnold Network-Based Autoencoders (KAN-AEs) are a class of autoencoders in which the usual fixed nonlinearities of neural networks are replaced with learnable univariate functions (typically constructed as splines, polynomials, or other explicit univariate bases) on every edge. This design is motivated by the Kolmogorov-Arnold superposition theorem, which states that any continuous multivariate function can be written as a finite sum of continuous univariate functions applied to sums of continuous univariate functions of the individual inputs. By structuring autoencoders to mirror this decomposition, KAN-AEs achieve universal approximation power while offering increased flexibility, interpretability, and, in many cases, improved or competitive empirical performance compared to conventional MLP- or CNN-based autoencoders. KAN-AEs have demonstrated their utility across domains including asset pricing, industrial fault detection, and representation learning for image data (Wang et al., 2024, Villagómez et al., 4 Aug 2025, Yu et al., 2024, Moradi et al., 2024).

1. Theoretical Foundations and Layer Structure

KAN-AEs are constructed on the basis of the Kolmogorov-Arnold theorem, which asserts that any continuous function $f:\mathbb{R}^n\to\mathbb{R}$ admits a decomposition:

$$f(x_1,\dots,x_n) = \sum_{k=1}^{2n+1} \phi_k\left(\sum_{j=1}^n \psi_{kj}(x_j)\right),$$

where each inner function $\psi_{kj}$ and each outer function $\phi_k$ is univariate and continuous. In KAN-AEs, each univariate function (one per edge of each layer) is parameterized by a basis family: B-splines, polynomials, RBFs, Fourier series, or wavelets (Wang et al., 2024, Villagómez et al., 4 Aug 2025, Yu et al., 2024, Moradi et al., 2024).

A canonical KAN layer operates as follows. For layer input $x^{(\ell-1)} \in \mathbb{R}^{n_{\ell-1}}$, the output $x^{(\ell)} \in \mathbb{R}^{n_\ell}$ is computed coordinate-wise via

$$x^{(\ell)}_i = \sum_{j=1}^{n_{\ell-1}} \varphi_{ij}^{(\ell)}\left(x^{(\ell-1)}_j\right), \quad i = 1,\ldots,n_\ell,$$

where $\varphi_{ij}^{(\ell)}$ is a univariate function parameterized independently for each $(i,j)$ pair. For spline-based implementations, $\varphi_{ij}$ is typically a linear combination of fixed-degree B-spline basis functions with learnable coefficients. In polynomial variants, each $\varphi_{ij}$ is a learnable low-degree polynomial (Moradi et al., 2024, Yu et al., 2024).
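
For example, in a spline-based layer each edge function can be written explicitly as

$$\varphi_{ij}^{(\ell)}(x) = \sum_{m=1}^{M} c_{ijm}^{(\ell)}\, B_m(x),$$

where the $B_m$ are $M$ fixed B-spline basis functions on a shared grid and the coefficients $c_{ijm}^{(\ell)}$ are learned; in a degree-$d$ polynomial variant the basis is simply $\{1, x, \dots, x^d\}$.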

Stacking several KAN layers (with or without intermediate pointwise nonlinearities) allows building deep architectures capable of modeling highly complex relationships, mirroring the universality guarantee of the Kolmogorov-Arnold theorem.
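
The sketch below gives a minimal PyTorch implementation of such a layer using the learnable low-degree polynomial parameterization described above; the class name PolyKANLayer and its defaults are illustrative and are not taken from any of the cited implementations.

```python
import torch
import torch.nn as nn


class PolyKANLayer(nn.Module):
    """KAN-style layer: one learnable degree-d polynomial per (output, input) edge.

    Each edge function is phi_ij(x) = sum_m c[i, j, m] * x**m, and the i-th
    output coordinate is the sum of phi_ij over all input coordinates j.
    """

    def __init__(self, in_features: int, out_features: int, degree: int = 3):
        super().__init__()
        self.degree = degree
        # One coefficient per (output unit, input unit, monomial power).
        self.coeffs = nn.Parameter(
            0.01 * torch.randn(out_features, in_features, degree + 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_features) -> powers: (batch, in_features, degree + 1)
        powers = torch.stack([x ** m for m in range(self.degree + 1)], dim=-1)
        # Sum over input units j and monomial powers m for every output unit i.
        return torch.einsum("bjm,ijm->bi", powers, self.coeffs)
```

A deep KAN encoder or decoder is then simply a stack of such layers, e.g. `nn.Sequential(PolyKANLayer(784, 64), PolyKANLayer(64, 16))`.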

2. Canonical Architectures and Domain-Specific Variations

The KAN-AE design adapts to diverse domains:

  • Factor Models in Asset Pricing: The KAN-AE comprises two primary networks: a "Beta network" mapping asset characteristics $Z_{i,t-1} \in \mathbb{R}^p$ to factor exposures (betas) via multi-layer KAN blocks, and a "Factor network" mapping returns $r_t \in \mathbb{R}^N$ linearly (optionally with dimensionality reduction) to factor returns (Wang et al., 2024). Inner and outer spline functions in the Beta network yield flexible, nonlinear exposure profiles interpretable at the univariate level.
  • Generic Representation Learning and Image Data: The encoder and decoder each employ one or several KAN layers (often as a first/last layer or alternating with dense layers) followed by a bottleneck latent space. Common settings include MNIST, CIFAR-10, and other benchmark datasets (Moradi et al., 2024, Yu et al., 2024). Activation functions remain on edges, with B-splines or polynomials as the most frequent basis.
  • Fault Detection: Variants based on different basis sets—EfficientKAN (B-splines), FastKAN (Gaussian RBFs), FourierKAN (Fourier series), WavKAN (wavelets)—are deployed for process monitoring due to their interpretability and data efficiency (Villagómez et al., 4 Aug 2025). Edge-wise basis choice strongly influences data requirements and detection rates.

In all cases, the standard autoencoder structure—reconstruction of high-dimensional input through compression into and re-expansion from a lower-dimensional latent space—is preserved.
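
Building on the hypothetical PolyKANLayer sketched in Section 1, a generic KAN-AE for flattened image data might be assembled as follows; the layer widths and latent size are illustrative and not the configurations reported in the cited papers.

```python
import torch
import torch.nn as nn

# Assumes the PolyKANLayer class from the Section 1 sketch is in scope.

class KANAutoencoder(nn.Module):
    """Symmetric autoencoder whose encoder and decoder are stacks of KAN layers."""

    def __init__(self, input_dim=784, hidden_dim=128, latent_dim=16, degree=3):
        super().__init__()
        # Compression path: input -> hidden -> low-dimensional latent code.
        self.encoder = nn.Sequential(
            PolyKANLayer(input_dim, hidden_dim, degree),
            PolyKANLayer(hidden_dim, latent_dim, degree),
        )
        # Re-expansion path: latent code -> hidden -> reconstructed input.
        self.decoder = nn.Sequential(
            PolyKANLayer(latent_dim, hidden_dim, degree),
            PolyKANLayer(hidden_dim, input_dim, degree),
        )

    def forward(self, x):
        z = self.encoder(x)          # compress to the bottleneck
        return self.decoder(z), z    # reconstruct and expose the latent code
```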

3. Training Objectives and Optimization

The primary objective remains mean squared error (MSE) reconstruction loss:

$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N} \sum_{i=1}^N \|x_i - \hat{x}_i\|_2^2,$$

where $x_i$ is the input and $\hat{x}_i$ is the autoencoder reconstruction.

Regularization, when used, typically adds penalties (for example, sparsity or smoothness terms on the learned basis coefficients) to the reconstruction objective.

Optimization is typically via Adam or AdamW, with learning rates in $\{10^{-3}, 10^{-4}, 10^{-5}\}$, batch sizes of several hundred, and early stopping or learning-rate annealing (Moradi et al., 2024, Yu et al., 2024).
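
A minimal training loop consistent with these settings (MSE reconstruction loss, Adam, a 1e-3 learning rate) is sketched below; it assumes the hypothetical KANAutoencoder from Section 2 and a standard DataLoader yielding (input, label) batches.

```python
import torch
import torch.nn as nn

def train_kan_ae(model, loader, epochs=20, lr=1e-3, device="cpu"):
    """Train a KAN-AE by minimizing MSE reconstruction loss with Adam."""
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    for epoch in range(epochs):
        total = 0.0
        for batch, _ in loader:                            # labels are unused
            x = batch.view(batch.size(0), -1).to(device)   # flatten, e.g. 28x28 -> 784
            recon, _ = model(x)                            # forward pass through the AE
            loss = criterion(recon, x)                     # reconstruction error
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item() * x.size(0)
        print(f"epoch {epoch + 1}: train MSE = {total / len(loader.dataset):.6f}")
```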

KAN-AEs, due to their edge-based parameterization, may require more parameters than conventional MLP/CNN encoders, especially for high-dimensional data. The polynomial variant (abbreviated KAE below) is substantially more parameter-efficient than B-spline- or Fourier-based KANs at moderate polynomial degrees while retaining strong performance (Yu et al., 2024).

4. Empirical Performance and Domain Results

Asset Pricing (Factor Models): KAN-AEs achieve an out-of-sample $R^2$ of up to 0.214% (6-factor model), a total $R^2$ of up to 11.32%, and a long–short portfolio Sharpe ratio of 0.96, exceeding those of MLP autoencoders or classical conditional asset pricing models. KAN-AEs consistently deliver smoother validation losses and more stable learning curves (Wang et al., 2024).

Fault Detection in Process Data: On the Tennessee Eastman benchmark, KAN-AEs attain Fault Detection Rates (FDR) of 90%–96% depending on the variant and training set size. EfficientKAN-AE reaches ≥90% FDR with only 500 training samples; WavKAN-AE achieves 92% FDR at 4,000 samples and remains the top performer at larger scales. FastKAN-AE becomes competitive at much larger training sizes, while FourierKAN-AE consistently underperforms (plateauing at ~80% FDR) (Villagómez et al., 4 Aug 2025).

Representation Learning: On MNIST, KAE (degree-2 polynomial) reduces reconstruction error by 54% compared to a standard autoencoder and boosts recall@10 in similarity search by +0.242. On classification, KAE (degree 3) achieves an accuracy of 0.940 (+8.7% vs. the standard AE). For denoising, KAE (degree 2) attains 42% lower MSE than classic architectures (Yu et al., 2024). In direct comparison with convolutional autoencoders, KAN-AEs halve the reconstruction loss on SVHN and CIFAR-10 (albeit with higher parameter counts) (Moradi et al., 2024).

5. Interpretability and Model Transparency

KAN-AEs provide explicit and structured interpretability:

  • Univariate Function Visualization: Every edge's activation (a spline or polynomial $f_{k,i}(x)$) can be individually plotted, clarifying how each input coordinate affects each downstream unit; see the plotting sketch at the end of this section (Wang et al., 2024, Yu et al., 2024, Moradi et al., 2024, Villagómez et al., 4 Aug 2025).
  • Factor Models: In finance, the learned splines $\psi_{kj}(\cdot)$ and $\phi_k(\cdot)$ can be inspected to test economic hypotheses (e.g., monotonicity of exposure to a risk factor as a function of technical variables) (Wang et al., 2024).
  • Fault Detection: Localized bases such as B-splines or wavelets reveal which input ranges trigger fault signatures, providing actionable diagnostics in industrial control (Villagómez et al., 4 Aug 2025).
  • Edge-specificity: Unlike MLPs or CNNs, where node activations are shared, edge-based learnable nonlinearities provide fine control and transparency at every connection.

The Kolmogorov-Arnold representation ensures that, in principle, the global network decision surface can be traced directly to the learned univariate building blocks—yielding a more interpretable alternative to conventional neural parameterizations.
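
As a concrete illustration of edge-wise inspection, the sketch below plots a single learned univariate edge function from the hypothetical PolyKANLayer introduced in Section 1; it is a minimal example, not the visualization tooling used in the cited papers.

```python
import torch
import matplotlib.pyplot as plt

def plot_edge_function(layer, out_idx, in_idx, lo=-1.0, hi=1.0):
    """Plot the learned polynomial edge function phi_{out_idx,in_idx}(x) on [lo, hi]."""
    xs = torch.linspace(lo, hi, 200)
    coeffs = layer.coeffs[out_idx, in_idx].detach()        # (degree + 1,) polynomial coefficients
    ys = sum(c * xs ** m for m, c in enumerate(coeffs))    # evaluate sum_m c_m * x^m
    plt.plot(xs.numpy(), ys.numpy())
    plt.xlabel(f"input coordinate x_{in_idx}")
    plt.ylabel(f"edge function phi_({out_idx},{in_idx})(x)")
    plt.title("Learned univariate edge function")
    plt.show()
```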

6. Comparison with Conventional Architectures

Relative to MLP-based autoencoders, KAN-AEs:

  • Better Approximation of Smooth Nonlinearities: Free of the piecewise-linear bias inherent in ReLU networks, KANs yield finer and more data-adapted approximations (Wang et al., 2024, Yu et al., 2024, Moradi et al., 2024).
  • Parameter Efficiency (in polynomial variants): Low-degree polynomial edge functions allow a drastic reduction in parameter count compared to spline- or Fourier-based KANs for comparable expressivity (Yu et al., 2024).
  • Empirical Superiority: KAN-AEs match or exceed MLP/CNN AEs in both reconstruction metrics and downstream tasks, particularly evident in denoising and discriminative performance (Yu et al., 2024, Moradi et al., 2024).
  • Visualization/Interpretation Capability: Modular univariate functions afford detailed inspection, a property not feasible with black-box node activations.

However, spline-based KANs incur higher computational and memory costs, especially in high-dimensional data, and require architectural pruning or basis selection for efficiency on large-scale tasks (Moradi et al., 2024). FourierKAN variants underperform on practical fault detection tasks compared to B-spline and wavelet forms (Villagómez et al., 4 Aug 2025).

7. Practical Deployment, Limitations, and Directions

KAN-AEs have proven practical for asset pricing (Wang et al., 2024), data-constrained industrial settings (Villagómez et al., 4 Aug 2025), and diverse representation learning tasks (Yu et al., 2024, Moradi et al., 2024). Their major strengths are interpretability, data efficiency (particularly for EfficientKAN and WavKAN), and adaptability via the choice of univariate basis.

Limitations include:

  • Parameter Growth for Spline/Edge Parameterizations: Particularly significant for dense, high-input networks (Moradi et al., 2024).
  • Scaling to Deep/Convolutional Architectures: Open research includes integrating KAN-AE blocks into deeper CNNs, exploring variational (VAE) and adversarial (AAE) setups, and generalizing to richer univariate function classes beyond polynomials and splines (Yu et al., 2024).
  • Basis Function Selection: Optimal univariate basis may depend strongly on data regime, domain structure, and interpretability/computation tradeoffs (Villagómez et al., 4 Aug 2025).

A plausible implication is that further advances in efficient parameterization, hybrid edge/node nonlinearity architectures, and scalable optimization of univariate function dictionaries will expand the deployment of KAN-AEs in complex, high-dimensional learning environments.
