ReLU-KAN Framework: Efficient GPU KANs
- The ReLU-KAN framework is a GPU-efficient, matrix-based network design that replaces classical basis expansions with ReLU activations while preserving universal approximation.
- It employs a novel bell-shaped ReLU function for basis expansion, offering simplified computation, improved convergence, and a 5–20× speedup over traditional KANs.
- This framework enhances interpretability and efficiency in applications like physics-informed neural networks and quantitative finance, despite challenges with smoothness for higher-order derivatives.
The ReLU-KAN framework refers to a Kolmogorov–Arnold Network (KAN) architecture in which the classical basis-function expansions are replaced by highly efficient combinations of the rectified linear unit (ReLU) activation. This approach enables a fully matrix- and pointwise-based network design conducive to accelerated GPU computation, while maintaining the interpretability and universal approximation properties of KANs. The ReLU-KAN structure has become a central point of comparison and development in recent research, underpinning new activation-function generalizations, theory of expressivity, and efficient applications in areas such as physics-informed neural networks (PINNs) and factor models in quantitative finance.
1. Theoretical Foundation: Kolmogorov–Arnold Representation
The foundational principle of the ReLU-KAN framework is the Kolmogorov–Arnold superposition theorem, which states that any continuous multivariate function $f : [0,1]^n \to \mathbb{R}$ can be expressed as
$$f(x_1, \ldots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),$$
where each $\phi_{q,p}$ and $\Phi_q$ is a continuous one-dimensional function. KANs operationalize this decomposition by instantiating the inner functions $\phi_{q,p}$ as learnable univariate function modules (typically expanded in a basis) and the outer functions $\Phi_q$ as shallow mixing neurons.
Classical KANs used B-spline bases to implement the univariate functions $\phi_{q,p}$. This provided flexibility and smoothness but entailed high computational complexity and poor GPU parallelizability. The ReLU-KAN variant departed from this by substituting the B-spline basis with simple, differentiable functions composed of ReLU primitives, thus drastically simplifying forward and backward passes, and making large-scale deployments practical (Qiu et al., 2024, Ta et al., 8 Mar 2025).
2. ReLU–KAN: Basis Construction and Mathematical Formulation
The essential innovation of ReLU-KAN is the use of a bell-shaped, compact-support basis function implemented purely via ReLU, elementwise product, and squaring:
$$R_i(x) = \left[\mathrm{ReLU}(e_i - x)\cdot \mathrm{ReLU}(x - s_i)\right]^2 \cdot \frac{16}{(e_i - s_i)^4},$$
where $s_i$ and $e_i$ denote the left and right endpoints of the $i$-th basis' support. For a “grid size” $G$ and “spline order” $k$, $s_i = \frac{i-k}{G}$ and $e_i = \frac{i+1}{G}$, with $i = 0, 1, \ldots, G+k-1$. This creates quadratic “bell” functions, each peaking at the center of its support and normalized to unit height.
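A minimal NumPy sketch of this basis construction under the grid convention just described (the function and argument names, such as `bell_basis`, `grid_size`, and `spline_order`, are illustrative choices rather than names from the reference code):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def bell_basis(x, grid_size=5, spline_order=3):
    """Evaluate all ReLU-KAN bell basis functions R_i(x) on a batch of scalars.

    Returns an array of shape (len(x), grid_size + spline_order).
    """
    i = np.arange(grid_size + spline_order)        # basis index i = 0, ..., G+k-1
    s = (i - spline_order) / grid_size             # left support endpoints s_i
    e = (i + 1) / grid_size                        # right support endpoints e_i
    x = np.asarray(x, dtype=float)[:, None]        # shape (N, 1) for broadcasting
    # [ReLU(e_i - x) * ReLU(x - s_i)]^2, normalized to unit peak height
    return (relu(e - x) * relu(x - s)) ** 2 * 16.0 / (e - s) ** 4

R = bell_basis(np.linspace(0.0, 1.0, 7))
print(R.shape)        # (7, 8) for grid_size=5, spline_order=3
print(R.max(axis=0))  # peak values near 1 for bells centered inside [0, 1]
```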
A one-dimensional target $f(x)$ is now approximated as
$$f(x) \approx \sum_{i=0}^{G+k-1} w_i\, R_i(x),$$
where $R_i$ denotes the $i$-th bell basis and the $w_i$ are trainable mixing parameters. For higher-dimensional or multilayer variants, a full set of basis expansions is computed for each input channel and output channel and mixed by trainable weights, yielding an architecture with $(G+k)$-fold parameter proliferation relative to standard MLPs (Ta et al., 8 Mar 2025, Qiu et al., 2024, So et al., 2024).
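To make the mixing step concrete, here is a small sketch (reusing the illustrative `bell_basis` helper above) that fits the coefficients $w_i$ to a 1-D target by ordinary least squares; this is an illustration of the expansion itself, not the gradient-based training procedure used in the cited papers.

```python
import numpy as np

# Fit f(x) = sin(2*pi*x) on [0, 1] with the bell basis from the previous sketch.
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2.0 * np.pi * x)

Phi = bell_basis(x, grid_size=10, spline_order=3)  # design matrix, shape (200, 13)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)        # mixing weights w_i in closed form

y_hat = Phi @ w
print("max abs error:", np.max(np.abs(y - y_hat)))  # small for a sufficiently fine grid
```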
3. Implementation Architecture, Training, and Efficiency
A key motivation for ReLU-KAN is GPU efficiency: all steps reduce to matrix operations, fused pointwise kernels, and convolutions. For each layer, the computation can be summarized as:
- Compute bell-basis activations per input coordinate using only ReLU, elementwise product, squaring, and normalization.
- Concatenate basis expansions across all input dimensions into a feature vector.
- Apply a matrix multiplication or convolution to mix the features into the output dimension (Qiu et al., 2024, So et al., 2024); a minimal sketch of these steps follows below.
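The following is a minimal PyTorch sketch of these three steps, assuming the same grid construction as above; the class and argument names (`ReLUKANLayer`, `grid_size`, `spline_order`) are illustrative, and the mixing step is written as a plain linear layer rather than the convolutional form also mentioned above.

```python
import torch
import torch.nn as nn

class ReLUKANLayer(nn.Module):
    """One ReLU-KAN layer: per-input bell-basis expansion followed by a linear mix."""

    def __init__(self, in_dim, out_dim, grid_size=5, spline_order=3):
        super().__init__()
        n_basis = grid_size + spline_order
        i = torch.arange(n_basis, dtype=torch.float32)
        # Fixed support endpoints of each bell (not trained).
        self.register_buffer("s", (i - spline_order) / grid_size)
        self.register_buffer("e", (i + 1) / grid_size)
        # Trainable mixing weights over all (input, basis) pairs.
        self.mix = nn.Linear(in_dim * n_basis, out_dim)

    def forward(self, x):                            # x: (batch, in_dim)
        x = x.unsqueeze(-1)                          # (batch, in_dim, 1)
        # Step 1: bell basis [ReLU(e - x) * ReLU(x - s)]^2, unit peak height.
        r = (torch.relu(self.e - x) * torch.relu(x - self.s)) ** 2
        r = r * 16.0 / (self.e - self.s) ** 4        # (batch, in_dim, n_basis)
        # Steps 2-3: concatenate expansions and mix into the output dimension.
        return self.mix(r.flatten(start_dim=1))      # (batch, out_dim)

# Two stacked layers acting on inputs in [0, 1]^4.
net = nn.Sequential(ReLUKANLayer(4, 16), ReLUKANLayer(16, 1))
print(net(torch.rand(32, 4)).shape)  # torch.Size([32, 1])
```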
This design eliminates recursion and conditional branching entirely, allowing streamlined CUDA implementations. Empirical studies report a 5–20× speedup over classical spline-based KANs on GPUs, with improved convergence and smaller mean-squared error (MSE) for equivalent parameter counts (Qiu et al., 2024, So et al., 2024).
Training objectives are typically standard regression losses (e.g. MSE), optionally augmented by regularization such as weight penalties. Notably, the compact support of the basis imparts “catastrophic forgetting avoidance”: parameters for disjoint basis functions remain unchanged when fitting local changes, paralleling KANs’ original empirical robustness to sequential learning (Qiu et al., 2024).
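As an illustration of this training setup, the sketch below fits the illustrative `ReLUKANLayer` from the previous example with an MSE loss and a weight-decay penalty; the optimizer and hyperparameters are arbitrary choices, not those of the cited works.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.rand(512, 1)                  # inputs sampled in [0, 1]
y = torch.sin(2.0 * torch.pi * x)       # 1-D regression target

model = nn.Sequential(ReLUKANLayer(1, 16), ReLUKANLayer(16, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=1e-5)

for step in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)   # standard MSE regression objective
    loss.backward()
    opt.step()

print("final MSE:", loss.item())
```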
4. Expressivity, Theoretical Equivalence to ReLU Networks, and Interpretability
Both ReLU-KANs and standard ReLU feed-forward networks are universal approximators, but their architectural decompositions and parameterizations differ substantially (Schoots et al., 3 Mar 2025). Piecewise-linear KANs can be converted exactly to ReLU networks: any KAN of a given depth and width, with a bounded number of linear segments per univariate activation, can be represented exactly as a ReLU network whose depth and width are explicitly bounded in terms of those quantities. Conversely, any standard ReLU network can be translated to a piecewise-linear KAN of the same depth and width, with univariate activations that have no more than two linear segments (i.e., “off” and “on” linear regions) (Schoots et al., 3 Mar 2025).
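To make the KAN-to-ReLU direction concrete, the sketch below uses the standard identity that a continuous piecewise-linear function can be written exactly as a constant plus a weighted sum of ReLUs with breakpoints at its knots; this is an illustrative univariate example, not a reproduction of the paper's network-level construction.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Knots of a continuous piecewise-linear "KAN activation" (constant outside the
# knot range, matching np.interp's extrapolation behavior).
xk = np.array([-1.0, 0.0, 1.5, 3.0])
yk = np.array([ 2.0, -1.0, 0.5, 0.5])

def f_piecewise(x):
    return np.interp(x, xk, yk)

# Exact ReLU form: f(x) = y_0 + sum_j (slope change at knot j) * ReLU(x - x_j).
slopes = np.diff(yk) / np.diff(xk)                 # slopes between consecutive knots
slopes = np.concatenate(([0.0], slopes, [0.0]))    # zero slope outside the knot range
coeffs = np.diff(slopes)                           # slope change at each knot

def f_relu(x):
    x = np.asarray(x, dtype=float)[:, None]
    return yk[0] + (coeffs * relu(x - xk)).sum(axis=1)

x = np.linspace(-3.0, 5.0, 400)
print(np.max(np.abs(f_piecewise(x) - f_relu(x))))  # ~0: the two forms coincide
```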
KANs and ReLU-KANs provide a more readily interpretable mapping from input to output: the contribution of each input dimension can be visualized directly via its univariate activation expansion, whereas deep MLPs entangle all features through nested matrix multiplications and pointwise nonlinearities. For a fixed parameter budget, KANs may create more polyhedral regions per parameter than standard ReLU networks, especially in high-dimensional settings, suggesting greater expressivity per parameter (Schoots et al., 3 Mar 2025, Qiu et al., 2024).
5. Empirical Performance and Limitations
Quantitative studies on synthetic regression, physics-informed PDE solving, and conditional asset pricing tasks show that ReLU-KAN trains dramatically faster than classical KANs and typically attains lower regression MSE for matched width and depth (Qiu et al., 2024, Wang et al., 2024, So et al., 2024). On PDE benchmarks (Poisson and Burgers equations), ReLU-KAN achieves lower training times, smooth convergence curves, and competitive or superior stability compared to KANs with B-spline bases (So et al., 2024). In asset pricing, KAN-based autoencoders (which employ ReLU-KAN-style basis expansions) outperform ReLU-MLPs in both out-of-sample predictive accuracy and Sharpe ratios, while offering substantial interpretability advantages through their univariate transfer functions (Wang et al., 2024).
However, the ReLU-KAN basis exhibits limited smoothness: the squared-ReLU bell function lacks high-order derivative continuity, which can impair precision for physics-informed neural networks (PINNs) solving PDEs that involve higher-order derivatives (So et al., 2024). Because each basis function has compact support and is zero for $x \le s_i$ or $x \ge e_i$, inputs outside the covered grid range (in particular, negative-valued inputs) cannot be captured, restricting the feature extraction capacity. The superficial “tiling” of basis functions across multi-dimensional inputs further increases parameter counts and breaks the native Kolmogorov–Arnold superposition structure, making true multi-input generalization sub-optimal (Ta et al., 8 Mar 2025).
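A quick numerical check (a sketch using standard finite differences, not code from the cited works) makes the smoothness limitation concrete: the squared-ReLU bell is once continuously differentiable, but its second derivative jumps at the support boundary.

```python
import numpy as np

def bell(x, s=0.0, e=1.0):
    """Squared-ReLU bell on [s, e], normalized to unit peak height."""
    return (np.maximum(e - x, 0.0) * np.maximum(x - s, 0.0)) ** 2 * 16.0 / (e - s) ** 4

def second_derivative(f, x, h=1e-4):
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2

# Just inside vs. just outside the left support endpoint s = 0:
print(second_derivative(bell, 0.01))   # ~30 (the one-sided limit at s is 32)
print(second_derivative(bell, -0.01))  # 0 (the bell is identically zero outside its support)
```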
6. Extensions and Generalizations
Work targeting smoothness and robust multi-input modeling has produced several generalizations:
- Higher-order ReLU-KAN (HRKAN): Utilizes higher even powers of the ReLU bell product, restoring smooth higher-order derivatives essential for PDE solvers and yielding improvements in test MSE of 50-fold and more at moderate computational cost (So et al., 2024); see the sketch after this list.
- AF-KAN: Generalizes the activation basis in KANs to arbitrary function families (beyond ReLU and B-spline), introducing attention mechanisms and normalization to manage parameter counts and enhance performance on complex tasks such as image classification. AF-KAN demonstrates that with diverse activation choices, KANs can outperform MLPs and vanilla ReLU-KAN variants for a fixed number of parameters, at the expense of increased training time and computation (Ta et al., 8 Mar 2025).
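Referring back to the HRKAN item above, the following sketch (an illustrative assumption-based example, not the authors' code) generalizes the earlier bell basis by replacing the square with a higher even power m, which raises the order of derivative continuity at the support boundaries.

```python
import numpy as np

def higher_order_bell(x, s, e, m=4):
    """Bell basis [ReLU(e - x) * ReLU(x - s)]^m, normalized to unit peak height.

    m = 2 recovers the squared-ReLU bell of ReLU-KAN; a larger even power m makes
    the basis C^{m-1} at the support endpoints, at slightly higher cost.
    """
    peak = ((e - s) / 2.0) ** (2 * m)   # unnormalized value at the center of the support
    return (np.maximum(e - x, 0.0) * np.maximum(x - s, 0.0)) ** m / peak

x = np.linspace(-0.5, 1.5, 9)
print(higher_order_bell(x, s=0.0, e=1.0, m=2))  # the original ReLU-KAN bell
print(higher_order_bell(x, s=0.0, e=1.0, m=4))  # smoother, more sharply peaked bell
```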
7. Applications and Future Research Directions
The ReLU-KAN framework and its variants have seen immediate application in scientific machine learning, notably for PINNs and interpretable modeling in quantitative finance, where the disentangled structure of the learned functions aids human interpretability (Wang et al., 2024). Ongoing research targets parameter-efficient deep architectures, integration of smooth non-polynomial activations, and hybridization with attention and normalization for improved feature extraction and generalization (Ta et al., 8 Mar 2025). Further comparative analysis of region partitioning power and generalization in high-dimensional regimes remains a key agenda, particularly in relation to the curse of dimensionality (Schoots et al., 3 Mar 2025).
Key References:
- AF-KAN: Activation Function-Based Kolmogorov-Arnold Networks for Efficient Representation Learning (Ta et al., 8 Mar 2025)
- Higher-order-ReLU-KANs (HRKANs) for solving physics-informed neural networks (PINNs) more accurately, robustly and faster (So et al., 2024)
- ReLU-KAN: New Kolmogorov-Arnold Networks that Only Need Matrix Addition, Dot Multiplication, and ReLU (Qiu et al., 2024)
- Relating Piecewise Linear Kolmogorov Arnold Networks to ReLU Networks (Schoots et al., 3 Mar 2025)
- KAN based Autoencoders for Factor Models (Wang et al., 2024)