Parameter-Reduced Kolmogorov-Arnold Networks
- PRKANs are neural architectures that integrate parameter reduction with the universal approximation power of traditional KANs, enabling compact yet expressive models.
- They employ strategies like basis compression, adaptive activation, sparse connectivity, and low tensor-rank decomposition to address parameter explosion.
- PRKANs achieve competitive performance on benchmarks such as MNIST and extend to applications in computer vision, reinforcement learning, scientific modeling, and hardware acceleration.
Parameter-Reduced Kolmogorov-Arnold Networks (PRKANs) are a family of neural architectures designed to combine the expressiveness of Kolmogorov-Arnold Networks (KANs) with strict parameter efficiency. They address the challenge of parameter explosion associated with classic KANs by integrating reduction strategies at both the algorithmic and architectural levels. Emerging rapidly since 2024, PRKANs appear across application domains: as compact universal approximators, in convolutional and scientific models, for reinforcement learning, in hardware/photonic implementations, and as interpretable symbolic regressors.
1. Origins and Motivation
PRKANs are rooted in the Kolmogorov–Arnold representation theorem, which states that any continuous multivariate function on a bounded domain can be written as a finite composition of continuous univariate functions and addition. Standard KANs operationalize this result by replacing the fixed activations and linear weight matrices of MLPs with flexible, learnable 1D functions (typically B-splines or other basis types) placed on the edges between nodes. However, KANs substantially increase the parameter count per layer, posing practical impediments to both training and deployment.
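For reference, the representation theorem and the standard KAN layer it motivates can be written as follows (notation follows Liu et al., 30 Apr 2024; the layer equations use a residual SiLU term and B-spline basis functions $B_m$ on a grid of size $G$ with spline order $k$):

$$
f(x_1,\dots,x_n) \;=\; \sum_{q=1}^{2n+1} \Phi_q\!\Big(\sum_{p=1}^{n} \phi_{q,p}(x_p)\Big),
\qquad
y_j \;=\; \sum_{i=1}^{d_{\mathrm{in}}} \phi_{j,i}(x_i),
\qquad
\phi_{j,i}(x) \;=\; w^{b}_{j,i}\,\mathrm{silu}(x) + w^{s}_{j,i}\sum_{m} c_{j,i,m}\,B_m(x).
$$

Each edge function $\phi_{j,i}$ carries on the order of $G + k$ coefficients, so a single layer holds roughly $d_{\mathrm{in}}\, d_{\mathrm{out}}\,(G + k)$ parameters; this multiplicative growth is precisely what PRKAN reduction strategies target.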
Parameter reduction in the PRKAN paradigm responds to these practical bottlenecks: it seeks to preserve the theoretical strengths and empirical advantages of KANs—fast neural scaling laws, superior fitting with fewer layers, and human-interpretable internal structure—while achieving total parameter counts that are competitive with or lower than conventional MLPs or CNNs (Ta et al., 13 Jan 2025, Liu et al., 30 Apr 2024, Bodner et al., 19 Jun 2024).
2. Architectural Variants and Reduction Methodologies
A range of parameter reduction strategies has distinguished the development of PRKANs:
2.1 Basis Function Compression
Classic KANs associate a learnable function (often a spline or RBF expansion) with each edge. PRKANs compress or aggregate this representation in several ways (see the sketch after this list):
- Summation and Weighting: Summing the outputs of the basis function expansion along the basis dimension, reducing a tensor of shape (batch, d_in, n_basis) to (batch, d_in), followed by a compact linear projection (Ta et al., 13 Jan 2025).
- Learnable Weight Vectors: Multiplying the spline tensor by a learned vector to collapse the basis dimension.
- Attention-Based Feature Reduction: Applying a softmax attention branch across basis dimensions to dynamically select or emphasize the most influential basis functions, with resulting weighted sums forming the reduced features passed onwards (Ta et al., 13 Jan 2025).
- Convolutional and Pooling Operations: Permuting the spline/basis expansion tensors and applying pointwise 1D convolutions (optionally followed by pooling) as a means of local summarization before projection.
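As a concrete illustration of these reductions, the following PyTorch sketch expands the inputs with a Gaussian-RBF basis and then collapses the basis dimension by plain summation, a learned weight vector, or a softmax attention branch before a compact linear projection. The shapes, module names, and RBF grid are illustrative choices, not the reference implementation of (Ta et al., 13 Jan 2025).

```python
import torch
import torch.nn as nn

class GaussianRBFBasis(nn.Module):
    """Expand each input feature into `num_basis` Gaussian RBF responses."""
    def __init__(self, num_basis: int = 8, low: float = -2.0, high: float = 2.0):
        super().__init__()
        self.register_buffer("centers", torch.linspace(low, high, num_basis))
        self.gamma = (high - low) / (num_basis - 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_in) -> (batch, d_in, num_basis)
        diff = x.unsqueeze(-1) - self.centers
        return torch.exp(-(diff / self.gamma) ** 2)

class ReducedKANLayer(nn.Module):
    """Basis expansion followed by one of several basis-dimension reductions."""
    def __init__(self, d_in: int, d_out: int, num_basis: int = 8, mode: str = "attention"):
        super().__init__()
        self.basis = GaussianRBFBasis(num_basis)
        self.mode = mode
        if mode == "weights":
            # one learnable weight per basis function, shared across features
            self.basis_weights = nn.Parameter(torch.ones(num_basis) / num_basis)
        elif mode == "attention":
            # softmax attention over the basis dimension, computed per feature
            self.attn = nn.Linear(num_basis, num_basis)
        self.proj = nn.Linear(d_in, d_out)  # compact projection after reduction

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        phi = self.basis(x)                          # (batch, d_in, num_basis)
        if self.mode == "sum":
            reduced = phi.sum(dim=-1)                # plain summation
        elif self.mode == "weights":
            reduced = (phi * self.basis_weights).sum(dim=-1)
        elif self.mode == "attention":
            weights = torch.softmax(self.attn(phi), dim=-1)
            reduced = (weights * phi).sum(dim=-1)    # weighted sum over basis dim
        else:
            raise ValueError(f"unknown mode: {self.mode}")
        return self.proj(reduced)                    # (batch, d_out)

# usage: a 784 -> 10 classifier head on flattened MNIST images
layer = ReducedKANLayer(784, 10, mode="attention")
logits = layer(torch.randn(32, 784))
```

The convolutional/pooling variant would replace the reduction step with a pointwise 1D convolution (optionally followed by pooling) over the permuted expansion tensor; the projection that follows is unchanged.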
2.2 Adaptive Activation Selection
To further reduce redundancy, PRKANs can prune unneeded nonlinearity by introducing a selectable activation space at each node or edge (e.g., splines, RBF, Chebyshev, wavelet, fast continuous activations), weighting each candidate during training, and then dropping low-contribution activations through sparsity promotion and pruning (Yang et al., 15 Aug 2024). This reduces parameter footprint while retaining task-specific expressivity.
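A minimal sketch of the selectable-activation idea follows, assuming a small pool of standard activations, an L1-style penalty on the mixture weights, and magnitude-based pruning; the actual candidate pool and pruning schedule of (Yang et al., 15 Aug 2024) differ in detail.

```python
import torch
import torch.nn as nn

class SelectableActivation(nn.Module):
    """Mixture over a pool of candidate activations with prunable weights."""
    def __init__(self, d: int):
        super().__init__()
        self.pool = nn.ModuleList([nn.SiLU(), nn.Tanh(), nn.GELU(), nn.Softplus()])
        # one mixture weight per (feature, candidate); low weights get pruned
        self.mix = nn.Parameter(torch.full((d, len(self.pool)), 1.0 / len(self.pool)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d); stack candidate outputs along a new last dimension
        candidates = torch.stack([act(x) for act in self.pool], dim=-1)
        return (candidates * self.mix).sum(dim=-1)

    def sparsity_penalty(self) -> torch.Tensor:
        return self.mix.abs().mean()   # encourages unused candidates toward zero

    @torch.no_grad()
    def prune(self, threshold: float = 1e-2) -> None:
        # drop low-contribution candidates to shrink the effective model
        self.mix[self.mix.abs() < threshold] = 0.0

act = SelectableActivation(d=64)
y = act(torch.randn(8, 64))
loss = y.pow(2).mean() + 1e-3 * act.sparsity_penalty()
```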
2.3 Sparse Connectivity and Architectural Optimization
PRKANs can exploit sparse connection patterns. Genetic algorithms, such as in GA-KAN, encode neuron-to-neuron connectivity sparsely in binary chromosomes, evolving networks with only essential active edges. Layers or even whole groups of connections without useful contributions are dropped via a “degradation mechanism” (zero masking) during decoding (Long et al., 29 Jan 2025). This results in highly compact, interpretable, and parameter-minimal KANs.
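The connectivity encoding can be pictured with the following sketch, which decodes a flat binary chromosome into per-layer edge masks and drops any layer whose mask is entirely zero, a simplified stand-in for GA-KAN's degradation mechanism; the chromosome layout is hypothetical.

```python
import numpy as np

def decode_chromosome(bits, layer_shapes):
    """Decode a flat binary chromosome into per-layer connectivity masks.

    layer_shapes: [(d_in, d_out), ...] for each candidate layer.
    Layers whose decoded mask is all-zero are dropped ("degradation").
    """
    masks, offset = [], 0
    for d_in, d_out in layer_shapes:
        n = d_in * d_out
        mask = bits[offset:offset + n].reshape(d_in, d_out)
        offset += n
        if mask.any():           # keep only layers with at least one active edge
            masks.append(mask)
    return masks

rng = np.random.default_rng(0)
shapes = [(4, 8), (8, 8), (8, 3)]
chromosome = rng.integers(0, 2, size=sum(a * b for a, b in shapes))
masks = decode_chromosome(chromosome, shapes)
active_edges = sum(int(m.sum()) for m in masks)
print(f"{len(masks)} active layers, {active_edges} active edges")
```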
2.4 Low Tensor-Rank Decomposition
Tucker decomposition of the KAN parameter tensors allows adaptation or training of only a low-rank core and associated factors (as in Slim KANs), dramatically reducing the number of effective free parameters and sharing information efficiently across layers and tasks. This is most developed in transfer learning and PDE modeling (Gao et al., 10 Feb 2025).
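A minimal sketch of a Tucker-factored spline-coefficient tensor is given below, in the spirit of LoTRA/Slim KANs; the ranks, initialization scale, and reconstruction path are illustrative. In a transfer setting one might freeze the factor matrices and fine-tune only the small core.

```python
import torch
import torch.nn as nn

class TuckerSplineCoefficients(nn.Module):
    """Spline-coefficient tensor C[out, in, basis] stored in Tucker form."""
    def __init__(self, d_out: int, d_in: int, n_basis: int, ranks=(4, 4, 4)):
        super().__init__()
        r1, r2, r3 = ranks
        self.core = nn.Parameter(torch.randn(r1, r2, r3) * 0.1)      # small trainable core
        self.U_out = nn.Parameter(torch.randn(d_out, r1) * 0.1)      # mode-1 factor
        self.U_in = nn.Parameter(torch.randn(d_in, r2) * 0.1)        # mode-2 factor
        self.U_basis = nn.Parameter(torch.randn(n_basis, r3) * 0.1)  # mode-3 factor

    def full(self) -> torch.Tensor:
        # reconstruct C = core x_1 U_out x_2 U_in x_3 U_basis
        return torch.einsum("abc,ia,jb,kc->ijk",
                            self.core, self.U_out, self.U_in, self.U_basis)

coeff = TuckerSplineCoefficients(d_out=64, d_in=64, n_basis=12)
full_count = 64 * 64 * 12
tucker_count = sum(p.numel() for p in coeff.parameters())
print(f"dense coefficients: {full_count}, Tucker parameters: {tucker_count}")  # 49152 vs 624
```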
2.5 Specialized Basis and Hybrid Activations
By substituting B-splines with Gaussian RBFs (GRBFs) (Ta et al., 13 Jan 2025), leveraging learnable Fourier features (Zhang et al., 9 Feb 2025), or implementing sinusoidal activations with fixed, linearly spaced phases (Gleyzer et al., 1 Aug 2025), PRKANs tailor the basis for both efficiency and task fit. Hybrid activation mechanisms (e.g., a linear blend of GELU and random Fourier features) adaptively balance smooth and oscillatory components, ensuring broad spectral coverage with few parameters (Zhang et al., 9 Feb 2025).
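The hybrid mechanism can be sketched as a GELU path blended with random Fourier features of the input under a learnable gate; the feature count, frequency initialization, and sigmoid-gated blend below are assumptions for illustration rather than the exact Kolmogorov-Arnold-Fourier construction.

```python
import torch
import torch.nn as nn

class HybridGELURFF(nn.Module):
    """Blend a smooth GELU path with a random-Fourier-feature path."""
    def __init__(self, d_in: int, n_freq: int = 16):
        super().__init__()
        # fixed random frequencies; only the mixing layer and gate are learned
        self.register_buffer("freqs", torch.randn(d_in, n_freq))
        self.fourier_mix = nn.Linear(2 * n_freq, d_in)
        self.alpha = nn.Parameter(torch.tensor(0.5))   # blend gate (pre-sigmoid)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        proj = x @ self.freqs                                   # (batch, n_freq)
        rff = torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)
        oscillatory = self.fourier_mix(rff)                     # back to (batch, d_in)
        smooth = nn.functional.gelu(x)
        gate = torch.sigmoid(self.alpha)                        # keep blend in (0, 1)
        return gate * smooth + (1.0 - gate) * oscillatory

act = HybridGELURFF(d_in=32)
out = act(torch.randn(4, 32))
```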
2.6 Efficient Layer Designs for Special Applications
LeanKAN layers internalize multiplication and addition within a lean, modular structure, avoiding the parameter and dummy-activation bloat found in prior MultKANs (Koenig et al., 25 Feb 2025). In photonic hardware, PRKANs exploit physical constraints to engineer edge-mapped nonlinearities via ring-assisted Mach–Zehnder Interferometer units, scaling more favorably in silicon area and energy (Peng et al., 15 Aug 2024).
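A schematic of the general add/multiply node structure that LeanKAN and MultKAN operate on is sketched below; LeanKAN organizes this split more compactly and without dummy activations, and the Gaussian-RBF edge functions here are placeholders rather than the construction of (Koenig et al., 25 Feb 2025).

```python
import torch
import torch.nn as nn

class AddMultKANLayer(nn.Module):
    """KAN-style layer with both additive and multiplicative output nodes."""
    def __init__(self, d_in: int, n_add: int, n_mult: int, n_basis: int = 8):
        super().__init__()
        self.n_add = n_add
        self.register_buffer("centers", torch.linspace(-2.0, 2.0, n_basis))
        # per-edge univariate functions as RBF expansions with learned coefficients
        self.coeff = nn.Parameter(torch.randn(n_add + n_mult, d_in, n_basis) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_in) -> edge activations (batch, d_out, d_in)
        rbf = torch.exp(-((x.unsqueeze(-1) - self.centers) ** 2))
        edges = torch.einsum("oik,bik->boi", self.coeff, rbf)
        add_out = edges[:, :self.n_add].sum(dim=-1)    # additive nodes: sum of edges
        mult_out = edges[:, self.n_add:].prod(dim=-1)  # multiplicative nodes: product
        return torch.cat([add_out, mult_out], dim=-1)

layer = AddMultKANLayer(d_in=4, n_add=3, n_mult=2)
y = layer(torch.randn(16, 4))   # (16, 5)
```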
3. Applications and Practical Performance
PRKANs are deployed across a broad array of machine learning problems and scientific computation scenarios. Select highlights include:
- Image Classification and Computer Vision: PRKANs with attention, convolutional reduction, or basis selection achieve near-identical accuracy to MLPs/CNNs, while using up to 50% fewer parameters on benchmarks such as MNIST and Fashion-MNIST (Ta et al., 13 Jan 2025, Bodner et al., 19 Jun 2024).
- Function Approximation and Scientific Discovery: Through compact basis expansion (e.g., LeanKAN, SineKAN), PRKANs recover analytic target functions or physical laws with superior convergence and parameter efficiency, especially when non-smooth or oscillatory structure is present (Koenig et al., 25 Feb 2025, Gleyzer et al., 1 Aug 2025).
- Physics-Informed Modeling and PDEs: Low tensor-rank adaptation (LoTRA) enables efficient transfer learning and function adaptation in new differential equation domains at a fraction of the parameter cost of full KAN models (Gao et al., 10 Feb 2025).
- Reinforcement Learning and Control: Integration of PRKANs into PPO frameworks demonstrates matching or superior policy/value function performance versus standard MLPs, while using an order of magnitude fewer parameters (Kich et al., 9 Aug 2024).
- Hardware and Photonic Implementations: B-spline function evaluation is instantiated through lookup tables and time- or dynamic-voltage-coded compute-in-memory arrays (see the LUT sketch after this list). Quantization-aware and symmetry-aligned reductions further minimize hardware overhead without appreciable accuracy loss in real-world silicon prototypes (Huang et al., 7 Sep 2025, Peng et al., 15 Aug 2024).
- Symbolic and Interpretable Modeling: Genetic algorithm-driven PRKANs extract concise, closed-form symbolic regressions for scientific classification tasks, enhancing transparency as compared to standard dense architectures (Long et al., 29 Jan 2025).
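To make the lookup-table idea in the hardware item above concrete, the sketch below tabulates a univariate activation on a uniform grid and evaluates it by nearest-entry lookup; the grid range, 8-bit index width, and NumPy implementation are illustrative, and real designs add interpolation, coefficient sharing, and quantization-aware training.

```python
import numpy as np

def build_activation_lut(fn, lo: float = -4.0, hi: float = 4.0, bits: int = 8):
    """Tabulate a learned univariate activation on a uniform grid."""
    grid = np.linspace(lo, hi, 2 ** bits)
    return grid, fn(grid)

def lut_eval(x, grid, table):
    """Nearest-entry lookup, standing in for a hardware LUT read."""
    step = grid[1] - grid[0]
    idx = np.clip(np.round((x - grid[0]) / step), 0, len(grid) - 1).astype(int)
    return table[idx]

grid, table = build_activation_lut(np.tanh, bits=8)
x = np.random.default_rng(0).uniform(-4, 4, size=1000)
max_err = np.abs(lut_eval(x, grid, table) - np.tanh(x)).max()
print(f"max LUT error with an 8-bit index: {max_err:.4f}")
```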
| PRKAN Variant | Core Reduction Strategy | Sample Benchmark/Domain |
|---|---|---|
| Summation, Convolution, Attention | Tensor contraction along the basis or channel dimension | MNIST, Fashion-MNIST (Ta et al., 13 Jan 2025) |
| Selectable Activation (S-KAN) | Adaptive, pruned activation pool per node | CIFAR-10, regression (Yang et al., 15 Aug 2024) |
| Genetic (GA-KAN) | Sparse evolutionary connectivity | Iris, Wine (Long et al., 29 Jan 2025) |
| LeanKAN | Multiplicative/additive operator merging | ODEs, function fitting (Koenig et al., 25 Feb 2025) |
| Tucker LoTRA, Slim KAN | Low tensor-rank decomposition | PDEs, MNIST (Gao et al., 10 Feb 2025) |
| Sine/Fourier PRKAN | Sinusoidal or RFF basis | Function approximation (Zhang et al., 9 Feb 2025, Gleyzer et al., 1 Aug 2025) |
| Photonic PRKAN | Edge-based physical nonlinearities | Silicon/MZI benchmarking (Peng et al., 15 Aug 2024) |
| Hardware-Accelerated PRKAN | Quantization-/LUT-based, hardware-aware design | Large-scale recommenders (Huang et al., 7 Sep 2025) |
4. Empirical Insights and Performance Characteristics
PRKANs maintain high expressivity and, with careful architecture selection, can outperform or match baseline MLP and standard KAN models in both supervised and unsupervised tasks. Across multiple studies, key empirical trends include:
- Training and validation accuracies for PRKANs using attention and convolutional reduction on MNIST reach 99.8% and 97.5%, respectively, with total parameter counts matched to those of baseline MLPs (Ta et al., 13 Jan 2025).
- SineKAN and Kolmogorov-Arnold-Fourier variants exhibit superior performance to both fixed-frequency Fourier methods and vanilla ReLU-MLPs for complex oscillatory target functions at substantially lower or matched parameter counts (Zhang et al., 9 Feb 2025, Gleyzer et al., 1 Aug 2025).
- Slim KANs with Tucker-decomposed tensors retain accuracy competitive with full KANs and baseline MLPs for PDE-parameterized and classification benchmarks, while reducing storage and adaptation costs by an order of magnitude (Gao et al., 10 Feb 2025).
- Genetic algorithm-based architectures both minimize parameter count and enable direct extraction of symbolic mappings for scientific datasets (Long et al., 29 Jan 2025).
- Hardware-aware PRKANs, using alignment-symmetry LUT sharing and power-of-two quantization, grow in area by only 28Kx to 41Kx and in power by only 51x to 94x even as parameter counts increase by 500Kx to 807Kx, with negligible accuracy degradation (≤0.23%), supporting deployment at edge scale (Huang et al., 7 Sep 2025).
5. Theoretical and Mathematical Underpinnings
PRKANs preserve and, in some variants, extend the approximation guarantees of the Kolmogorov-Arnold theorem:
- For spline-based PRKANs, the error bound decays as O(G^{-(k+1)}), where G is the grid size and k the spline order (Liu et al., 30 Apr 2024).
- Low tensor-rank PRKANs rely on the expressivity of Tucker decompositions, allowing efficient, compressed adaptation in transfer scenarios (Gao et al., 10 Feb 2025).
- Sinusoidal PRKANs are justified via the sinusoidal approximation theorem, demonstrating that sums of weighted, fixed-phase sine functions can universally approximate continuous functions on compact domains (Gleyzer et al., 1 Aug 2025) (see the schematic layer form after this list).
- In practice, careful selection of spline/grid resolution, basis type (e.g., GRBF, Chebyshev, Sine), tensor-rank, or activation mixtures tunes the tradeoff between expressivity, parameter count, and generalization, with theoretical and empirical evidence guiding design (Liu et al., 30 Apr 2024, Ta et al., 13 Jan 2025).
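As a schematic of the sinusoidal case above, a layer of the following form is the object such approximation results address; the exact parameter sharing in the cited construction differs in detail, and $W$, $\omega$, and $\varphi$ are generic symbols here:

$$
y_j \;=\; \sum_{i=1}^{d_{\mathrm{in}}} \sum_{k=1}^{K} W_{j,i,k}\,\sin\!\big(\omega_k x_i + \varphi_k\big),
\qquad
\varphi_k = \varphi_0 + k\,\Delta\varphi,
$$

with the weights $W$ (and possibly the frequencies $\omega_k$) learned and the phases $\varphi_k$ fixed on a linearly spaced grid.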
6. Training, Stability, and Implementation Considerations
Training PRKANs introduces new optimization challenges due to their heterogeneous parameterizations and non-standard activations:
- Training sensitivity (overfitting, learning rate instability) is accentuated versus standard MLPs, necessitating tailored initialization (often Kaiming-Normal), lower learning rates, and explicit regularization (Sohail, 8 Nov 2024).
- Selective fine-tuning of only a subset of basis parameters (e.g., higher-order coefficients in polynomial basis functions) can reduce overfitting and adaptation cost when moving to new tasks (Drokin, 1 Jul 2024) (see the sketch after this list).
- Backpropagation-free approaches (such as the HSIC Bottleneck) may help mitigate gradient issues, though they can slightly underperform standard backpropagation in final accuracy (Sohail, 8 Nov 2024).
- In hardware deployments, aligning the quantization and knot grids, as well as mapping dominant activation coefficients to circuit regions with minimal process variation, improves both energy efficiency and inference stability (Huang et al., 7 Sep 2025).
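A sketch of the selective fine-tuning idea from this list follows, assuming a Chebyshev-basis layer, Kaiming-style initialization, a reduced learning rate, and a gradient hook that zeroes updates to the low-order coefficients; the cutoff order and hyperparameters are illustrative rather than the recipes of the cited works.

```python
import torch
import torch.nn as nn

class ChebyshevKANLayer(nn.Module):
    """KAN-style layer with a Chebyshev polynomial basis per edge."""
    def __init__(self, d_in: int, d_out: int, degree: int = 6):
        super().__init__()
        self.degree = degree
        self.coeff = nn.Parameter(torch.empty(d_out, d_in, degree + 1))
        nn.init.kaiming_normal_(self.coeff)   # Kaiming-style init, cf. (Sohail, 8 Nov 2024)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.tanh(x)                      # map inputs into [-1, 1]
        T = [torch.ones_like(x), x]
        for _ in range(2, self.degree + 1):
            T.append(2 * x * T[-1] - T[-2])    # Chebyshev recurrence
        basis = torch.stack(T, dim=-1)         # (batch, d_in, degree + 1)
        return torch.einsum("oik,bik->bo", self.coeff, basis)

layer = ChebyshevKANLayer(8, 4)

# selective fine-tuning: allow gradient flow only to the higher-order coefficients
def mask_low_order_grads(grad: torch.Tensor, cutoff: int = 3) -> torch.Tensor:
    grad = grad.clone()
    grad[..., :cutoff] = 0.0
    return grad

layer.coeff.register_hook(mask_low_order_grads)
opt = torch.optim.Adam(layer.parameters(), lr=1e-4)   # lower LR than a typical MLP default

loss = layer(torch.randn(16, 8)).pow(2).mean()
loss.backward()
assert layer.coeff.grad[..., :3].abs().sum() == 0     # low-order gradients are masked
```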
7. Outlook and Future Directions
Current trends and open questions in PRKAN research include:
- Expansion into high-overhead domains (Transformers, large-scale scientific computing) leveraging the cost reductions evidenced in vision, language, and PDE problems (Zhang et al., 9 Feb 2025, Ta et al., 13 Jan 2025).
- Further theoretical work on generalization, especially for fixed-phase sinusoidal and low-rank architectures (Gleyzer et al., 1 Aug 2025, Gao et al., 10 Feb 2025).
- Development of selective activation and adaptive pruning strategies to balance automatic architecture adaptation with minimal manual tuning (Yang et al., 15 Aug 2024).
- Algorithm-hardware co-design for further edge and in-memory computing acceleration, with continued reductions in area, power, and inference latency (Huang et al., 7 Sep 2025).
- Wider adoption of symbolic extraction procedures for interpretable PRKANs in scientific discovery and decision-critical applications (Long et al., 29 Jan 2025, Liu et al., 30 Apr 2024).
PRKANs thus constitute an expanding family of neural function approximators that realize the vision of compact, interpretable, efficient deep architectures. By blending reduced parameterization, rich activation bases, and modular construction with empirical rigor, they present a robust platform for efficient AI, especially in scientific and embedded applications.