MultKAN: Advanced Kolmogorov–Arnold Networks
- MultKAN is an advanced architecture that builds on the Kolmogorov–Arnold Network paradigm by integrating explicit multiplicative operations to capture complex cross-terms effectively.
- It employs univariate nonlinear mappings and zero-parameter multiplication to enable transparent, sparse, and interpretable modeling across scientific, engineering, and multimodal applications.
- Variants such as LeanKAN, X-KAN, and Multi-Exit KAN demonstrate improved parameter efficiency, dynamic model selection, and robust equation recovery for both regression and physical law extraction tasks.
MultKAN refers to advanced architectures built upon the Kolmogorov–Arnold Network (KAN) paradigm, characterized by explicit multiplicative operations in addition to univariate additive decompositions. These models directly leverage the Kolmogorov–Arnold representation theorem to represent multivariate functions as compositions and sums of univariate nonlinear mappings, but MultKAN extends standard KANs by including multiplication nodes or analogous local model partitionings. This enables transparent, sparse, and interpretable modeling of complex relationships in scientific, engineering, and multimodal learning domains.
1. Theoretical Foundations: Kolmogorov–Arnold Networks and Multiplicative Extensions
Classical KANs instantiate the representation

$f(x_1, \ldots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right),$

where $\Phi_q$ and $\phi_{q,p}$ are learnable univariate functions, typically realized as spline-parameterized activations. MultKAN generalizes this structure by introducing explicit multiplicative nodes or pairing subnodes for product operations. In the canonical MultKAN layer (Liu et al., 2024), subnodes are split into addition and multiplication groups, and a multiplication layer produces outputs such as

$z_k = x_{k_1} \cdot x_{k_2},$
where selected subnode pairs (or tuples) are multiplicatively combined. This mechanism captures cross-terms and separable structure that would otherwise be obscured or require deeper stacking of additive-only units. The zero-parameter nature of the multiplication operation allows efficient representation without inflating parameter counts within the layer.
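As a minimal illustration, the split into additive and multiplicative subnodes can be sketched in pure Python. The function name, argument layout, and toy activations below are hypothetical stand-ins, not pykan's API:

```python
def multkan_layer(x, phis, n_add, n_mult):
    """Toy MultKAN-style layer (illustrative, not pykan's implementation).

    x      : list of input floats
    phis   : per-(subnode, input) univariate callables
    n_add  : number of addition nodes (each passes one subnode through)
    n_mult : number of multiplication nodes (each consumes a subnode pair)
    """
    n_sub = n_add + 2 * n_mult  # total subnodes required
    # Each subnode sums univariate transformations of every input (KAN-style).
    subnodes = [sum(phis[s][i](x[i]) for i in range(len(x)))
                for s in range(n_sub)]
    add_out = subnodes[:n_add]
    # Multiplication nodes combine subnode pairs with a zero-parameter product.
    mult_out = [subnodes[n_add + 2 * k] * subnodes[n_add + 2 * k + 1]
                for k in range(n_mult)]
    return add_out + mult_out

# Example: with identity activations, one mult node recovers x0 * x1 exactly.
ident = lambda t: t
zero = lambda t: 0.0
phis = [[ident, zero],   # subnode 0 -> x0
        [zero, ident]]   # subnode 1 -> x1
out = multkan_layer([3.0, 4.0], phis, n_add=0, n_mult=1)
print(out)  # [12.0]
```

Note that the product itself contributes no parameters; all learnable capacity sits in the univariate maps.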
2. Architectural Variants and Implementation
There are several key instantiations of MultKAN models:
- Multiplicative Subnodes (MultKAN, KAN 2.0): Addition and multiplication node widths specify, per layer, how many outputs are additive and how many are multiplicative. Each multiplication node takes paired (or higher-order grouped) subnode outputs and performs element-wise multiplication, with all nonlinear transformation relegated to the learned univariate activations (Liu et al., 2024).
- Local Kolmogorov–Arnold Networks (X-KAN): Instead of a single global KAN, the input space is partitioned into hyperrectangular regions ("rules"), each with its own local KAN model ("MultKAN" in an ensemble sense) trained via evolutionary rule-based learning (XCSF). This piecewise approach excels in modeling local curvature or discontinuities, with competitive accuracy and compactness in rule count (Shiraishi et al., 20 May 2025).
- Multi-Exit KANs: Layers are augmented with auxiliary prediction branches ("exits"), forming a deep-supervised architecture in which each internal layer can make a prediction, enabling early stopping and dynamic parsimony. The multi-exit framework empirically accelerates optimization, achieves higher accuracy, and often selects simpler sub-models that suffice for a given task (Bagrow et al., 3 Jun 2025).
- Lean MultKAN (LeanKAN): Designed to overcome parameter bloat and hyperparameter complexity in standard MultKAN, LeanKAN merges addition and multiplication mechanisms directly within each output node, allowing both operations on subsets of the activation matrix and reducing parameter count. This layer can be directly substituted for MultKAN or AddKAN layers, improving convergence behavior and expressivity (Koenig et al., 25 Feb 2025).
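The multi-exit idea can be sketched with a simple parsimony rule for choosing the shallowest exit that is nearly as good as the best one. This is a hedged stand-in; the actual selection criterion of Bagrow et al. (2025) may differ:

```python
def select_exit(val_losses, tolerance=0.05):
    """Pick the earliest (shallowest) exit whose validation loss is within
    a relative `tolerance` of the best exit -- an illustrative parsimony
    rule, not the exact criterion of the multi-exit KAN paper."""
    best = min(val_losses)
    for depth, loss in enumerate(val_losses):
        if loss <= best * (1.0 + tolerance):
            return depth
    return len(val_losses) - 1  # fallback; loop always returns earlier

# Deeper exits barely improve here, so the shallow exit at depth 1 is chosen.
losses = [0.30, 0.101, 0.100, 0.099]
print(select_exit(losses))  # 1
```

The rule embodies the empirical finding that early exits often suffice: extra depth is accepted only when it buys a meaningful loss reduction.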
Implementation Principles
- Spline-based activations: KAN and MultKAN architectures universally employ low-order B-spline bases, typically with additional regularization (e.g., $L_1$ and entropy penalties).
- Zero-parameter multiplication: The multiplicative combination is non-learned; only the univariate transformations are parameterized, facilitating interpretability and efficient symbolic regression post-hoc.
- API support (pykan): The pykan package enables construction of MultKANs via width and mult_width vectors, with support for pruning and symbolic extraction (e.g., model.suggest_symbolic()), allowing direct formula discovery in scientific tasks (Liu et al., 2024).
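Since the learnable univariate activations are spline-parameterized, the underlying machinery can be illustrated with the Cox–de Boor recursion for B-spline basis functions. This is a self-contained sketch of the general technique; pykan's internals differ in detail (grid handling, residual terms):

```python
def bspline_basis(i, k, t, knots):
    """Cox--de Boor recursion: value of B-spline basis i of order k
    (degree k-1) at t, for a nondecreasing knot vector."""
    if k == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = 0.0
    if knots[i + k - 1] != knots[i]:
        left = (t - knots[i]) / (knots[i + k - 1] - knots[i]) * \
               bspline_basis(i, k - 1, t, knots)
    right = 0.0
    if knots[i + k] != knots[i + 1]:
        right = (knots[i + k] - t) / (knots[i + k] - knots[i + 1]) * \
                bspline_basis(i + 1, k - 1, t, knots)
    return left + right

# A learned univariate activation is a coefficient-weighted sum of bases
# (knots and coefficients below are arbitrary illustrative values).
knots = [0, 0, 0, 1, 2, 3, 3, 3]      # clamped order-3 (quadratic) knots
coeffs = [0.5, -1.0, 2.0, 0.3, 1.2]   # one coefficient per basis function

def activation(t):
    return sum(c * bspline_basis(i, 3, t, knots) for i, c in enumerate(coeffs))

print(round(activation(1.5), 4))  # 1.4125
```

Only the coefficients are trained; the bases are fixed, which is what makes the resulting univariate curves easy to plot and to convert to symbolic form.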
3. Scientific Discovery: Interpretability, Sparse Modeling, and Equation Extraction
MultKAN architectures excel in interpretable scientific modeling. By construction, all nonlinearity is confined to plottable univariate mappings, and explicit multiplication nodes directly reveal cross-terms, nonlinear modulations, and modular structure. In physics-informed learning scenarios:
- Equation Discovery (KAN-PISF): MultKAN, combined with sequentially regularized derivatives for denoising (SRDD) and physics-informed spline fitting (PISF), efficiently extracts candidate elementary nonlinear functions (including cross-products) forming a sparse overcomplete library. PISF then prunes this library to recover governing ODE/PDE models with minimal terms, as demonstrated for forced Duffing, Van der Pol, Burgers’, and Bouc–Wen systems (Pal et al., 2024).
- Physical Law Recovery: MultKAN’s graph structure yields immediate access to constituent terms (e.g., linear, cubic, and cross-product contributions), facilitating symbolic formula extraction. Learned spline shapes correspond to interpretable scientific features (e.g., modulus, cubic nonlinearity), overcoming the opacity of deep MLPs or PINNs in model extraction (Liu et al., 2024, Pal et al., 2024).
- Noise robustness: With denoised derivative estimation and sparse activation pruning, model outputs remain stable under substantial measurement noise (e.g., 10%), outperforming black-box fits (Pal et al., 2024).
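The library-then-prune idea behind such equation discovery can be illustrated with a toy sequential-thresholding regression, in the spirit of SINDy-style sparse fitting. This is a generic sketch, not the SRDD/PISF pipeline itself; the library and data below are hypothetical:

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def least_squares(Theta, y):
    """Solve the normal equations Theta^T Theta c = Theta^T y."""
    n = len(Theta[0])
    G = [[sum(row[i] * row[j] for row in Theta) for j in range(n)]
         for i in range(n)]
    rhs = [sum(row[i] * yi for row, yi in zip(Theta, y)) for i in range(n)]
    return solve(G, rhs)

# Candidate library: x, x^3, x^2; the true dynamics use only the first two.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
Theta = [[x, x**3, x**2] for x in xs]
y = [-x - 0.5 * x**3 for x in xs]

coeffs = least_squares(Theta, y)
# Sequential thresholding: drop terms with tiny coefficients, then refit.
support = [i for i, c in enumerate(coeffs) if abs(c) > 0.1]
Theta_s = [[row[i] for i in support] for row in Theta]
refit = least_squares(Theta_s, y)
print(support, [round(c, 3) for c in refit])  # [0, 1] [-1.0, -0.5]
```

The spurious x^2 term is pruned and the governing coefficients are recovered, mirroring how PISF trims the overcomplete library down to a minimal model.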
4. Multimodal Fusion and Balanced Learning
MultKAN principles extend naturally to multimodal learning, as in the KAN-MCP (MultKAN-MCP) framework for sentiment analysis (Luo et al., 16 Apr 2025):
- KAN Backbone: MultKAN serves as an ante-hoc interpretable fusion module, enabling explicit visualization and analysis of contributions from each modality via the learned univariate transformations.
- Dimensionality Reduction and Denoising Information Bottleneck (DRD-MIB): Each modality is purified prior to fusion, balancing information retention against compression and denoising through a variational information bottleneck. Gaussian posterior sampling and a KL-divergence penalty force compact, discriminative unimodal codes.
- Pareto-optimal Gradient Coordination (MCPareto): The fusion process dynamically balances gradient contributions across multimodal and unimodal losses, solving a conflict-avoidance quadratic program to select a convex combination aligned with the nearest Pareto-optimal direction.
- Empirical Performance: KAN-MCP attains state-of-the-art metrics on CMU-MOSI, CMU-MOSEI, and CH-SIMSv2, with notable robustness to modality imbalance and noise (Luo et al., 16 Apr 2025).
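The gradient-coordination step can be illustrated with the closed-form two-gradient min-norm solution (the two-task case of MGDA). This sketches the general Pareto-coordination idea only; the actual MCPareto quadratic program of Luo et al. (2025) may differ:

```python
def min_norm_combo(g1, g2):
    """Min-norm point in the convex hull of two gradients (2-task MGDA
    closed form). Illustrates Pareto-style gradient coordination; a sketch,
    not the exact MCPareto program."""
    diff = [a - b for a, b in zip(g1, g2)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        alpha = 0.5  # identical gradients: any convex weight works
    else:
        # alpha minimizes ||alpha*g1 + (1-alpha)*g2||^2, projected onto [0, 1].
        alpha = -sum(d * b for d, b in zip(diff, g2)) / denom
        alpha = max(0.0, min(1.0, alpha))
    return [alpha * a + (1 - alpha) * b for a, b in zip(g1, g2)]

# Conflicting (orthogonal) gradients are balanced equally:
print(min_norm_combo([1.0, 0.0], [0.0, 1.0]))  # [0.5, 0.5]
```

For aligned gradients the solution collapses onto the shorter one, so no task is over-weighted; for conflicting gradients the combination descends both losses simultaneously.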
5. Accuracy, Model Complexity, and Practical Considerations
MultKAN architectures demonstrate competitive or superior accuracy compared to both vanilla KANs and conventional neural networks:
- Reduced Parameter Complexity: While canonical MultKAN can inflate parameter counts due to separate addition and multiplication widths and extraneous activations, LeanKAN resolves this with a single expressive hyperparameter and direct summation/product mechanisms, reducing parameter counts by 25–33% and accelerating convergence (Koenig et al., 25 Feb 2025).
- Model Parsimony: The multi-exit approach systematically selects the simplest sub-model that maintains optimal accuracy, as validated empirically across synthetic regression, dynamical systems, and real-world UCI datasets. In most cases, early exits yield the best validation performance and facilitate extraction of concise, low-order spline representations (Bagrow et al., 3 Jun 2025).
- Piecewise Modeling: X-KAN (MultKAN via XCSF) enables fine-grained local modeling in problems with non-stationary or discontinuous target functions, outperforming global MLPs or KANs in MAE while remaining compact (~7.2 rules on average) (Shiraishi et al., 20 May 2025).
- Symbolic Formula Discovery: All learnable functions in MultKAN remain interpretable and recoverable as explicit formulas (see pykan’s symbolic extraction), essential for scientific insight, feature attribution, and explainable AI.
6. Limitations, Extensions, and Recent Advances
Several drawbacks of canonical MultKAN have been identified and addressed:
- Parameter inflation and dummy activations: MultKAN’s architecture can require extraneous subnodes and complex hyperparameter sweeps, limiting efficiency and obscuring interpretability in multidimensional output layers. LeanKAN circumvents these issues while retaining full expressive power (Koenig et al., 25 Feb 2025).
- Fixed arity of multiplication: Current implementations restrict multiplication to pairwise grouping; adaptive arity or higher-order tensor-product mechanisms are suggested directions for future research (Liu et al., 2024).
- Global versus local modeling: Single global KANs may fail for discontinuous or highly localized functions. Piecewise MultKAN via XCSF demonstrably solves this with competitive local regression performance and robust fitness-based rule evolution (Shiraishi et al., 20 May 2025).
- Software and implementation: MultKANs are supported in pykan and related toolkits, with interfaces for graph visualization, symbolic conversion, and modularity enforcement, facilitating wide scientific deployment.
7. Empirical Benchmarks and Representative Results
Key experimental findings span regression, scientific equation discovery, multimodal fusion, and local modeling:
| Model | Domain | Metric | Parameter Count | Interpretability |
|---|---|---|---|---|
| MultKAN (KAN 2.0) | Multiplication (toy) | MSE | 1 mult node | Exact multiplicative structure |
| MultKAN (KAN-PISF) | Forced Duffing oscillator | ODE recovery | 2-layer, sparse | Splines match terms |
| LeanKAN | Lotka–Volterra ODE | Test MSE | 240 | Sparser graphs |
| Multi-exit KAN | UCI Airfoil, Power Plant | RMSE $5.13/4.41$ | Overhead | Parsimonious exits |
| X-KAN (MultKAN) | Sine-in-sine discontinuity | MAE $0.1273$ | 7 rules | Local rule sets |
| MultKAN-MCP | CMU-MOSI (sentiment) | Acc2 $89.4$ | Pipeline (KAN + IB + Pareto) | Ante-hoc fusion |
These results underscore the versatility of MultKAN architectures in delivering transparent, sparse, and high-accuracy models across diverse domains, while enabling direct scientific interpretation and robust multimodal learning.