Probabilistic Kolmogorov-Arnold Network (P-KAN)

Updated 26 October 2025

Probabilistic Kolmogorov–Arnold Network (P-KAN) is a regression framework that models full conditional output distributions by combining the KAN architecture with a recursive ensemble technique.
The divisive data re-sorting (DDR) algorithm recursively partitions the data and trains a specialized KAN on each partition, so the ensemble can capture multi-modal outputs and input-dependent uncertainty.
P-KAN trains quickly and quantifies uncertainty robustly, which makes it well suited to scientific, engineering, and industrial applications.

The Probabilistic Kolmogorov–Arnold Network (P‑KAN) is a regression modeling framework that extends the Kolmogorov–Arnold Network (KAN) to provide input-dependent probability distributions for systems with aleatoric uncertainty, where output variability arises from inherently stochastic processes. Unlike conventional regression models, which yield point estimates or confidence intervals, P‑KAN predicts the full conditional output distribution, capturing features such as multi-modality and a distribution shape that changes with the input. The architecture combines the computational efficiency and expressiveness of KANs with a recursive ensemble method called divisive data re-sorting (DDR), yielding a technique suited to scientific, engineering, and industrial contexts that demand robust uncertainty quantification.

1. Theoretical Foundations and Model Architecture

P‑KAN draws on the Kolmogorov–Arnold representation theorem, which states that any continuous multivariate function on a bounded domain can be written as a finite superposition of continuous univariate functions and addition. The canonical form used in KAN is:

$\hat{y}_i = \sum_{k=1}^{n} \Phi^{k}\left( \sum_{j=1}^{m} f^{(kj)}\left(X_j^i\right) \right)$

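Here, $X_j^i$ is the $j$-th of the $m$ input components of record $i$, $f^{(kj)}$ are learned univariate transformations of those components, and the outer univariate functions $\Phi^k$ map each inner sum to its contribution to the output $\hat{y}_i$. In the original theorem $n = 2m + 1$. Because every learned function is univariate, each can be represented with few parameters, which keeps the models compact and interpretable.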
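
The superposition above can be evaluated in a few lines. This is a minimal sketch, assuming piecewise-linear univariate functions defined by knot values (spline bases are another common choice); the helpers `piecewise_linear` and `ka_forward` are hypothetical names, not part of the P‑KAN code base, and fitting the knot values is not shown.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence

Univariate = Callable[[float], float]


def piecewise_linear(knots: Sequence[float], values: Sequence[float]) -> Univariate:
    """Hypothetical helper: linear interpolation through (knots, values), clamped at the ends."""
    def f(x: float) -> float:
        if x <= knots[0]:
            return values[0]
        if x >= knots[-1]:
            return values[-1]
        i = bisect_right(knots, x) - 1  # index of the segment that contains x
        t = (x - knots[i]) / (knots[i + 1] - knots[i])
        return values[i] + t * (values[i + 1] - values[i])
    return f


def ka_forward(x: Sequence[float], inner: List[List[Univariate]], outer: List[Univariate]) -> float:
    """y_hat = sum_k Phi^k( sum_j f^(kj)(x_j) ); inner[k][j] is f^(kj), outer[k] is Phi^k."""
    return sum(
        phi(sum(f(xj) for f, xj in zip(inner_k, x)))
        for phi, inner_k in zip(outer, inner)
    )


# Example with m = 2 inputs and n = 1 outer term approximating (x_1 + x_2)^2 on [0, 1]^2;
# Phi^1 is a coarse piecewise-linear stand-in for the square on [0, 2].
identity = piecewise_linear([0.0, 1.0], [0.0, 1.0])
square = piecewise_linear([0.0, 1.0, 2.0], [0.0, 1.0, 4.0])
print(ka_forward([0.5, 0.5], [[identity, identity]], [square]))  # prints 1.0
```

With only three knots the outer function is a rough approximation (for inputs 0.5 and 1.0 it returns 2.5 instead of 2.25); adding knots refines it without changing the structure of the sum.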

P‑KAN generalizes this framework from deterministic regression to probabilistic output modeling. Instead of only producing the expected value of $y$ for a given $X$, the P‑KAN ensemble approximates the entire conditional distribution $P(y|X)$, including higher-order statistical features. This is accomplished via the construction of an ensemble of KAN-based expectation models, each trained over strategically partitioned subsets of the data to collectively reproduce the empirical output distribution.

2. Divisive Data Re-Sorting (DDR) Ensemble Methodology

The DDR algorithm is what turns deterministic KAN regression into probabilistic modeling. The procedure is as follows:

Train a primary expectation KAN on the entire dataset.
Compute the residuals $r_i = y_i - M(X_i)$, where $M$ denotes the trained model.
Partition the data at the median residual, forming two clusters (residuals above and below the median).
Train a new KAN model on each cluster.
Repeat recursively: for each new cluster, compute residuals with respect to that cluster's model, sort them, and partition at the median.
Form the ensemble: once the desired granularity is reached, evaluate every model in the final ensemble at a new input $X$. The collection of outputs forms a sample from the estimated $P(y|X)$.
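
The steps above can be sketched as follows. This is a minimal illustration under loud assumptions, not the P‑KAN implementation: the input is one-dimensional, a least-squares line stands in for each KAN expectation model, and the recursion stops at a fixed depth. The names `fit_line`, `ddr`, and `conditional_sample` are hypothetical. The median split is done by sorting residuals and cutting at the middle index, which keeps both clusters non-empty even when residuals are tied.

```python
import statistics
from typing import List, Sequence, Tuple

# Stand-in expectation model: a 1-D least-squares line (intercept, slope).
# A real P-KAN trains a KAN at this point; the substitution is an assumption for brevity.
Model = Tuple[float, float]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Model:
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my - slope * mx, slope


def predict(model: Model, x: float) -> float:
    intercept, slope = model
    return intercept + slope * x


def ddr(xs: Sequence[float], ys: Sequence[float], depth: int) -> List[Model]:
    """Fit a model, split its data at the median residual, and recurse on each half.

    Returns the models of the final level (2**depth of them unless a cluster gets too small).
    """
    model = fit_line(xs, ys)
    if depth == 0 or len(xs) < 4:  # desired granularity reached, or too little data to split
        return [model]
    residuals = [y - predict(model, x) for x, y in zip(xs, ys)]
    order = sorted(range(len(xs)), key=lambda i: residuals[i])
    half = len(order) // 2  # cutting the residual-sorted order in the middle = median split
    models: List[Model] = []
    for idx in (order[:half], order[half:]):  # below-median cluster, then above-median cluster
        models += ddr([xs[i] for i in idx], [ys[i] for i in idx], depth - 1)
    return models


def conditional_sample(models: List[Model], x: float) -> List[float]:
    """Evaluate every ensemble member at x; the outputs approximate a sample from P(y|x)."""
    return [predict(m, x) for m in models]


# Bi-modal toy data: each x appears twice, once on y = x + 1 and once on y = x - 1.
xs = [i / 10 for i in range(20)] * 2
ys = [x + 1 for x in xs[:20]] + [x - 1 for x in xs[20:]]
print(conditional_sample(ddr(xs, ys, depth=1), 0.5))  # approximately [-0.5, 1.5]
```

On this toy data the primary line predicts 0.5 at x = 0.5, a value no record ever takes; after one split the two members sit on the two branches, which is the behavior the recursive partitioning is meant to produce.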

Through recursive subdivision, the ensemble adapts to regions of output space with high uncertainty or multiple modes, and the original authors report empirically that each split substantially reduces the average error. Hard clustering can also be augmented with a sliding-window strategy, which increases the number and diversity of ensemble members and enables construction of an empirical cumulative distribution function (ECDF) that is sensitive to features such as multi-modality (for example, faithfully representing the bi-modal output distributions observed in stochastic dice-roll experiments).
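
A sketch of the sliding-window variant and the resulting ECDF, under stated assumptions: the records are ordered by their residuals from a previously trained model, one model is fitted per overlapping window, and the window width and stride are free parameters introduced here for illustration. `fit` is any training routine; in the example a constant (window-mean) model stands in for a KAN, so each fitted model is directly its own output. In practice the ECDF would be built from every member's prediction at one input.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence, TypeVar

M = TypeVar("M")


def sliding_window_models(
    xs: Sequence[float],
    ys: Sequence[float],
    residuals: Sequence[float],
    window: int,
    stride: int,
    fit: Callable[[List[float], List[float]], M],
) -> List[M]:
    """Fit one model per overlapping window of the records sorted by residual."""
    order = sorted(range(len(xs)), key=lambda i: residuals[i])
    models = []
    for start in range(0, len(order) - window + 1, stride):
        idx = order[start:start + window]
        models.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def ecdf(samples: Sequence[float]) -> Callable[[float], float]:
    """Empirical CDF: F(t) is the fraction of samples that are <= t."""
    ordered = sorted(samples)
    return lambda t: bisect_right(ordered, t) / len(ordered)


# Ten records whose outputs form two groups (0 and 10); the "model" is the window mean.
xs = list(range(10))
ys = [0.0] * 5 + [10.0] * 5
residuals = [y - 5.0 for y in ys]  # residuals from a primary model predicting the overall mean
outputs = sliding_window_models(xs, ys, residuals, window=4, stride=2,
                                fit=lambda _x, y: sum(y) / len(y))
print(outputs)  # prints [0.0, 2.5, 7.5, 10.0]
F = ecdf(outputs)
print(F(5.0))  # prints 0.5
```

Setting `stride` equal to `window` recovers disjoint (hard) clusters; a smaller stride adds overlapping members and fills in the ECDF between them.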

3. Computational Efficiency and Implementation Considerations

KANs are designed for speed and efficiency, owing to their univariate functional decomposition. Each expectation model in the DDR ensemble is trained on a subset of the data, so individual training cycles are short. The reported training time for a shallow probabilistic KAN is about 0.25 seconds on a CPU, which suggests the approach suits rapid prototyping as well as larger-scale workloads.

Training cost grows slowly with ensemble size: each level of the recursion partitions the same data, so the total number of records processed is proportional to the number of levels, which is logarithmic in the number of final ensemble members. Models within a level are independent and can be trained in parallel, further increasing throughput. The implementation is straightforward, and public source code is available for both simple experiments (multilinear dice example: https://github.com/andrewpolar/vdice_bilinear) and full probabilistic KAN models (https://github.com/andrewpolar/pkan), supporting reproducibility and further development.
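
A rough count of the training work, assuming a balanced binary recursion of depth $d$ over $N$ records with median splits, so that every level partitions all $N$ records and the final ensemble has $K = 2^d$ members. Writing $T$ for the number of models trained and $R$ for the total number of records passed to training:

$T = \sum_{\ell=0}^{d} 2^{\ell} = 2^{d+1} - 1 = 2K - 1, \qquad R = N\,(d + 1) = N\,(\log_2 K + 1)$

Under these assumptions, doubling the ensemble adds one level, that is, one more pass over the data, so if training cost is roughly linear in the number of records the total cost grows like $N \log_2 K$.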

4. Modeling Aleatoric Uncertainty and Input-Dependent Output Distributions

P‑KAN is constructed specifically to address aleatoric uncertainty, which arises from inherent randomness in the data-generating process rather than from limited data or model misspecification (epistemic uncertainty). Experimental regression datasets often exhibit input-dependent variability: some inputs yield broad, multi-modal outcomes, while others yield tightly peaked ones.
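
To make input-dependent aleatoric uncertainty concrete, the following sketch defines a hypothetical stochastic process (not one from the P‑KAN experiments): near $x = 0$ the output is tightly peaked, and as $x$ grows it splits into two modes at $\pm x$. No amount of extra data removes this spread, which is what distinguishes it from epistemic uncertainty. The names `sample_process` and `make_dataset` are illustrative.

```python
import random
from typing import List, Tuple


def sample_process(x: float, rng: random.Random) -> float:
    """Hypothetical process: a fair coin picks the mode +x or -x, plus small Gaussian noise."""
    mode = 1.0 if rng.random() < 0.5 else -1.0
    return mode * x + rng.gauss(0.0, 0.05)


def make_dataset(n: int, seed: int = 0) -> List[Tuple[float, float]]:
    """Draw n (x, y) records with x uniform on [0, 1); the seed makes the draw reproducible."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, sample_process(x, rng)))
    return data


rng = random.Random(1)
at_zero = [sample_process(0.0, rng) for _ in range(1000)]  # one narrow peak around 0
at_one = [sample_process(1.0, rng) for _ in range(1000)]   # two peaks, near -1 and +1
```

A point regressor trained on `make_dataset(...)` would predict roughly 0 everywhere, which is correct for the mean but hides the two modes that appear as $x$ grows.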

By using an ensemble of expectation models, P‑KAN provides fine-grained, input-dependent probability distributions over outputs rather than summary statistics (mean, variance) alone. For instance, if the system under study exhibits bi-modal responses for a given input, the output distribution produced by the DDR-trained ensemble reflects the presence and scale of both peaks. The ECDF built from the ensemble outputs also supports analysis of higher moments, tail behavior, and how the input shapes the entire output distribution.
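
Once the ensemble has produced a sample of outputs for one input, the summaries mentioned above can be read off directly. A minimal sketch, assuming the sample is a plain list of floats; the function name and the chosen quantile levels are illustrative, and each quantile is the inverse of the ECDF (the smallest sample value at which the ECDF reaches $p$).

```python
import math
import statistics
from typing import Dict, Sequence


def describe_conditional(samples: Sequence[float],
                         probs: Sequence[float] = (0.05, 0.5, 0.95)) -> Dict[str, float]:
    """Summaries of an ensemble sample drawn from the estimated P(y|X)."""
    n = len(samples)
    mean = statistics.fmean(samples)
    sd = statistics.pstdev(samples)
    # Population skewness: third central moment divided by sd**3 (0 for a symmetric sample).
    skew = 0.0 if sd == 0 else sum((s - mean) ** 3 for s in samples) / (n * sd ** 3)
    ordered = sorted(samples)
    out = {"mean": mean, "std": sd, "skewness": skew}
    for p in probs:
        # Inverse ECDF: index of the smallest value whose ECDF is >= p.
        k = max(0, math.ceil(p * n) - 1)
        out[f"q{p:g}"] = ordered[k]
    return out


# A bi-modal ensemble sample: the mean (5.0) lies between the modes, where no member falls,
# while the median and the outer quantiles land on the modes themselves.
summary = describe_conditional([1.0, 1.0, 1.0, 1.0, 9.0, 9.0, 9.0, 9.0])
print(summary["mean"], summary["q0.5"], summary["q0.95"])
```

Reporting quantiles alongside the mean is what exposes the bi-modality here: a mean and standard deviation alone would describe a single broad peak centered at 5.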

5. Comparative Performance and Advantages over Conventional Methods

Empirical comparisons indicate that P‑KAN outperforms conventional probabilistic regression models such as k-nearest neighbors (kNN) and Bayesian neural networks (BNNs), especially with small datasets or highly nontrivial uncertainty structures. In experiments on synthetic and nonlinear functions (dice-roll and arctangent-based examples), DDR-trained KANs reproduced both the mean and the variance more accurately, with lower normalized RMSE, than the competing techniques. Because P‑KAN estimates the full conditional distribution directly, it yields better goodness-of-fit and more reliable uncertainty quantification.
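
The comparison above is phrased in terms of normalized RMSE. The normalization used in the original experiments is not given here, so this sketch assumes division by the range of the reference values; it can be applied separately to predicted conditional means and to predicted conditional standard deviations across a set of test inputs. The function name is illustrative.

```python
import math
from typing import Sequence


def normalized_rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """RMSE between predicted and reference values, divided by the range of the reference.

    Range normalization is an assumption; other conventions divide by the mean or the std.
    """
    if not reference or len(predicted) != len(reference):
        raise ValueError("predicted and reference must be non-empty and of equal length")
    spread = max(reference) - min(reference)
    if spread == 0:
        raise ValueError("reference values have zero range; range normalization is undefined")
    mse = sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)
    return math.sqrt(mse) / spread


# Predicted conditional means at two test inputs vs. known reference means.
print(normalized_rmse([1.0, 3.0], [0.0, 4.0]))  # prints 0.25
```

Normalizing makes errors comparable across test functions with different output scales, which is why the choice of denominator should be stated whenever such numbers are compared.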

The computational simplicity of KAN expectation models (enabled by their functional form) means ensemble learning does not impose a substantial computational burden, a frequent limitation of BNNs or other deep ensemble approaches.

6. Applications and Future Directions

P‑KAN is applicable wherever rich uncertainty quantification is required, and where output distributions may be input-dependent and multi-modal. Sample domains include risk analytics, quality control, sensor-driven scientific processes, and natural phenomena that are inherently stochastic.

Open-source code bases facilitate rapid adoption and extension to new problems. The methodology is also well-positioned for integration with other advances in probabilistic modeling and operator learning, potentially serving as a base for more complex extensions such as Bayesian treatments of model parameters or further hierarchical ensemble strategies.

A plausible implication is that structured ensemble frameworks such as P‑KAN may form the basis for new families of uncertainty-aware neural models in scientific machine learning, particularly in domains where computational efficiency and probabilistic interpretability are both essential.

7. Source Code and Reproducibility

P‑KAN's public implementations (see https://github.com/andrewpolar/pkan) have made it possible for researchers to reproduce the experimental results and adapt the method to proprietary datasets and workflows. The source code is designed to allow researchers and practitioners to construct shallow or deep ensembles with minimal overhead, providing an accessible starting point for further theoretical or applied studies. Extensive documentation and usage examples are included to lower the barrier to entry for advanced probabilistic regression modeling.

In conclusion, Probabilistic Kolmogorov–Arnold Networks synthesize strong theoretical underpinnings with a practical, recursive ensemble modeling technique, enabling rapid, robust probabilistic regression that adapts to the complexity of aleatoric uncertainty and multi-modal output phenomena.
