KAN-based Neural Density Estimator
- KAN-based Neural Density Estimator is a probabilistic model that leverages Kolmogorov-Arnold superposition to encode multivariate functions as compositions of univariate splines.
- It employs a Divisive Data Re-sorting ensemble to dynamically capture aleatoric uncertainty, multimodality, and various output distributions.
- The method offers computational efficiency and clear interpretability with fewer parameters, making it ideal for complex, data-rich applications.
The Kolmogorov-Arnold network (KAN)-based Neural Density Estimator is a class of probabilistic models leveraging the mathematical framework of Kolmogorov-Arnold superposition for efficiently modeling input-dependent output distributions with strong computational performance and interpretability. KANs encode arbitrary multivariate functions as compositions of univariate splines, and the density estimation framework combines this with an adaptive ensemble approach for capturing aleatoric uncertainty, multi-modality, and varying distribution types. This entry synthesizes foundational theory, model construction, uncertainty handling, efficiency properties, use cases, and practical deployment considerations.
1. Kolmogorov-Arnold Superposition and KAN Regression Architecture
KANs are built upon the Kolmogorov-Arnold representation theorem: any continuous multivariate function can be written as a finite sum of compositions of univariate functions and addition. The canonical KAN regression model is given by

$$f(x_1, \ldots, x_n) = \sum_{q=0}^{2n} \Phi_q\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right),$$

where $x = (x_1, \ldots, x_n)$ is the input, each $\varphi_{q,p}$ is an "inner" univariate function (typically a B-spline parameterization), and each $\Phi_q$ is an "outer" function. The value $2n + 1$, the number of outer terms, ensures theoretical completeness for continuous functions. Compared to MLPs, KANs eliminate fixed linear weights and replace them with learnable, adaptive functions on edges, effectively combining the linear transform and nonlinear activation into a single compositional block.
Such an architecture enables efficient representation of complex functions with substantially fewer parameters than conventional networks. Each trainable parameter shapes a local piece of a spline rather than serving as an opaque scalar weight, facilitating local adaptivity and interpretability.
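As a concrete illustration, a single learnable edge function can be sketched as a univariate spline fit by least squares. The snippet below is an illustrative sketch, not the reference implementation; `hat_basis` and all other names are hypothetical, and degree-1 (piecewise-linear) B-splines are chosen only for brevity:

```python
import numpy as np

def hat_basis(x, knots):
    """Design matrix of piecewise-linear (degree-1 B-spline) basis functions.

    Column j is the triangular "hat" function centered at knots[j];
    np.interp against a unit vector reproduces exactly that shape.  A
    weighted sum of the columns is then a free-form univariate function --
    the kind of learnable edge function a KAN uses in place of a scalar
    weight.
    """
    eye = np.eye(len(knots))
    return np.column_stack(
        [np.interp(x, knots, eye[j]) for j in range(len(knots))]
    )

# Fit one edge function phi(x) ~ sin(2*pi*x) by least squares on its knot weights.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(2.0 * np.pi * x)
knots = np.linspace(0.0, 1.0, 11)
coef, *_ = np.linalg.lstsq(hat_basis(x, knots), y, rcond=None)
phi = lambda t: hat_basis(np.atleast_1d(t), knots) @ coef  # fitted edge function
```

Each entry of `coef` is simply the value of the edge function near one knot, which is what makes such parameters directly interpretable: plotting `coef` against `knots` reveals the learned univariate shape.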
2. Divisive Data Re-sorting (DDR) Ensemble and Probabilistic Density Estimation
The core methodology for transitioning from deterministic regression to density estimation is the Divisive Data Re-sorting (DDR) algorithm. The process starts by fitting a base expectation model $\hat{f}_0$ by least squares on the entire dataset:

$$\hat{f}_0 = \arg\min_{f} \sum_{i} \left( y_i - f(x_i) \right)^2.$$

Residuals $r_i = y_i - \hat{f}_0(x_i)$ are computed, and the dataset is split at the median residual, forming two clusters. On each cluster, a new expectation model is trained. Repeating this recursion doubles the number of clusters (and models) at each step (2, 4, 8, ...), yielding a progressively finer ensemble.
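The recursion above can be sketched with ordinary least-squares polynomials standing in for the KAN expectation models. This is an illustrative sketch under that substitution; `ddr_ensemble` and the toy heteroskedastic data are assumptions, not the reference implementation:

```python
import numpy as np

def fit_expectation(x, y, deg=3):
    # Least-squares polynomial as a stand-in for the KAN expectation model.
    return np.poly1d(np.polyfit(x, y, deg))

def ddr_ensemble(x, y, depth):
    """Divisive Data Re-sorting: recursively split at the median residual.

    Each recursion level doubles the number of clusters and models
    (1 -> 2 -> 4 -> 8 ...); the leaf models form the ensemble.
    """
    model = fit_expectation(x, y)
    if depth == 0:
        return [model]
    resid = y - model(x)                 # residuals of current expectation model
    below = resid <= np.median(resid)    # split the dataset at the median residual
    return (ddr_ensemble(x[below], y[below], depth - 1)
            + ddr_ensemble(x[~below], y[~below], depth - 1))

# Heteroskedastic toy data: the noise level grows with x.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 2000)
y = np.sin(3.0 * x) + rng.normal(0.0, 0.1 + 0.4 * x, x.size)
models = ddr_ensemble(x, y, depth=3)     # 2**3 = 8 leaf models
```

At any query point, the spread of the eight model outputs reflects the local noise level, which is how the recursion converts a deterministic regressor into a density estimate.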
For small data sets, a "shallow probabilistic" mode uses the KAN to compute intermediate variables $z_q = \sum_{p=1}^{n} \varphi_{q,p}(x_p)$ and recasts the model as a generalized additive model (GAM), $\hat{y} = \sum_{q} \Phi_q(z_q)$. DDR is then applied to this reduced additive model, simplifying the component models and boosting training efficiency.
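The shallow GAM recast described above can be sketched as freezing hypothetical intermediate variables and fitting the outer additive stage with one linear least-squares solve. This is an illustrative sketch only; the identity feature map, cubic bases, and all names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(-1.0, 1.0, (1000, 2))
y = np.tanh(2.0 * x[:, 0]) + 0.5 * x[:, 1] ** 2 + rng.normal(0.0, 0.05, 1000)

# Hypothetical intermediate variables z_q (stand-ins for trained inner sums;
# here simply the raw inputs for illustration).
z = x.copy()

# Outer GAM stage: y ~ Phi_1(z_1) + Phi_2(z_2), each Phi_q a cubic
# polynomial, all coefficients fitted jointly by linear least squares.
design = np.column_stack(
    [np.ones(len(y))] + [z[:, q] ** d for q in range(2) for d in (1, 2, 3)]
)
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
pred = design @ coef
```

Because the outer stage is linear in its coefficients, the whole shallow model trains in a single solve, which is the source of the efficiency gain on small data sets.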
For any input, the collection of ensemble outputs forms the empirical distribution, capturing mean, variance, and possible multimodality.
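For instance, given the ensemble's outputs at one query point, the empirical distribution and its moments follow directly. The sample values below are made up for illustration and do not come from a trained model:

```python
import numpy as np

# Hypothetical outputs of an 8-model DDR ensemble at a single query point x*;
# together they form the empirical predictive distribution at x*.
samples = np.array([1.9, 2.0, 2.05, 2.1, 5.9, 5.95, 6.0, 6.1])

mean = samples.mean()        # predictive mean
std = samples.std(ddof=1)    # predictive spread

def ecdf(t):
    # Empirical CDF: fraction of ensemble members predicting at most t.
    return np.mean(samples <= t)

# Two well-separated groups of predictions signal a bimodal output density.
gap = np.diff(np.sort(samples)).max()
```

Here the large gap between the two groups of predictions flags multimodality that a single mean-plus-variance summary would hide.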
3. Aleatoric Uncertainty and Input-Dependent Output Distributions
KAN-ensemble density estimators specifically address aleatoric uncertainty—random variation in output for repeated, identical inputs. The clustering induced by residual-based DDR allows the model to "learn" distinct behaviors, not only predicting average output but also the full probabilistic spread. Output distributions adapt to local characteristics: ensembles can capture multimodality, heteroskedasticity, and switch distributional types depending on the input.
For example, in the dice simulation, inputs corresponding to different dice numbers and probabilities produced bimodal distributions; DDR with KAN robustly captured these features and distribution variations.
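Bimodality of this kind can be reproduced with a toy Monte Carlo in the same spirit; the specific dice counts and mixture probability below are illustrative assumptions, not the benchmark's exact setup:

```python
import numpy as np

rng = np.random.default_rng(42)

def dice_sum_samples(n):
    """Mixture: with probability 1/2 roll 2 dice, otherwise 10 dice.

    The sum's density then has two separated modes (near 7 and near 35),
    the kind of input-dependent bimodality DDR-KAN is designed to capture.
    """
    few = rng.integers(1, 7, size=(n, 2)).sum(axis=1)    # 2-dice sums
    many = rng.integers(1, 7, size=(n, 10)).sum(axis=1)  # 10-dice sums
    choose_few = rng.random(n) < 0.5
    return np.where(choose_few, few, many)

s = dice_sum_samples(100_000)
```

A histogram of `s` shows two peaks with a near-empty valley between them, the signature a unimodal (e.g., Gaussian) output model cannot represent.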
4. Computational Efficiency and Scaling
The DDR-KAN methodology possesses notable efficiency advantages. Initial training occurs on the full dataset, but the subsequent clustering partitions allow retraining on smaller, simpler subsets, reducing parameters and accelerating model fitting. In shallow mode, for example, a 32-model ensemble of KANs (77 parameters each) trains in 0.25 seconds on a standard Intel i7 CPU.
Unlike kNN-based estimators, DDR also avoids per-query re-sorting, enhancing prediction speed in deployment. The local nature of splines in KAN further ensures that adaptation to sharp density features does not globally inflate complexity.
5. Empirical Evaluation and Applications
Empirical studies demonstrate the framework’s ability to recover true distributions and their moments:
- Dice simulation: Accurate estimation of the sum distribution and its standard deviation (normalized RMSE ∼4% for mean, ∼14% for standard deviation). The DDR-KAN ensemble outperformed kNN and Bayesian NNs (BNNs) in multi-modality and goodness-of-fit (passing 85/100 tests).
- Synthetic nonlinear/noisy function: DDR-KAN closely matched Monte Carlo ECDFs, even in regions exhibiting multi-modality and input-dependent distributional changes.
- Clustering strategies: DDR ensemble consistently captured uncertainty better than randomly clustered ensembles.
These results validate the KAN-based estimator’s suitability for scientific, engineering, and simulation settings where input-dependent, non-Gaussian uncertainties are substantial.
6. Implementation Resources and Practical Deployment
Source code for the methodology is openly available:
- Dice model: https://github.com/andrewpolar/vdice_bilinear
- Probabilistic KAN with DDR: https://github.com/andrewpolar/pkan
- BNN example: https://github.com/andrewpolar/vdice-python
Practitioners may use the full DDR-KAN ensemble or its shallow form depending on data size and complexity. The approach applies broadly to regression settings that demand efficient, interpretable, and locally adaptive density estimation.
7. Context and Impact
The KAN-based neural density estimation framework introduces a novel route to converting deterministic regressors into probabilistic models without sacrificing computational performance. Its theoretical underpinnings, ensemble-driven uncertainty modeling, and empirical successes position it as an efficient and practical tool for input-dependent, multimodal, and locally varying output densities in data-rich scientific problems. By merging Kolmogorov-Arnold representations and a recursive, interpretable ensemble strategy, this methodology provides a scalable alternative to traditional black-box BNNs and kernel-based estimators, particularly where model size, interpretability, and uncertainty quantification are paramount (Polar et al., 2021).