
KAN-based Neural Density Estimator

Updated 27 August 2025
  • KAN-based Neural Density Estimator is a probabilistic model that leverages Kolmogorov-Arnold superposition to encode multivariate functions as compositions of univariate splines.
  • It employs a Divisive Data Re-sorting (DDR) ensemble to capture aleatoric uncertainty, multimodality, and input-dependent output distribution types.
  • The method offers computational efficiency and clear interpretability with fewer parameters, making it well suited to complex, data-rich applications.

The Kolmogorov-Arnold network (KAN)-based Neural Density Estimator is a class of probabilistic models leveraging the mathematical framework of Kolmogorov-Arnold superposition for efficiently modeling input-dependent output distributions with strong computational performance and interpretability. KANs encode arbitrary multivariate functions as compositions of univariate splines, and the density estimation framework combines this with an adaptive ensemble approach for capturing aleatoric uncertainty, multi-modality, and varying distribution types. This entry synthesizes foundational theory, model construction, uncertainty handling, efficiency properties, use cases, and practical deployment considerations.

1. Kolmogorov-Arnold Superposition and KAN Regression Architecture

KANs are built upon the Kolmogorov-Arnold representation theorem: any continuous multivariate function can be written as a finite sum of compositions of univariate functions and addition. The canonical KAN regression model is

$$\hat{y}_i = \sum_{k=1}^{n} \Phi^k \left( \sum_{j=1}^{m} f^{kj}(X_{ij}) \right)$$

where $X_i \in \mathbb{R}^m$ is the input, each $f^{kj}$ is an "inner" univariate function (typically a B-spline parameterization), and each $\Phi^k$ is an "outer" function. The choice $n = 2m+1$ ensures theoretical completeness for continuous functions. Compared to MLPs, KANs eliminate fixed linear weights and replace them with learnable, adaptive functions on edges, effectively combining the linear transform and nonlinear activation into a single compositional block.

Such architecture enables efficient representation of complex functions with substantially fewer parameters than conventional networks. Each trainable parameter describes the shape of a spline rather than a scalar, facilitating local adaptivity and interpretability.
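As a concrete illustration, the compositional structure above can be sketched in a few lines of numpy. This is a toy, untrained stand-in rather than the paper's implementation: the class name, grid sizes, initialization, and piecewise-linear interpolation (in place of true B-splines) are all illustrative assumptions.

```python
import numpy as np

class TinyKAN:
    """Toy KAN-style regressor: y = sum_k Phi_k(sum_j f_kj(x_j)).

    Inner functions f_kj and outer functions Phi_k are piecewise-linear
    curves stored as values on fixed grids (np.interp evaluates them);
    real KANs use B-splines, so this is an illustrative simplification.
    """

    def __init__(self, m, grid=16, seed=0):
        rng = np.random.default_rng(seed)
        self.m = m
        self.n = 2 * m + 1                       # Kolmogorov-Arnold width
        self.xg = np.linspace(-1.0, 1.0, grid)   # inner grid (inputs in [-1, 1])
        self.tg = np.linspace(-self.n, self.n, grid)          # outer grid
        self.f = rng.normal(0.0, 0.1, (self.n, m, grid))      # inner curve values
        self.Phi = rng.normal(0.0, 0.1, (self.n, grid))       # outer curve values

    def __call__(self, X):
        X = np.atleast_2d(X)
        out = np.zeros(len(X))
        for k in range(self.n):
            # theta_k = sum_j f_kj(x_j), then apply the outer function Phi_k
            theta = sum(np.interp(X[:, j], self.xg, self.f[k, j])
                        for j in range(self.m))
            out += np.interp(theta, self.tg, self.Phi[k])
        return out

kan = TinyKAN(m=2)
print(kan(np.array([[0.1, -0.3]])).shape)  # (1,)
```

Note that every trainable parameter here is a grid value shaping a univariate curve, which is what gives KANs their local adaptivity.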

2. Divisive Data Re-sorting (DDR) Ensemble and Probabilistic Density Estimation

The core methodology for transitioning from deterministic regression to density estimation is the Divisive Data Re-sorting (DDR) algorithm. The process starts by fitting a base expectation model $M_0$ by least squares on the entire dataset:

$$M_0 = \arg\min_{M \in \mathcal{A}} \sum_i \left( y_i - M(X_i) \right)^2$$

Residuals $r_i = y_i - M_0(X_i)$ are computed, and the dataset is split at the median residual, forming two clusters. A new expectation model is trained on each cluster. Repeating this recursion doubles the number of clusters (and models) at each step (2, 4, 8, ...), yielding a progressively finer ensemble.
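The recursive splitting can be sketched as follows. For brevity the expectation model is a plain linear least-squares fit rather than a KAN; `fit_model` and `ddr` are illustrative names, not from the source.

```python
import numpy as np

def fit_model(X, y):
    """Least-squares linear model standing in for the KAN expectation
    model (illustrative simplification)."""
    A = np.c_[X, np.ones(len(X))]
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xq: np.c_[np.atleast_2d(Xq),
                            np.ones(len(np.atleast_2d(Xq)))] @ w

def ddr(X, y, depth):
    """Divisive Data Re-sorting: fit, split at the median residual,
    recurse on each half. Returns a list of 2**depth models."""
    model = fit_model(X, y)
    if depth == 0:
        return [model]
    r = y - model(X)
    low = r <= np.median(r)      # two clusters around the median residual
    return ddr(X[low], y[low], depth - 1) + ddr(X[~low], y[~low], depth - 1)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (200, 1))
y = np.sin(3.0 * X[:, 0]) + rng.normal(0.0, 0.2, 200)
ensemble = ddr(X, y, depth=2)
print(len(ensemble))  # 4 models after two rounds of splitting
```

Each recursion level retrains on smaller, more homogeneous subsets, which is the source of the efficiency gains discussed below.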

For small data sets, a "shallow probabilistic" mode uses KAN to compute intermediate variables

$$\theta_{k,i} = \sum_{j=1}^{m} f^{kj}(X_{ij})$$

and recasts the model as a generalized additive model (GAM): $\hat{y}_i = \sum_{k=1}^{n} \Phi^k(\theta_{k,i})$. DDR is then applied to the pairs $\{\theta_i, y_i\}$, simplifying the component models and boosting training efficiency.
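A sketch of the shallow-mode feature map, assuming the inner functions are stored as values on a fixed grid and evaluated by piecewise-linear interpolation (an illustrative simplification of the spline parameterization; `inner_features` is a hypothetical name):

```python
import numpy as np

def inner_features(X, f_vals, xg):
    """Shallow-mode feature map: theta[i, k] = sum_j f_kj(X[i, j]),
    computed once so DDR can operate on the simpler (theta, y) pairs.
    f_vals has shape (n, m, grid): n outer terms, m inputs, grid knots."""
    n, m, _ = f_vals.shape
    theta = np.zeros((len(X), n))
    for k in range(n):
        for j in range(m):
            theta[:, k] += np.interp(X[:, j], xg, f_vals[k, j])
    return theta

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (8, 2))              # 8 samples, m = 2 inputs
xg = np.linspace(-1.0, 1.0, 16)                 # shared inner grid
f_vals = rng.normal(0.0, 0.1, (5, 2, 16))       # n = 2m + 1 = 5 inner curves
print(inner_features(X, f_vals, xg).shape)      # (8, 5)
```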

For any input, the collection of ensemble outputs forms the empirical distribution, capturing mean, variance, and possible multimodality.
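Sketching this last step: with any trained ensemble in hand, the per-member predictions at a query point form the empirical sample from which mean, spread, and quantiles are read off. The constant "models" below are placeholders for DDR-trained KANs, and `predictive_samples` is an illustrative name.

```python
import numpy as np

def predictive_samples(ensemble, x):
    """Evaluate every ensemble member at one input; the collected outputs
    are an empirical sample of the conditional distribution p(y | x)."""
    return np.array([model(x) for model in ensemble])

# toy ensemble of constant models standing in for DDR-trained KANs;
# two well-separated pairs mimic a bimodal conditional distribution
ensemble = [lambda x, c=c: c for c in [1.0, 1.2, 2.9, 3.1]]
s = predictive_samples(ensemble, 0.0)
print(s.mean(), s.std())  # empirical mean and aleatoric spread
```

Quantiles (`np.quantile(s, q)`) or an empirical CDF follow from the same sample, which is how multimodality is surfaced without any parametric distribution assumption.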

3. Aleatoric Uncertainty and Input-Dependent Output Distributions

KAN-ensemble density estimators specifically address aleatoric uncertainty—random variation in output for repeated, identical inputs. The clustering induced by residual-based DDR allows the model to "learn" distinct behaviors, not only predicting average output but also the full probabilistic spread. Output distributions adapt to local characteristics: ensembles can capture multimodality, heteroskedasticity, and switch distributional types depending on the input.

For example, in the dice simulation, inputs corresponding to different dice numbers and probabilities produced bimodal distributions; DDR with KAN robustly captured these features and distribution variations.

4. Computational Efficiency and Scaling

The DDR-KAN methodology possesses notable efficiency advantages. Initial training occurs on the full dataset, but subsequent clustering partitions allow retraining on smaller, simpler subsets, reducing parameters and accelerating model fitting. In shallow mode with, for example, a 32-model ensemble of KANs (77 parameters each), training requires only about 0.25 seconds on a standard Intel i7 CPU.

DDR also avoids per-query re-sorting, unlike kNN-based estimators, enhancing prediction speed for deployment. The local nature of splines in KAN further ensures that adaptation to sharp density features does not globally inflate complexity.

5. Empirical Evaluation and Applications

Empirical studies demonstrate the framework’s ability to recover true distributions and their moments:

  • Dice simulation: Accurate estimation of the sum distribution and its standard deviation (normalized RMSE ∼4% for mean, ∼14% for standard deviation). The DDR-KAN ensemble outperformed kNN and Bayesian NNs (BNNs) in multi-modality and goodness-of-fit (passing 85/100 tests).
  • Synthetic nonlinear/noisy function: DDR-KAN closely matched Monte Carlo ECDFs, even in regions exhibiting multi-modality and input-dependent distributional changes.
  • Clustering strategies: DDR ensemble consistently captured uncertainty better than randomly clustered ensembles.

These results validate the KAN-based estimator’s suitability for scientific, engineering, and simulation settings where input-dependent, non-Gaussian uncertainties are substantial.

6. Implementation Resources and Practical Deployment

Source code for the methodology is openly available.

Practitioners may use the full DDR-KAN ensemble or its shallow form depending on data size and complexity. The approach is broadly applicable to regression models seeking efficient, interpretable, and locally adaptive density estimation.

7. Context and Impact

The KAN-based neural density estimation framework introduces a novel route to converting deterministic regressors into probabilistic models without sacrificing computational performance. Its theoretical underpinnings, ensemble-driven uncertainty modeling, and empirical successes position it as an efficient and practical tool for input-dependent, multimodal, and locally varying output densities in data-rich scientific problems. By merging Kolmogorov-Arnold representations and a recursive, interpretable ensemble strategy, this methodology provides a scalable alternative to traditional black-box BNNs and kernel-based estimators, particularly where model size, interpretability, and uncertainty quantification are paramount (Polar et al., 2021).
