Input-Convex KANs (ICKANs)
- Input-Convex KANs (ICKANs) are neural network architectures designed to represent multivariate convex functions by merging Kolmogorov–Arnold decompositions with convexity-preserving constraints.
- They employ diverse variants (max/LogSumExp, piecewise-linear, cubic, and B-spline formulations) to enforce convexity, and they offer universal approximation guarantees, with convergence rates proven for certain variants.
- Practical applications include options pricing, optimal transport, and constitutive modeling in physics, where ICKANs achieve high accuracy and enhanced interpretability.
Input-Convex Kolmogorov–Arnold Networks (ICKANs) constitute a recent family of neural network architectures engineered to represent multivariate convex functions, integrating the classical Kolmogorov–Arnold representation with convexity-preserving mechanisms. These models are motivated by applications in scientific computing, mathematical finance, and engineering, where the target functions are known or required to be convex. Modern ICKANs offer both universal approximation guarantees for convex functions and practical advantages in problems such as options pricing, optimal transport, and learning polyconvex constitutive laws in mechanics (Lemaire et al., 2024, Thakolkaran et al., 7 Mar 2025, Deschatre et al., 27 May 2025).
1. Mathematical Foundations
ICKANs are constructed upon the interplay of convex analysis and the Kolmogorov–Arnold superposition theorem. For a convex function $f : \mathbb{R}^d \to \mathbb{R}$, the supremum-of-affine property specifies that
$$f(x) = \sup\big\{\, a^\top x + b \;:\; (a, b) \in \mathbb{R}^d \times \mathbb{R},\ a^\top y + b \le f(y)\ \forall y \,\big\},$$
i.e., $f$ is the pointwise supremum of its affine minorants. This admits the parametric family
$$f_\theta(x) = \max_{1 \le k \le K} \big( a_k^\top x + b_k \big), \qquad \theta = (a_k, b_k)_{k=1}^{K},$$
which is a neural network with a one-layer “max” output activation, inherently convex in its input.
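As a quick illustration, here is a minimal NumPy sketch of such a max-of-affine function (the names `A`, `b`, `K` are illustrative, not taken from the cited papers); it also checks the convexity inequality numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 16, 3                  # number of affine pieces, input dimension
A = rng.normal(size=(K, d))   # slopes a_k: unconstrained
b = rng.normal(size=K)        # intercepts b_k: unconstrained

def f(x):
    # Pointwise max of K affine functions -> convex for ANY choice of A, b.
    return np.max(A @ x + b)

# Numerical convexity check: f(t x + (1-t) y) <= t f(x) + (1-t) f(y).
x, y, t = rng.normal(size=d), rng.normal(size=d), 0.3
assert f(t * x + (1 - t) * y) <= t * f(x) + (1 - t) * f(y) + 1e-12
```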
Kolmogorov–Arnold representations decompose any continuous multivariate function as sums of outer univariate functions applied to sums of inner univariate functions:
$$f(x_1, \dots, x_d) = \sum_{q=0}^{2d} \Phi_q\!\left( \sum_{p=1}^{d} \phi_{q,p}(x_p) \right).$$
In ICKANs, the univariate functions $\Phi_q$ and $\phi_{q,p}$ are chosen to be convex univariate approximators (with the outer functions additionally nondecreasing); the composition and summation are arranged to guarantee that convexity is preserved through the architecture (Deschatre et al., 27 May 2025, Thakolkaran et al., 7 Mar 2025).
2. Architectural Variants
ICKAN architectures fall into three principal types, each enforcing convexity by design:
- Supremum-of-affine (“max”) networks: These stack linear (affine) layers and apply a convex aggregation (max or LogSumExp) at the output. The network
$$f_\theta(x) = \rho\big((A_L \cdots A_1)\, x + b\big), \qquad \rho = \max_k \ \text{or the smoothed LogSumExp},$$
remains convex in $x$ for any linear weights, because a convex aggregation composed with an affine map is convex (Lemaire et al., 2024).
- KAN-based, basis-expansion architectures: A deep KAN layer takes the form
$$x^{(\ell+1)}_j = \sum_{i=1}^{n_\ell} \phi^{(\ell)}_{j,i}\big(x^{(\ell)}_i\big), \qquad j = 1, \dots, n_{\ell+1},$$
where each $\phi^{(\ell)}_{j,i}$ is implemented as a trainable convex univariate function via either (a) piecewise-linear (P1) expansions or (b) Hermite cubic splines (Deschatre et al., 27 May 2025); a minimal sketch of this variant follows the list.
- Spline-augmented convex layers for physics: For polyconvex constitutive laws, the strain-energy density is constructed by feeding polyconvex invariants (e.g., in hyperelasticity, $I_1$, $I_2$, and $J = \det \mathbf{F}$) through a KAN with convex and monotone univariate B-spline activation functions, ensuring physical admissibility (Thakolkaran et al., 7 Mar 2025).
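The following sketch illustrates the basis-expansion variant under stated assumptions: a NumPy implementation with hypothetical class names (`ConvexP1`, `ConvexKANLayer`), not the papers' code.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)  # smooth and strictly positive

class ConvexP1:
    """Convex piecewise-linear univariate function on knots t_0 < ... < t_m.

    Convex because the segment slopes are a base slope plus a cumulative
    sum of positive (softplus) increments, hence nondecreasing.
    """
    def __init__(self, knots, rng):
        self.knots = np.asarray(knots, dtype=float)
        self.c = rng.normal()                        # value at t_0
        self.s0 = rng.normal()                       # base slope (free sign)
        self.raw = rng.normal(size=len(knots) - 1)   # raw slope increments

    def __call__(self, x):
        slopes = self.s0 + np.cumsum(softplus(self.raw))  # nondecreasing
        # Accumulate slope * covered length over the segments in [t_0, x];
        # valid on [t_0, t_m] (linear extrapolation outside is omitted here).
        seg = np.clip(x - self.knots[:-1], 0.0, np.diff(self.knots))
        return self.c + float(np.sum(slopes * seg))

class ConvexKANLayer:
    """y_j = sum_i phi_{j,i}(x_i): a sum of convex functions stays convex."""
    def __init__(self, d_in, d_out, knots, rng):
        self.phis = [[ConvexP1(knots, rng) for _ in range(d_in)]
                     for _ in range(d_out)]

    def __call__(self, x):
        return np.array([sum(phi(xi) for phi, xi in zip(row, x))
                         for row in self.phis])

rng = np.random.default_rng(0)
layer = ConvexKANLayer(d_in=3, d_out=2, knots=np.linspace(-2, 2, 9), rng=rng)
print(layer(np.array([0.1, -0.5, 1.2])))  # two convex outputs
```

Stacking such layers additionally requires the deeper layers' univariate functions to be nondecreasing, so that the convex-nondecreasing composition rule of Section 3 applies.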
The table below summarizes key characteristics of the main ICKAN instantiations:
| Variant | Univariate Basis | Convexity Control |
|---|---|---|
| Max/LogSumExp | Identity (affine) | Aggregation (max, LSE) |
| P1-ICKAN | Piecewise-linear (P1) | Nondecreasing slope params |
| Cubic-ICKAN | Hermite cubic splines | Convexity-band via sigmoids |
| B-spline ICKAN | Uniform B-splines | Linear constraints on control points |
Convexity is enforced by parameterizing the univariate basis expansions to guarantee nondecreasing (or monotone-convex) behavior, as detailed in (Deschatre et al., 27 May 2025, Thakolkaran et al., 7 Mar 2025).
3. Convexity Enforcement and Theoretical Guarantees
For all ICKAN architectures, convexity with respect to input is secured by:
- Ensuring nonnegative increments for slope parameters in piecewise-linear and cubic spline basis functions;
- Linear constraints on B-spline control points, enforcing monotonicity and convexity directly;
- Composition rules for convex and nondecreasing univariate functions, ensuring global convexity under summation and layer-stacking (Deschatre et al., 27 May 2025).
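The composition rule in the last item follows from a one-line argument (standard convex analysis, not specific to the cited papers): if $g$ is convex and $h$ is convex and nondecreasing, then for $\lambda \in [0, 1]$,
$$\begin{aligned}
h\bigl(g(\lambda x + (1-\lambda)\,y)\bigr)
  &\le h\bigl(\lambda\, g(x) + (1-\lambda)\, g(y)\bigr)
     && \text{(convexity of } g\text{, monotonicity of } h\text{)} \\
  &\le \lambda\, h(g(x)) + (1-\lambda)\, h(g(y))
     && \text{(convexity of } h\text{),}
\end{aligned}$$
so $h \circ g$ is convex; summation preserves convexity trivially.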
Regarding expressivity:
- Universal approximation: P1-ICKANs (adaptive piecewise-linear) with sufficient mesh density and layer width are dense in the set of Lipschitz convex functions on compact sets [(Deschatre et al., 27 May 2025), Theorem 2.1]. Supremum-of-affine ICKANs provide uniform approximation rates for convex functions on $d$-dimensional cubes, with comparable rates in stronger norms when the gradient is Hölder-smooth (Lemaire et al., 2024).
- Higher-order splines: Cubic and B-spline ICKANs achieve smoother gradient approximation and improved empirical convergence, although formal universal approximation results remain an open problem for the cubic variant (Deschatre et al., 27 May 2025).
4. Training Methodologies
- Loss: Standard supervised regression objectives, such as the mean squared error
$$\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \big( f_\theta(x_i) - y_i \big)^2,$$
are used for regression tasks. In applications like physics-informed learning, losses are constructed from PDE weak forms and measured reaction forces (Thakolkaran et al., 7 Mar 2025).
- Optimizer: Adam is commonly used. For architectures with parameter redundancy ("scrambling"), multiple (deep) affine layers are stacked to improve optimization by flattening saddle points and accelerating convergence (Lemaire et al., 2024). A minimal training sketch follows this list.
- Regularization: Convexity is enforced structurally by parameter constraints, making additional explicit regularization optional. Grid adaptation is achieved via trainable mesh points, enhancing local resolution (Deschatre et al., 27 May 2025).
- Symbolic Extraction: In physical modeling (e.g., hyperelasticity), learned spline functions can be approximated post-training by convex symbolic functions from a restricted library, providing interpretability and closed-form expressions (Thakolkaran et al., 7 Mar 2025).
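A minimal training sketch under these conventions, assuming PyTorch, a hypothetical `MaxAffine` model, and synthetic data (not the papers' setup): because convexity is structural, the loop is ordinary MSE regression with Adam and no convexity penalty.

```python
import torch

class MaxAffine(torch.nn.Module):
    """Sup-of-affine convex model: f(x) = max_k (a_k . x + b_k)."""
    def __init__(self, d, K):
        super().__init__()
        self.affine = torch.nn.Linear(d, K)

    def forward(self, x):
        return self.affine(x).amax(dim=-1)  # convex aggregation over pieces

torch.manual_seed(0)
d, N = 2, 512
X = torch.randn(N, d)
y = (X ** 2).sum(dim=-1)                    # synthetic convex target ||x||^2

model = MaxAffine(d, K=32)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(2000):
    opt.zero_grad()
    loss = torch.mean((model(X) - y) ** 2)  # plain MSE; convexity is built in
    loss.backward()
    opt.step()
print(loss.item())
```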
5. Empirical Benchmarks and Applications
ICKANs have been evaluated in diverse contexts:
- Options Pricing: Applied to basket, Bermudan, and swing contracts under Black–Scholes and Ornstein–Uhlenbeck dynamics, ICKANs achieve benchmark-level pricing accuracy (≤0.1% relative error for basket options; ≤1% for Bermudan; ≤0.3% for swing), matching or exceeding state-of-the-art (Deep Optimal Stopping, neural stratification) (Lemaire et al., 2024).
- Multimodal Convex Regression: Toy and synthetic benchmarks (1D, multivariate, mixed smooth/non-smooth) show that both P1- and cubic ICKANs reach mean squared errors comparable to or better than classical ICNNs, with smoother gradient approximation for cubic splines (Deschatre et al., 27 May 2025).
- Optimal Transport: In semi-dual potential parameterizations (recalled below), cubic ICKANs achieve unexplained-variance percentages (UVP) on distributional benchmarks (Gaussian mixtures, tensorized maps) that match or improve on those of ICNNs, especially in settings with separable structure (Deschatre et al., 27 May 2025).
- Physical Constitutive Modeling: B-spline-based monotonic ICKANs model polyconvex isotropic hyperelastic energy densities, trained unsupervised from full-field (DIC-style) data and global force measurements. ICKANs achieve accurate fits to the ground-truth constitutive response and produce interpretable symbolic constitutive relationships (Thakolkaran et al., 7 Mar 2025).
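For context, the quadratic-cost semi-dual formulation behind such parameterizations, and the UVP metric as commonly defined in the OT benchmarking literature (generic notation, assumed rather than quoted from the cited papers): by Brenier's theorem the optimal map is the gradient of a convex potential, so a convex network $f_\theta$ is trained via
$$\min_{\theta} \ \mathbb{E}_{x \sim \mu}\big[f_\theta(x)\big] + \mathbb{E}_{y \sim \nu}\big[f_\theta^{*}(y)\big], \qquad f_\theta^{*}(y) = \sup_{x} \, \langle x, y \rangle - f_\theta(x),$$
with recovered map $\hat{T} = \nabla f_\theta$, and evaluated by
$$\mathcal{L}^2\text{-UVP}(\hat{T}) = 100 \cdot \frac{\lVert \hat{T} - T^{*} \rVert^2_{L^2(\mu)}}{\operatorname{Var}(\nu)} \ \%.$$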
The table below encapsulates performance domains:
| Application | ICKAN Variant | Key Result |
|---|---|---|
| Options Pricing | Max/LSE, scrambled | ≤0.1–1% rel. error |
| Convex Regression (1D, multivar) | P1-/Cubic-ICKAN | MSE ≤ ICNN, smoother gradients |
| Optimal Transport | Cubic-ICKAN | UVP ≤ ICNN, d ≤ 32 |
| Polyconvex Constitutive Modeling | B-spline ICKAN | Accurate fits, closed-form laws |
6. Interpretability, Limitations, and Open Directions
ICKANs provide increased architectural interpretability over dense ICNNs in problems where symbolic post-processing is feasible. The learned univariate convex functions are directly visualizable and can be symbolically regressed to closed form (e.g., linear, logarithmic, or polynomial terms), yielding transparent connections to classical models (Thakolkaran et al., 7 Mar 2025); a sketch of such post-hoc extraction follows.
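A minimal sketch of such extraction, assuming a NumPy least-squares projection onto an illustrative convex dictionary (the actual function library and fitting procedure in Thakolkaran et al. may differ):

```python
import numpy as np

def fit_symbolic(phi, lo, hi, n=200):
    """Least-squares projection of a learned 1D function onto a small library."""
    t = np.linspace(lo, hi, n)
    library = {"1": np.ones_like(t), "t": t, "t^2": t ** 2, "exp(t)": np.exp(t)}
    names = list(library)
    G = np.stack([library[k] for k in names], axis=1)  # design matrix
    coef, *_ = np.linalg.lstsq(G, phi(t), rcond=None)
    # NOTE: to guarantee the extracted expression stays convex, restrict to
    # nonnegative coefficients in practice (e.g., nonnegative least squares).
    return dict(zip(names, coef))

# Example: an (approximately) quadratic learned activation is recovered as t^2.
print(fit_symbolic(lambda t: 0.5 * t ** 2 + 1.0, lo=0.1, hi=2.0))
```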
Notable limitations include:
- Computational overhead due to basis expansions and grid adaptation (P1-ICKANs are ∼4.3× slower per epoch than ICNNs on similar benchmarks) (Deschatre et al., 27 May 2025).
- Hyperparameter tuning for spline order, mesh density, and architecture depth remains manual and nontrivial.
- Theoretical convergence guarantees are firm only for low-order (P1) variants; higher-order cubic and spline-based variants lack general proofs.
- Extrapolation outside spline-supported input domains relies on linear extension, which may not reflect complex behaviors in extreme regimes.
- Symbolic regression is constrained by the expressive power of the chosen function library.
Extending ICKANs to viscoelasticity, plasticity, and large-deformation histories, scaling them to high-dimensional optimal transport, and incorporating uncertainty quantification (Bayesian ICKANs) are identified as key open research directions (Thakolkaran et al., 7 Mar 2025, Deschatre et al., 27 May 2025).
7. Comparative Perspective and Outlook
ICKANs substantially broaden the toolkit for learning convex functions. They combine the dimension-wise expressivity of Kolmogorov–Arnold decompositions with convexity guarantees enforced directly in the parameterization. Benchmark results indicate that ICKANs are competitive with, and sometimes outperform, traditional ICNNs, particularly in regimes favoring separability or structured input domains. Their ability to yield compact, interpretable, and physically admissible representations recommends them for applications in which convexity and physical structure are both critical (Lemaire et al., 2024, Thakolkaran et al., 7 Mar 2025, Deschatre et al., 27 May 2025).
As the field develops, advances in efficient spline evaluation, automated grid adaptation, and theoretical understanding of higher-order convex approximators will likely drive further progress in both theory and applications.