Input-Convex KANs (ICKANs)

Updated 5 March 2026
  • Input-Convex KANs (ICKANs) are neural network architectures designed to represent multivariate convex functions by merging Kolmogorov–Arnold decompositions with convexity-preserving constraints.
  • They employ diverse variants—max/LogSumExp, piecewise-linear, cubic, and B-spline formulations—to enforce convexity and offer universal approximation with proven convergence rates.
  • Practical applications include options pricing, optimal transport, and constitutive modeling in physics, where ICKANs achieve high accuracy and enhanced interpretability.

Input-Convex Kolmogorov–Arnold Networks (ICKANs) constitute a recent family of neural network architectures engineered to represent multivariate convex functions, integrating the classical Kolmogorov–Arnold representation with convexity-preserving mechanisms. These models are motivated by applications in scientific computing, mathematical finance, and engineering, where the target functions are known or required to be convex. Modern ICKANs offer both universal approximation guarantees for convex functions and practical advantages in problems such as options pricing, optimal transport, and learning polyconvex constitutive laws in mechanics (Lemaire et al., 2024, Thakolkaran et al., 7 Mar 2025, Deschatre et al., 27 May 2025).

1. Mathematical Foundations

ICKANs are constructed upon the interplay of convex analysis and the Kolmogorov–Arnold superposition theorem. For a convex function $f:\mathbb{R}^d\to\mathbb{R}$, the supremum-of-affine property specifies that

$$f(x)=\sup\left\{\langle w,x\rangle + b : (w,b)\in\mathbb{R}^d\times\mathbb{R},\ \langle w,y\rangle + b \leq f(y)\ \forall y\in\mathbb{R}^d\right\}.$$

This admits the parametric family

$$f_{n,1}(x) = \max_{1\leq i\leq n}\left(\langle w_i, x\rangle + b_i\right),$$

which is a single-layer neural network with a "max" output activation and is convex in its input by construction.
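This supremum-of-affine form is easy to verify numerically. A minimal sketch (the parameters below are random placeholders, not values from the cited papers):

```python
import numpy as np

# Hypothetical max-of-affine model f(x) = max_i (<w_i, x> + b_i):
# a pointwise maximum of affine functions is convex for ANY choice of W, b.
rng = np.random.default_rng(0)
n, d = 8, 3                      # number of affine pieces, input dimension
W = rng.normal(size=(n, d))      # slopes w_i
b = rng.normal(size=n)           # intercepts b_i

def f_max_affine(x):
    return np.max(W @ x + b)

# Midpoint convexity check along a random segment:
# f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y).
x, y, t = rng.normal(size=d), rng.normal(size=d), 0.3
lhs = f_max_affine(t * x + (1 - t) * y)
rhs = t * f_max_affine(x) + (1 - t) * f_max_affine(y)
assert lhs <= rhs + 1e-12
```

No constraint on $W$ or $b$ is needed: convexity comes entirely from the max aggregation.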

Kolmogorov–Arnold representations decompose any continuous multivariate function $f:[a,b]^n\to\mathbb{R}$ into sums of outer univariate functions applied to sums of inner univariate functions:

$$f(x) = \sum_{i=1}^{2n+1} \psi_i\left( \sum_{j=1}^n \Phi_{i,j}(x_j) \right).$$

In ICKANs, the $\psi_i$ are chosen to be convex univariate approximators; the composition and summation are arranged to guarantee that convexity is preserved through the architecture (Deschatre et al., 27 May 2025, Thakolkaran et al., 7 Mar 2025).
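The arrangement can be sketched in a few lines. The toy functions below are illustrative choices, not the papers' parameterizations: each inner term is convex and each outer function is convex and nondecreasing, so the standard composition rule guarantees a convex output.

```python
import numpy as np

# Illustrative KA-style convex block (toy functions, assumed for this sketch):
#   f(x) = sum_i psi( sum_j a_ij * x_j**2 ),   a_ij >= 0.
# Each inner term a_ij * x_j^2 is convex; psi = softplus is convex and
# nondecreasing, so each psi(...) is convex and their sum is convex.
rng = np.random.default_rng(1)
n_in, n_out = 3, 5
A = rng.uniform(0.1, 1.0, size=(n_out, n_in))   # nonnegative curvature params

def psi(z):                                      # softplus: convex, nondecreasing
    return np.logaddexp(0.0, z)

def f(x):
    x = np.asarray(x, dtype=float)
    inner = (A * x ** 2).sum(axis=1)             # sum_j Phi_ij(x_j), one per i
    return psi(inner).sum()                      # sum_i psi_i(...)

# Midpoint convexity check on a random segment.
u, v = rng.normal(size=n_in), rng.normal(size=n_in)
assert f((u + v) / 2) <= (f(u) + f(v)) / 2 + 1e-12
```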

2. Architectural Variants

ICKAN architectures fall into three principal types, each enforcing convexity by design:

  • Supremum-of-affine (“max”) networks: These rely on stacking linear (affine) layers, followed by a convex aggregation (max or LogSumExp). The network

$$f_{n,L}(x) = \phi\left( W_L W_{L-1} \cdots W_1 x + \sum_{k=1}^{L} W_L \cdots W_{k+1} b_k \right),$$

with $\phi(z) = \max_i z_i$ or the smoothed LogSumExp output, remains convex in $x$ for any linear weights (Lemaire et al., 2024).
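A minimal sketch of this construction (shapes and weights are arbitrary placeholders): composing affine layers keeps the pre-activation affine in $x$, so a final max or LogSumExp is convex regardless of the weight values.

```python
import numpy as np

# Stacked affine layers followed by a convex aggregation (max or LogSumExp).
# The intermediate z stays affine in x at every layer, so the output is
# convex in x for ANY weights -- the stacking only aids optimization.
rng = np.random.default_rng(2)
d, h, n_pieces = 3, 6, 4
layers = [(rng.normal(size=(h, d)), rng.normal(size=h)),
          (rng.normal(size=(h, h)), rng.normal(size=h)),
          (rng.normal(size=(n_pieces, h)), rng.normal(size=n_pieces))]

def forward(x, smooth=True, tau=1.0):
    z = np.asarray(x, dtype=float)
    for W, b in layers:                  # z = W_k ... W_1 x + (bias terms)
        z = W @ z + b
    if smooth:                           # tau * log sum exp(z / tau), stably
        m = z.max()
        return m + tau * np.log(np.exp((z - m) / tau).sum())
    return z.max()

x0 = rng.normal(size=d)
assert forward(x0, smooth=True) >= forward(x0, smooth=False)  # LSE >= max
```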

  • KAN-based, basis-expansion architectures: A deep KAN layer takes the form

$$z^{(r)}_i = \sum_{j=1}^{n_{r-1}} \phi_{r-1,i,j}\left( z^{(r-1)}_j \right),$$

where each $\phi_{r-1,i,j}$ is implemented as a trainable convex univariate function via either (a) piecewise-linear (P1) expansions or (b) Hermite cubic splines (Deschatre et al., 27 May 2025).
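For the piecewise-linear case, one possible realization (a hypothetical parameterization; the paper's exact scheme may differ) is a hinge expansion whose kink weights are the slope increments, so keeping them nonnegative makes the slopes nondecreasing and the function convex:

```python
import numpy as np

# Convex piecewise-linear univariate function as a hinge expansion:
#   phi(x) = c + s0*(x - t_0) + sum_k a_k * relu(x - t_k),   a_k >= 0.
# Each a_k is the jump in slope at knot t_k; nonnegative jumps give
# nondecreasing slopes, hence convexity, by construction.
def convex_p1(x, knots, a, s0=0.0, c=0.0):
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    a = np.maximum(np.asarray(a, dtype=float), 0.0)  # enforce a_k >= 0
    hinges = np.maximum(x[:, None] - knots, 0.0)     # relu(x - t_k)
    return c + s0 * (x - knots[0]) + hinges @ a

knots = np.linspace(-1.0, 1.0, 5)
a = np.array([0.5, 0.2, 0.1, 0.3, 0.4])
ys = convex_p1(np.linspace(-2.0, 2.0, 201), knots, a)
# A convex function sampled on a uniform grid has nonnegative second differences.
assert (np.diff(ys, 2) >= -1e-12).all()
```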

  • Spline-augmented convex layers for physics: For polyconvex constitutive laws, $W(F)$ is constructed by feeding polyconvex invariants (e.g., in hyperelasticity, $K_1, K_2, K_3$) through a KAN with convex and monotone univariate B-spline activation functions, ensuring physical admissibility (Thakolkaran et al., 7 Mar 2025).
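For the B-spline case, a sufficient linear condition on the coefficients $c_m$ is that their first and second differences be nonnegative; by the variation-diminishing property the resulting spline is then increasing and convex. A small sketch under these assumptions (the clamped knot vector and raw parameters are illustrative, not from the paper):

```python
import numpy as np
from scipy.interpolate import BSpline

# Parameterize cubic B-spline coefficients so that their first differences
# (slopes of the control polygon) are nonnegative and nondecreasing. A convex,
# increasing control polygon yields a convex, increasing spline.
k = 3                                                # cubic B-splines
raw = np.array([-1.0, 0.3, 0.1, 0.8, 0.5, 0.2])      # unconstrained params
incr = np.logaddexp(0.0, raw)                        # softplus -> >= 0
slopes = np.cumsum(incr)                             # nondecreasing slopes
coef = np.concatenate([[0.0], np.cumsum(slopes)])    # convex control polygon

n = coef.size                                        # clamped knot vector on [0, 1]
t = np.concatenate([np.zeros(k), np.linspace(0.0, 1.0, n - k + 1), np.ones(k)])
spline = BSpline(t, coef, k)

ys = spline(np.linspace(0.0, 1.0, 101))
assert (np.diff(ys) >= -1e-10).all()                 # monotone
assert (np.diff(ys, 2) >= -1e-10).all()              # convex
```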

The table below summarizes key characteristics of the main ICKAN instantiations:

| Variant | Univariate Basis | Convexity Control |
| --- | --- | --- |
| Max/LogSumExp | Identity (affine) | Aggregation (max, LSE) |
| P1-ICKAN | Piecewise-linear (P1) | Nondecreasing slope params |
| Cubic-ICKAN | Hermite cubic splines | Convexity band via sigmoids |
| B-spline ICKAN | Uniform B-splines | Linear constraints on $c_m$ |

Convexity is enforced by parameterizing the univariate basis expansions to guarantee nondecreasing (or monotone-convex) behavior, as detailed in (Deschatre et al., 27 May 2025, Thakolkaran et al., 7 Mar 2025).

3. Convexity Enforcement and Theoretical Guarantees

For all ICKAN architectures, convexity with respect to input is secured by:

  • Ensuring nonnegative increments for slope parameters in piecewise-linear and cubic spline basis functions;
  • Linear constraints on B-spline control points, enforcing monotonicity and convexity directly;
  • Composition rules for convex and nondecreasing univariate functions, ensuring global convexity under summation and layer-stacking (Deschatre et al., 27 May 2025).
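The composition rule in the last point is easy to probe numerically; the functions below are toy examples chosen to show that dropping monotonicity of the outer function breaks convexity:

```python
import numpy as np

# If g is convex and h is convex AND nondecreasing, h(g(x)) is convex;
# a convex but non-monotone h can destroy convexity.
rng = np.random.default_rng(3)

def is_midpoint_convex(f, trials=1000, tol=1e-9):
    xs, ys = rng.uniform(-2, 2, trials), rng.uniform(-2, 2, trials)
    return (f((xs + ys) / 2) <= (f(xs) + f(ys)) / 2 + tol).all()

g = lambda x: x ** 2                        # convex
h_ok = lambda z: np.logaddexp(0.0, z)       # convex and nondecreasing
h_bad = lambda z: (z - 1) ** 2              # convex but NOT monotone

assert is_midpoint_convex(lambda x: h_ok(g(x)))      # composition stays convex
assert not is_midpoint_convex(lambda x: h_bad(g(x))) # (x^2 - 1)^2 is not convex
```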

Regarding expressivity:

  • Universal approximation: P1-ICKANs (adaptive piecewise-linear) with sufficient mesh density and layer width are dense in the set of Lipschitz convex functions on compact sets [(Deschatre et al., 27 May 2025), Theorem 2.1]. Supremum-of-affine ICKANs provide $O(n^{-2/d})$ uniform approximation rates for $C^1$ convex functions on $d$-dimensional cubes and similar rates in $L^r$ norms for Hölder-smooth gradients (Lemaire et al., 2024).
  • Higher-order splines: Cubic and B-spline ICKANs achieve smoother gradient approximation and improved empirical convergence, although formal universal approximation results remain an open problem for the cubic variant (Deschatre et al., 27 May 2025).

4. Training Methodologies

  • Loss: Standard supervised regression objectives, such as mean squared error,

$$\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^N \left| f_\theta(x^{(i)}) - y^{(i)} \right|^2,$$

are used for regression tasks. In applications like physics-informed learning, losses are constructed from PDE weak forms and measured reaction forces (Thakolkaran et al., 7 Mar 2025).

  • Optimizer: Adam with learning rates $\sim 10^{-3}$ is commonly used. For architectures with parameter redundancy ("scrambling"), multiple (deep) affine layers are stacked to improve optimization by flattening saddle points and accelerating convergence (Lemaire et al., 2024).
  • Regularization: Convexity is enforced structurally by parameter constraints, making additional explicit regularization optional. Grid adaptation is achieved via trainable mesh points, enhancing local resolution (Deschatre et al., 27 May 2025).
  • Symbolic Extraction: In physical modeling (e.g., hyperelasticity), learned spline functions can be approximated post-training by convex symbolic functions from a restricted library, providing interpretability and closed-form expressions (Thakolkaran et al., 7 Mar 2025).
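A condensed end-to-end sketch combining these ingredients (plain gradient descent stands in for Adam; the target, knot grid, and hyperparameters are invented for illustration):

```python
import numpy as np

# Fit a convex piecewise-linear model to a convex target by gradient descent
# on the MSE. The softplus reparameterization a_k = softplus(rho_k) keeps the
# slope increments nonnegative, so convexity is structural: no penalty term
# or projection step is needed.
rng = np.random.default_rng(4)
X = rng.uniform(-2.0, 2.0, size=400)
y = X ** 2                                   # convex regression target

knots = np.linspace(-2.0, 2.0, 9)
H = np.maximum(X[:, None] - knots, 0.0)      # hinge features relu(x - t_k)
rho = np.zeros_like(knots)                   # unconstrained kink parameters
c, s0, lr = 0.0, 0.0, 0.02

def predict(c, s0, rho):
    return c + s0 * (X - knots[0]) + H @ np.logaddexp(0.0, rho)

mse0 = np.mean((predict(c, s0, rho) - y) ** 2)   # error before training

for step in range(3000):
    err = 2.0 * (predict(c, s0, rho) - y) / len(X)   # dMSE/dpred
    c -= lr * err.sum()
    s0 -= lr * err @ (X - knots[0])
    rho -= lr * (err @ H) / (1.0 + np.exp(-rho))     # chain rule: da/drho = sigmoid

mse = np.mean((predict(c, s0, rho) - y) ** 2)
assert mse < mse0                             # training reduced the error
```

Swapping the hand-written gradient for Adam and making the knots trainable would recover the grid-adaptive setup described above.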

5. Empirical Benchmarks and Applications

ICKANs have been evaluated in diverse contexts:

  • Options Pricing: Applied to basket, Bermudan, and swing contracts under Black–Scholes and Ornstein–Uhlenbeck dynamics, ICKANs achieve benchmark-level pricing accuracy (≤0.1% relative error for basket options; ≤1% for Bermudan; ≤0.3% for swing), matching or exceeding state-of-the-art (Deep Optimal Stopping, neural stratification) (Lemaire et al., 2024).
  • Multimodal Convex Regression: Toy and synthetic benchmarks (1D, multivariate, mixed smooth/non-smooth) show that both P1- and cubic ICKANs reach mean squared errors comparable to or better than classical ICNNs, with smoother gradient approximation for cubic splines (Deschatre et al., 27 May 2025).
  • Optimal Transport: In semi-dual potential parameterizations, cubic-ICKANs achieve unexplained variance percentages (UVP) on distributional benchmarks (Gaussian mixtures, tensorized maps) matching or outperforming ICNNs, especially in settings with separable structure (Deschatre et al., 27 May 2025).
  • Physical Constitutive Modeling: B-spline-based monotonic ICKANs model polyconvex isotropic hyperelastic energy densities, trained unsupervised from full-field (DIC-style) data and global force measurements. ICKANs achieve $R^2>0.99$ fit to ground-truth invariants and produce interpretable symbolic constitutive relationships (Thakolkaran et al., 7 Mar 2025).

The table below encapsulates performance domains:

| Application | ICKAN Variant | Key Result |
| --- | --- | --- |
| Options Pricing | Max/LSE, scrambled | ≤0.1–1% rel. error |
| Convex Regression (1D, multivar) | P1-/Cubic-ICKAN | MSE $\sim$ ICNN, smoother $\nabla$ |
| Optimal Transport | Cubic-ICKAN | UVP% $\leq$ ICNN, $d \leq 32$ |
| Polyconvex Constitutive Modeling | B-spline ICKAN | $R^2>0.99$, closed-form $W(F)$ |

6. Interpretability, Limitations, and Open Directions

ICKANs provide increased architectural interpretability over dense ICNNs in problems where symbolic post-processing is feasible. The learned univariate convex functions are directly visualizable and can be symbolically regressed to closed form (e.g., linear, $\exp$, or $\log(1+\exp)$ terms), yielding transparent connections to classical models (Thakolkaran et al., 7 Mar 2025).

Notable limitations include:

  • Computational overhead due to basis expansions and grid adaptation (P1-ICKANs are ∼4.3× slower per epoch than ICNNs on similar benchmarks) (Deschatre et al., 27 May 2025).
  • Hyperparameter tuning for spline order, mesh density, and architecture depth remains manual and nontrivial.
  • Theoretical convergence guarantees are firm only for low-order (P1) variants; higher-order cubic and spline-based variants lack general proofs.
  • Extrapolation outside spline-supported input domains relies on linear extension, which may not reflect complex behaviors in extreme regimes.
  • Symbolic regression is constrained by the expressive power of the chosen function library.

Extending ICKANs to handle viscoelasticity, plasticity, large-deformation histories, high-dimensional optimal transport, and incorporating UQ (Bayesian ICKANs) are identified as key open research directions (Thakolkaran et al., 7 Mar 2025, Deschatre et al., 27 May 2025).

7. Comparative Perspective and Outlook

ICKANs substantially broaden the toolkit for learning convex functions. They combine the dimension-wise expressivity of Kolmogorov–Arnold decompositions with robust, parameter-manifold-enforced convexity guarantees. Benchmark results indicate that ICKANs are competitive with, and sometimes outperform, traditional ICNNs, particularly in regimes favoring separability or structured input domains. Their ability to yield compact, interpretable, and physically admissible representations recommends them for applications in which convexity and physical structure are both critical (Lemaire et al., 2024, Thakolkaran et al., 7 Mar 2025, Deschatre et al., 27 May 2025).

As the field develops, advances in efficient spline evaluation, automated grid adaptation, and theoretical understanding of higher-order convex approximators will likely drive further progress in both theory and applications.
