KANHedge: BSDE Solver with Kolmogorov-Arnold Networks

Updated 19 January 2026
  • KANHedge is a BSDE-based solver that leverages Kolmogorov-Arnold Networks with learnable B-spline activations to offer accurate and smooth delta estimations in high-dimensional option pricing and hedging.
  • It replaces conventional MLPs in deep BSDE frameworks, reducing pricing errors and hedging risk (CVaR) by up to 9% in empirical studies on European and American basket options.
  • By employing spline activations that ensure smooth gradients, KANHedge enhances risk control and mitigates numerical instabilities associated with fixed activation functions in traditional PDE approaches.

KANHedge is a backward stochastic differential equation (BSDE)-based solver for high-dimensional option pricing and hedging. It replaces conventional Multi-Layer Perceptrons (MLPs) in the deep BSDE framework with Kolmogorov-Arnold Networks (KANs), which employ learnable B-spline activation functions. This architecture provides enhanced function approximation capabilities for continuous derivatives, specifically targeting improvements in hedging accuracy and risk control, particularly in high-dimensional settings where standard PDE-based methods are intractable due to the curse of dimensionality (Handal et al., 16 Jan 2026).

1. BSDE Formulation and Hedging Problem

The KANHedge methodology is rooted in the risk-neutral valuation paradigm for option pricing, where the price process $Y_t$ and corresponding hedging strategy $Z_t$ for a derivative contract are characterized by the BSDE:

$$Y_t = g(X_T) + \int_t^T f(s, X_s, Y_s, Z_s)\, ds - \int_t^T Z_s^\top \, dW_s,$$

with $Y_t$ adapted to the market filtration and $Z_t \in \mathbb{R}^d$ representing the vector of deltas (the number of units of each underlying asset held at time $t$). Here, $X_t$ is the state process, $W_t$ denotes a $d$-dimensional Brownian motion, $g$ is the payoff function, and $f$ is the driver specifying market structure.

The classical approach involves discretizing the underlying forward SDE and time grid, approximating YtY_t and ZtZ_t at each time step, and minimizing the expected squared loss on the terminal condition.

2. Deep BSDE Solvers with MLPs: Capabilities and Limitations

Standard deep BSDE solvers discretize the interval $[0, T]$ into $N$ steps and simulate the market paths via the forward SDE:

$$X_{t_{n+1}} = X_{t_n} + \mu(X_{t_n}, t_n)\, \Delta t + \Gamma(X_{t_n}, t_n)\, \Delta W_n.$$

At each time $t_n$, $Z_{t_n}$ is modeled as $\mathrm{MLP}_{\theta_n}(t_n, X_{t_n})$, where the network typically has 3–5 hidden layers of width 100–500 and uses fixed activation functions such as ReLU, tanh, or SiLU. Training minimizes the Monte Carlo approximation of the quadratic terminal loss

$$\mathcal{L}(\Theta) = \mathbb{E}\big[(Y_T^\Theta - g(X_T))^2\big] \approx \frac{1}{M}\sum_{i=1}^M \big(Y_T^{\Theta,(i)} - g(X_T^{(i)})\big)^2,$$

where $M$ is the number of sampled paths.
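The discretized rollout and terminal loss can be sketched end to end. The toy NumPy version below is illustrative, not the paper's code: the per-step networks are replaced by simple linear maps, the driver is taken as $f \equiv 0$, and the forward dynamics are geometric Brownian motion.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, M, T = 5, 50, 2048, 1.0           # assets, time steps, paths, horizon
dt = T / N
mu, sigma = 0.0, 0.2                    # toy risk-neutral drift and volatility
K = 1.0                                 # illustrative strike

def g(x):
    """Terminal payoff: arithmetic basket call."""
    return np.maximum(x.mean(axis=1) - K, 0.0)

# Toy stand-ins for the per-step networks: one linear map per time step.
theta = [rng.normal(scale=0.1, size=(d, d)) for _ in range(N)]
y0 = 0.08                               # trainable initial price guess

X = np.ones((M, d))                     # X_0
Y = np.full(M, y0)                      # Y_0^Theta
for n in range(N):
    dW = rng.normal(scale=np.sqrt(dt), size=(M, d))
    Z = X @ theta[n]                    # Z_{t_n} ~ network(t_n, X_{t_n})
    Y = Y + (Z * dW).sum(axis=1)        # dY = f dt + Z^T dW, with driver f = 0
    X = X + mu * X * dt + sigma * X * dW  # Euler step of the forward SDE

loss = np.mean((Y - g(X)) ** 2)         # Monte Carlo terminal loss L(Theta)
print(f"terminal loss: {loss:.4f}")
```

In a real solver the linear maps become trainable networks and `loss` is minimized by stochastic gradient descent over all per-step parameters and `y0` jointly.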

Key limitations arise from the use of fixed activations: the resulting function approximations often exhibit irregular or non-smooth gradients, and direct MLP-based estimation of $Z_t$ can produce inaccurate or noisy delta trajectories, compromising hedging performance.

3. Kolmogorov–Arnold Networks (KANs) and B-Spline Activations

KANs are motivated by the Kolmogorov–Arnold representation theorem, which asserts that any continuous multivariate function can be expressed as a finite sum of univariate continuous functions:

$$f(x_1, \dots, x_d) = \sum_{q=0}^{2d} A_q \left( \sum_{p=1}^d \Phi_{q,p}(x_p) \right),$$

where $A_q$ and $\Phi_{q,p}$ are continuous univariate functions.
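A classic illustration of this kind of decomposition (not taken from the paper): for strictly positive inputs, a product of variables is a single univariate function applied to a sum of univariate functions,

```latex
x_1 x_2 \cdots x_d \;=\; \exp\!\Big(\sum_{p=1}^{d} \ln x_p\Big),
\qquad A(u) = e^{u}, \quad \Phi_p(x_p) = \ln x_p .
```

The multivariate interaction is absorbed entirely into compositions of one-dimensional maps, which is the structure KAN layers learn.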

KAN layers instantiate this by employing learnable B-spline activations for each edge:

$$\phi(u) = \sum_{k=0}^K c_k B_{k,p}(u),$$

where $B_{k,p}$ are degree-$p$ B-spline basis functions and $c_k$ are trainable coefficients, yielding outputs smooth up to order $p-1$. Each KAN layer maps $x \in \mathbb{R}^{n_{\mathrm{in}}}$ to $y \in \mathbb{R}^{n_{\mathrm{out}}}$ via

$$y_j = \sum_{i=1}^{n_{\mathrm{in}}} w_{j,i}\, \phi_{j,i}(x_i) + b_j,$$

for $j = 1, \dots, n_{\mathrm{out}}$. Stacking such layers produces highly expressive multivariate approximators with controlled and smooth derivatives, facilitating improved modeling of option deltas and higher-order Greeks.
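A minimal forward pass of one such layer can be sketched in NumPy. This is an illustrative sketch, not the paper's implementation: the B-spline basis is evaluated via the Cox–de Boor recursion on a uniform knot vector, and all parameter shapes (`coeffs`, `w`, `b`) are assumptions chosen to match the equations above.

```python
import numpy as np

def bspline_basis(u, t, k, p):
    """Cox-de Boor recursion: degree-p basis B_{k,p} evaluated at array u."""
    if p == 0:
        return np.where((t[k] <= u) & (u < t[k + 1]), 1.0, 0.0)
    out = np.zeros_like(u, dtype=float)
    d1 = t[k + p] - t[k]
    if d1 > 0:
        out += (u - t[k]) / d1 * bspline_basis(u, t, k, p - 1)
    d2 = t[k + p + 1] - t[k + 1]
    if d2 > 0:
        out += (t[k + p + 1] - u) / d2 * bspline_basis(u, t, k + 1, p - 1)
    return out

def kan_layer(x, coeffs, w, b, t, p):
    """y_j = sum_i w_{j,i} * phi_{j,i}(x_i) + b_j with spline phis.

    x: (M, n_in); coeffs: (n_out, n_in, K+1); w: (n_out, n_in); b: (n_out,).
    """
    Kp1 = coeffs.shape[-1]
    # Basis values B_{k,p}(x_i) for every sample and input: (M, n_in, K+1)
    B = np.stack([bspline_basis(x, t, k, p) for k in range(Kp1)], axis=-1)
    phi = np.einsum('mik,oik->moi', B, coeffs)   # phi_{j,i}(x_i): (M, n_out, n_in)
    return np.einsum('oi,moi->mo', w, phi) + b   # (M, n_out)

# Demo with the degrees/knot counts quoted later in the article (p=3, K+1=10).
rng = np.random.default_rng(1)
p, Kp1 = 3, 10
t = np.linspace(-2.0, 2.0, Kp1 + p + 1)          # uniform knot vector
n_in, n_out = 4, 3
x = rng.uniform(-0.5, 0.5, size=(8, n_in))
coeffs = rng.normal(size=(n_out, n_in, Kp1))
w = rng.normal(size=(n_out, n_in))
b = np.zeros(n_out)
y = kan_layer(x, coeffs, w, b, t, p)
print(y.shape)  # (8, 3)
```

Because each $\phi_{j,i}$ is a cubic spline, the layer output is twice continuously differentiable in $x$ wherever the inputs stay inside the interior knot span.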

4. KANHedge Model Architecture

In KANHedge, every MLP approximator of $Z_{t_n}$ in the standard deep BSDE solver is replaced by a corresponding KAN:

$$Z_{t_n} \approx \Psi_{\theta_n}(t_n, X_{t_n}) = A^{(n)}\left(\Phi^{(n)}\big(B^{(n)}[t_n, X_{t_n}] + b^{(n)}\big)\right),$$

where $B^{(n)}$ is an affine transformation, $\Phi^{(n)}$ applies univariate spline activations $\phi_{j,i}$, and $A^{(n)}$ is an affine map over activations.

The joint parameter collection $\theta$ includes the initial price $Y_0 = u_0$ and all KAN weights. Training minimizes a regularized loss:

$$L(\theta) = \mathbb{E}\Big[ |Y_0 - Y_0^\theta|^2 + \lambda \sum_{n=0}^{N-1} \big\| Z_{t_n} - Z_{t_n}^\theta \big\|^2 \Big],$$

where $Y_0^\theta$ and $Z_{t_n}^\theta$ arise from a forward simulation under parameter $\theta$.

5. Training Protocol

Training employs the Adam optimizer with $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$. The learning rate is initialized in $[10^{-4}, 10^{-3}]$, decaying by a factor of 0.5 every 1,000 epochs. Model fitting utilizes batch sizes of $M = 2{,}048$ Monte Carlo paths over $N = 50$–$100$ time steps for each trajectory. Spline activations are set to degree $p = 3$ with $K + 1 = 10$ knots. Optimization typically proceeds for 5,000–10,000 epochs until convergence of the loss function.
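The step-decay schedule quoted above can be written out directly; the initial rate `3e-4` below is an illustrative pick from the stated range, not a value from the paper.

```python
def lr_at(epoch, lr0=3e-4, decay=0.5, every=1000):
    """Step schedule: multiply the learning rate by `decay` every `every` epochs."""
    return lr0 * decay ** (epoch // every)

# First decay happens at epoch 1000, the second at epoch 2000, and so on.
print(lr_at(0), lr_at(999), lr_at(1000), lr_at(2500))
```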

6. Empirical Results: Option Pricing and Hedging

Empirical studies examine European geometric basket calls at $d = 10, 50, 100$ and American arithmetic basket puts at $d = 8, 20$ (with $d = 8$ as the main baseline). The primary evaluation metrics are:

  • Price error:

$$\operatorname{PriceError} = 100\, \frac{|u_0^{\mathrm{model}} - u_0^{\mathrm{ref}}|}{u_0^{\mathrm{ref}}}$$

  • Hedging cost $\mathrm{CVaR}_{0.95}$ at the 95% quantile, normalized by $|u_0^{\mathrm{ref}}|$:

$$\mathrm{CVaR}_{0.95} = \frac{\mathbb{E}\big[\, C \mid C \ge \mathrm{VaR}_{0.95} \,\big]}{|u_0^{\mathrm{ref}}|}$$
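An empirical estimator of this tail metric is straightforward; the sketch below is illustrative (standard-normal toy costs, not the paper's data).

```python
import numpy as np

def cvar(costs, alpha=0.95, u0_ref=1.0):
    """Empirical CVaR_alpha: mean hedging cost at or beyond the alpha-quantile
    VaR, normalized by |u0_ref|."""
    var = np.quantile(costs, alpha)      # VaR at the alpha quantile
    tail = costs[costs >= var]           # losses in the upper tail
    return tail.mean() / abs(u0_ref)

rng = np.random.default_rng(0)
costs = rng.normal(0.0, 1.0, size=100_000)   # toy hedging-cost sample
print(round(cvar(costs), 2))                 # theory for N(0,1): ~2.06
```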

Table of representative results:

| Setting | Price Error (MLP) | Price Error (KANHedge) | CVaR (MLP) | CVaR (KANHedge) |
|---|---|---|---|---|
| European basket ($d=10$) | 0.31% | 0.055% | 1.438 | 1.409 (−2%) |
| American basket ($d=8$) | ≤0.6% | ≤0.37% | 1.66 | 1.52 (−8.6%) |

For European baskets ($d=10$), KANHedge achieves pricing errors ≈0.055% versus ≈0.31% for MLP and reduces CVaR by ≈2.01%. Under high volatility or out-of-the-money conditions, MLP and KANHedge exhibit CVaR values around 1.90 and 1.88, respectively (a 1–4% improvement). For American baskets ($d=8$), KANHedge achieves ≈8.6% lower CVaR. Across all strike, correlation, and volatility combinations, KANHedge consistently reduces hedging risk cost (CVaR) by 4–9% relative to MLPs.

7. Analysis, Advantages, and Limitations

The use of B-spline activations in KANs leads to smoother output with control of derivatives up to the second order, mitigating the occurrence of "gamma spikes" in the hedging profile. The Kolmogorov–Arnold decomposition directs the network to learn univariate transforms, which yields well-behaved partial derivatives and thus more accurate delta estimation. Directly modeling $Z_t$ using KAN further enhances the alignment of model-predicted deltas with analytical references, as demonstrated by near-perfect overlap with the Black-Scholes delta in the single-asset case, while MLP-based deltas deviate notably in the tails.
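The single-asset reference used in such sanity checks is the closed-form Black-Scholes delta $N(d_1)$, which can be computed directly (the parameter values below are illustrative, not the paper's):

```python
import math

def bs_call_delta(S, K, T, r, sigma):
    """Closed-form Black-Scholes delta N(d1) for a European call."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    return 0.5 * (1.0 + math.erf(d1 / math.sqrt(2)))  # standard normal CDF

# At-the-money call, one year to expiry, 20% vol, zero rate: delta ~ 0.54.
print(round(bs_call_delta(S=100, K=100, T=1.0, r=0.0, sigma=0.2), 4))
```

Plotting a learned $Z_t$ against this curve across spot levels is what reveals the tail deviations of MLP-based deltas noted above.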

KAN layers introduce additional parameters per spline basis, incurring a memory and computational overhead of approximately 20–50% per training epoch. Hyperparameters such as knot placement and spline degree $p$ require careful tuning for optimal results.

Potential extensions include joint delta–gamma hedging by leveraging second derivatives of KAN outputs, integration of additional risk factors (e.g., Cox-Ingersoll-Ross stochastic interest rates), and multi-output KAN architectures for portfolio hedging with multiple payoff structures (Handal et al., 16 Jan 2026).
