KANHedge: BSDE Solver with Kolmogorov-Arnold Networks
- KANHedge is a BSDE-based solver that leverages Kolmogorov-Arnold Networks with learnable B-spline activations to deliver accurate, smooth delta estimates in high-dimensional option pricing and hedging.
- It replaces conventional MLPs in deep BSDE frameworks, reducing pricing errors and hedging risk (CVaR) by up to 9% in empirical studies on European and American basket options.
- By employing spline activations that guarantee smooth gradients, KANHedge improves risk control and mitigates the numerical instabilities associated with the fixed activation functions of conventional MLP-based solvers.
KANHedge is a backward stochastic differential equation (BSDE)-based solver for high-dimensional option pricing and hedging. It replaces conventional Multi-Layer Perceptrons (MLPs) in the deep BSDE framework with Kolmogorov-Arnold Networks (KANs), which employ learnable B-spline activation functions. This architecture provides enhanced function approximation capabilities for continuous derivatives, specifically targeting improvements in hedging accuracy and risk control, particularly in high-dimensional settings where standard PDE-based methods are intractable due to the curse of dimensionality (Handal et al., 16 Jan 2026).
1. BSDE Formulation and Hedging Problem
The KANHedge methodology is rooted in the risk-neutral valuation paradigm for option pricing, where the price process $Y_t$ and the corresponding hedging strategy for a derivative contract are characterized by the BSDE

$$dY_t = -f(t, X_t, Y_t, Z_t)\,dt + Z_t^\top\,dW_t, \qquad Y_T = g(X_T),$$

with $Y_t$ adapted to the market filtration and $Z_t$ representing the vector of deltas (the number of units of each underlying asset held at time $t$). Here, $X_t$ is the state process, $W_t$ denotes a $d$-dimensional Brownian motion, $g$ is the payoff function, and $f$ is the driver specifying the market structure.
The classical approach involves discretizing the underlying forward SDE on a time grid, approximating $Y_{t_n}$ and $Z_{t_n}$ at each time step, and minimizing the expected squared loss on the terminal condition.
2. Deep BSDE Solvers with MLPs: Capabilities and Limitations
Standard deep BSDE solvers discretize the interval $[0, T]$ into $N$ steps and simulate the market paths via the Euler scheme for the forward SDE

$$X_{t_{n+1}} = X_{t_n} + \mu(t_n, X_{t_n})\,\Delta t + \sigma(t_n, X_{t_n})\,\Delta W_n.$$

At each time $t_n$, $Z_{t_n}$ is modeled as $\mathcal{N}_{\theta_n}(X_{t_n})$, where the network $\mathcal{N}_{\theta_n}$ typically has 3–5 hidden layers of width 100–500 and uses fixed activation functions such as ReLU, tanh, or SiLU. Training minimizes the Monte Carlo approximation of the quadratic terminal loss

$$\mathcal{L}(\theta) = \frac{1}{M}\sum_{i=1}^{M}\big(g(X_T^{(i)}) - Y_T^{(i)}\big)^2,$$

where $M$ is the number of sampled paths.
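The discretized forward simulation and terminal loss above can be sketched in NumPy. This is a minimal illustration, not the paper's code: the function name `simulate_bsde_rollout` is invented, the diffusion is assumed diagonal (elementwise $\sigma$), and `z_fn` stands in for whatever network approximates $Z_{t_n}$.

```python
import numpy as np

def simulate_bsde_rollout(x0, y0, z_fn, f, g, mu, sigma, T, n_steps, n_paths, d, seed=0):
    """Euler discretization of the forward SDE plus forward rollout of Y.

    x0: initial state (d,); y0: scalar initial price guess;
    z_fn(t, x) -> (n_paths, d) delta approximation (e.g. a neural net);
    f: BSDE driver; g: payoff; mu/sigma: drift and (diagonal) diffusion.
    Returns the Monte Carlo terminal loss E|g(X_T) - Y_T|^2.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.tile(x0, (n_paths, 1))           # (n_paths, d)
    y = np.full(n_paths, float(y0))         # (n_paths,)
    for n in range(n_steps):
        t = n * dt
        dw = rng.normal(0.0, np.sqrt(dt), size=(n_paths, d))
        z = z_fn(t, x)                      # delta approximation at (t, x)
        # BSDE stepped forward in time: dY = -f dt + Z . dW
        y = y - f(t, x, y, z) * dt + np.sum(z * dw, axis=1)
        x = x + mu(t, x) * dt + sigma(t, x) * dw
    # terminal mismatch drives the training objective
    return np.mean((g(x) - y) ** 2)
```

With a zero driver, zero deltas, and zero payoff, the rollout is exact and the terminal loss vanishes, which makes for a quick sanity check of the scheme.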
Key limitations arise from the use of fixed activations: the resulting function approximations often exhibit irregular or non-smooth gradients, and direct MLP-based estimation of can produce inaccurate or noisy delta trajectories, compromising hedging performance.
3. Kolmogorov–Arnold Networks (KANs) and B-Spline Activations
KANs are motivated by the Kolmogorov–Arnold representation theorem, which asserts that any continuous multivariate function $f$ on $[0,1]^d$ can be expressed as a finite sum of compositions of univariate continuous functions:

$$f(x_1, \dots, x_d) = \sum_{q=0}^{2d} \Phi_q\!\left(\sum_{p=1}^{d} \phi_{q,p}(x_p)\right),$$

where $\Phi_q$ and $\phi_{q,p}$ are continuous univariate functions.
KAN layers instantiate this by employing a learnable B-spline activation on each edge:

$$\phi(x) = \sum_{i} c_i\, B_i^{k}(x),$$

where $B_i^{k}$ are degree-$k$ B-spline basis functions and $c_i$ are trainable coefficients, yielding outputs smooth up to order $k-1$. Each KAN layer maps $x \in \mathbb{R}^{n_{\mathrm{in}}}$ to $\mathbb{R}^{n_{\mathrm{out}}}$ via

$$x_j^{\mathrm{out}} = \sum_{i=1}^{n_{\mathrm{in}}} \phi_{j,i}(x_i^{\mathrm{in}}),$$

for $j = 1, \dots, n_{\mathrm{out}}$. Stacking such layers produces highly expressive multivariate approximators with controlled, smooth derivatives, facilitating improved modeling of option deltas and higher-order Greeks.
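The edge-wise spline activations can be sketched in NumPy as follows. This is an illustrative implementation, not the paper's: B-spline bases are evaluated with the standard Cox–de Boor recursion, and the `kan_layer` interface (a coefficient tensor of shape `(n_out, n_in, n_basis)`) is an assumed parameterization.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Evaluate all degree-`degree` B-spline basis functions at points x
    via the Cox-de Boor recursion. Returns an array (len(x), n_basis)."""
    x = np.asarray(x, dtype=float)
    n = len(knots) - degree - 1                 # number of basis functions
    # degree-0 bases: indicators of the knot intervals
    B = np.zeros((len(x), len(knots) - 1))
    for i in range(len(knots) - 1):
        B[:, i] = (x >= knots[i]) & (x < knots[i + 1])
    for k in range(1, degree + 1):
        B_new = np.zeros((len(x), len(knots) - 1 - k))
        for i in range(len(knots) - 1 - k):
            left_den = knots[i + k] - knots[i]
            right_den = knots[i + k + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * B[:, i] if left_den > 0 else 0.0
            right = (knots[i + k + 1] - x) / right_den * B[:, i + 1] if right_den > 0 else 0.0
            B_new[:, i] = left + right
        B = B_new
    return B[:, :n]

def kan_layer(x, coeffs, knots, degree):
    """One KAN layer: out_j = sum_i phi_{j,i}(x_i), each phi a learnable spline.
    x: (batch, n_in); coeffs: (n_out, n_in, n_basis)."""
    batch, n_in = x.shape
    n_out = coeffs.shape[0]
    out = np.zeros((batch, n_out))
    for i in range(n_in):
        basis = bspline_basis(x[:, i], knots, degree)   # (batch, n_basis)
        # accumulate phi_{j,i}(x_i) for every output unit j
        out += basis @ coeffs[:, i, :].T
    return out
```

A useful check is the partition-of-unity property: on a clamped knot vector the basis functions sum to 1 inside the domain, so a layer with all coefficients equal to 1 outputs exactly `n_in`.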
4. KANHedge Model Architecture
In KANHedge, every MLP approximator of $Z_{t_n}$ in the standard deep BSDE solver is replaced by a corresponding KAN:

$$\mathcal{K}(x) = L_{\mathrm{out}} \circ \Phi \circ L_{\mathrm{in}}(x),$$

where $L_{\mathrm{in}}$ is an affine transformation, $\Phi$ applies univariate spline activations $\phi_{j,i}$, and $L_{\mathrm{out}}$ is an affine map over the activations.
The joint parameter collection $\theta$ includes the initial price $Y_0$ and all KAN weights. Training minimizes a regularized loss

$$\mathcal{L}(\theta) = \frac{1}{M}\sum_{i=1}^{M}\big(g(X_T^{(i)}) - Y_T^{\theta,(i)}\big)^2 + \lambda\,\mathcal{R}(\theta),$$

where $Y_T^{\theta}$ and $X_T^{(i)}$ arise from a forward simulation under parameter $\theta$, and $\mathcal{R}$ is a regularization term on the network parameters.
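A minimal sketch of this training objective, assuming an L2 penalty on the trainable spline coefficients (the exact regularizer used in the paper is not reproduced here, and `kanhedge_loss` is an invented name):

```python
import numpy as np

def kanhedge_loss(y_terminal, x_terminal, payoff, spline_coeffs, lam=1e-4):
    """Regularized KANHedge-style objective (sketch):
    Monte Carlo terminal mismatch plus a penalty on spline coefficients.

    y_terminal: (M,) simulated Y_T under the current parameters;
    x_terminal: (M, d) simulated X_T; payoff: g(X_T) -> (M,);
    spline_coeffs: iterable of coefficient arrays across all KAN layers.
    """
    terminal_loss = np.mean((payoff(x_terminal) - y_terminal) ** 2)
    # assumed L2 regularizer over all trainable spline coefficients
    reg = sum(np.sum(c ** 2) for c in spline_coeffs)
    return terminal_loss + lam * reg
```

When the simulated $Y_T$ matches the payoff exactly, only the regularization term remains, which cleanly separates the two contributions during debugging.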
5. Training Protocol
Training employs the Adam optimizer with standard moment-decay hyperparameters. The learning rate decays by a factor of 0.5 every 1,000 epochs. Model fitting uses minibatches of Monte Carlo paths, each simulated over up to $100$ time steps. Spline activations use a fixed degree and knot count per run. Optimization typically proceeds for 5,000–10,000 epochs until the loss converges.
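The stated step-decay schedule (halve the learning rate every 1,000 epochs) is simple to reproduce; `step_lr` is an illustrative helper, not code from the paper:

```python
def step_lr(epoch, base_lr, decay=0.5, every=1000):
    """Step-decay learning-rate schedule: multiply the base rate by
    `decay` once per `every` epochs elapsed."""
    return base_lr * decay ** (epoch // every)
```

In a framework such as PyTorch the same behavior is typically obtained with a built-in step scheduler rather than a hand-rolled function.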
6. Empirical Results: Option Pricing and Hedging
Empirical studies examine European geometric basket calls and American arithmetic basket puts in high dimensions, with an MLP-based deep BSDE solver as the main baseline. The primary evaluation metrics are:
- Price error: the relative error of the learned initial value $Y_0^\theta$ against a reference price.
- Hedging cost: the CVaR of the terminal hedging shortfall at the 95% quantile, normalized by the option price.
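Both metrics are straightforward to compute from simulated paths; the following NumPy helpers are an illustrative sketch (the empirical CVaR estimator here averages losses beyond the 95% quantile, one common convention):

```python
import numpy as np

def relative_price_error(y0_model, y0_ref):
    """Relative pricing error of the learned initial value
    against a reference price."""
    return abs(y0_model - y0_ref) / abs(y0_ref)

def cvar(losses, alpha=0.95):
    """Conditional value-at-risk: the mean loss beyond the
    alpha-quantile. `losses` are per-path hedging shortfalls."""
    losses = np.sort(np.asarray(losses, dtype=float))
    var = np.quantile(losses, alpha)          # value-at-risk threshold
    tail = losses[losses >= var]              # worst (1 - alpha) tail
    return tail.mean()
```

For reporting, the CVaR value would additionally be divided by the option price, matching the normalization described above.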
Table of representative results:

| Setting | Price error (MLP) | Price error (KANHedge) | CVaR (MLP) | CVaR (KANHedge, reduction) |
|---|---|---|---|---|
| European geometric basket call | 0.31% | 0.055% | 1.438 | 1.409 (−2%) |
| American arithmetic basket put | ≤0.6% | ≤0.37% | 1.66 | 1.52 (−8.6%) |
For European baskets, KANHedge achieves pricing errors of ≈0.055% versus ≈0.31% for the MLP and reduces CVaR by ≈2.01%. Under high-volatility or out-of-the-money conditions, the MLP and KANHedge exhibit CVaR values of around 1.90 and 1.88, respectively (a 1–4% improvement). For American baskets, KANHedge achieves ≈8.6% lower CVaR. Across all strike, correlation, and volatility combinations, KANHedge consistently reduces the hedging risk cost (CVaR) by 4–9% relative to MLPs.
7. Analysis, Advantages, and Limitations
The use of B-spline activations in KANs leads to smoother outputs with control of derivatives up to second order, mitigating the occurrence of "gamma spikes" in the hedging profile. The Kolmogorov–Arnold decomposition directs the network to learn univariate transforms, which yields well-behaved partial derivatives and thus more accurate delta estimation. Directly modeling $Z_t$ with a KAN further improves the alignment of model-predicted deltas with analytical references, as demonstrated by near-perfect overlap with the Black-Scholes delta in the single-asset case, whereas MLP-based deltas deviate notably in the tails.
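The single-asset reference mentioned above is the closed-form Black-Scholes delta, which can be computed with the standard library alone (the function name `bs_call_delta` is illustrative):

```python
import math

def bs_call_delta(s, k, r, sigma, tau):
    """Black-Scholes delta N(d1) of a European call: the analytical
    reference that learned deltas are compared against.

    s: spot, k: strike, r: risk-free rate, sigma: volatility,
    tau: time to maturity (years)."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    # standard normal CDF expressed via the error function
    return 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))
```

At the money the delta sits slightly above 0.5, and deep in the money it approaches 1, the tail regions where the paper reports MLP-based deltas deviating most.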
KAN layers introduce additional trainable parameters per spline basis function, incurring a memory and computational overhead of approximately 20–50% per training epoch. Hyperparameters such as knot placement and spline degree require careful tuning for optimal results.
Potential extensions include joint delta–gamma hedging by leveraging second derivatives of KAN outputs, integration of additional risk factors (e.g., Cox-Ingersoll-Ross stochastic interest rates), and multi-output KAN architectures for portfolio hedging with multiple payoff structures (Handal et al., 16 Jan 2026).