CoxKAN Survival Analysis Model
- CoxKAN is a survival analysis model that uses Kolmogorov–Arnold Networks to parameterize the log-partial hazard function for rich, nonlinear risk modeling.
- It employs a compact network architecture with learnable univariate functions and B-spline bases, balancing expressivity and clear symbolic interpretability.
- Empirical evaluations show that CoxKAN outperforms classical Cox models, is competitive with deep neural networks, and reveals biologically plausible nonlinear interactions in clinical and genomics data.
CoxKAN is a survival analysis model that leverages Kolmogorov–Arnold Networks (KANs) to provide a high-performance, interpretable alternative to traditional and deep learning-based survival models. CoxKAN directly parameterizes the log-partial hazard function of the Cox proportional hazards model using the compositional Kolmogorov–Arnold representation, allowing for rich, nonlinear modeling while maintaining explicit symbolic interpretability and performing inherent feature selection. Empirical studies show that CoxKAN consistently outperforms classical Cox proportional hazards models and is competitive with, or superior to, state-of-the-art deep neural network methods, especially in discovering complex multivariate dependencies in clinical and high-dimensional genomics data (Knottenbelt et al., 2024).
1. Mathematical Foundations and Model Definition
CoxKAN models the hazard function in the Cox proportional hazards framework as

$$h(t \mid \mathbf{x}) = h_0(t)\,\exp\big(\hat{\theta}(\mathbf{x})\big),$$

where $h_0(t)$ is the baseline hazard and $\hat{\theta}(\mathbf{x})$ is a real-valued function learned by a Kolmogorov–Arnold Network, mapping covariates $\mathbf{x}$ to a log-risk score. The baseline hazard is left unspecified and handled via the partial likelihood framework inherent to the Cox model; learning focuses wholly on the nonparametric risk function $\hat{\theta}$.
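The proportional-hazards structure is what lets the baseline drop out of the learning problem: the ratio of two patients' hazards depends only on their log-risk scores. A two-line illustration with made-up scores:

```python
import math

# Under h(t | x) = h0(t) * exp(theta(x)), the ratio of two patients' hazards
# cancels the unspecified baseline h0 and is constant over time.
theta_a, theta_b = 1.2, 0.4                 # illustrative log-risk scores
hazard_ratio = math.exp(theta_a - theta_b)  # relative hazard of patient a vs b
```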
The Kolmogorov–Arnold representation used in CoxKAN is

$$f(\mathbf{x}) = \sum_{q=1}^{2n+1} \Phi_q\!\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right),$$

where the $\phi_{q,p}$ are univariate “inner” functions (one per feature per neuron) and the $\Phi_q$ are univariate “outer” functions. The summation structure ensures universal approximation capability for continuous multivariate functions as per the Kolmogorov–Arnold theorem (Knottenbelt et al., 2024).
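As a concrete illustration of this compositional form, the sketch below evaluates $f(\mathbf{x}) = \sum_q \Phi_q\big(\sum_p \phi_{q,p}(x_p)\big)$ for hand-picked (not learned) inner and outer functions; in CoxKAN these would be learned spline activations, and all names here are illustrative:

```python
import math

# Evaluate f(x) = sum_q Phi_q( sum_p phi_{q,p}(x_p) ) for fixed functions.
def ka_representation(x, inner_fns, outer_fns):
    # inner_fns[q][p] is phi_{q,p}; outer_fns[q] is Phi_q
    return sum(
        Phi(sum(phi(xp) for phi, xp in zip(phis, x)))
        for Phi, phis in zip(outer_fns, inner_fns)
    )

# Toy decomposition: f(x1, x2) = exp(x1 + x2), using a single outer branch
inner = [[lambda v: v, lambda v: v]]
outer = [math.exp]
value = ka_representation([1.0, 2.0], inner, outer)  # exp(1 + 2)
```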
2. Network Architecture and Parameterization
CoxKAN typically employs a compact architecture (often one or two hidden layers) where each connection is a learnable univariate function rather than a scalar weight. The canonical (single hidden layer) architecture is:
- Inputs: the $n$ covariates $x_1, \dots, x_n$
- Hidden: $m$ units, each receiving all $n$ features via parallel univariate inner functions $\phi_{q,p}$
- Output: $\hat{\theta}(\mathbf{x})$ formed as a sum over outer univariate functions $\Phi_q$
Each univariate function (inner or outer) is parameterized by a small B-spline basis plus an optional residual basis term:

$$\phi(x) = w_b\, b(x) + w_s \sum_{i} c_i\, B_i(x),$$

where the $B_i$ are degree-$k$ B-spline basis functions, the $c_i$ are trainable coefficients, and $w_b$, $w_s$ are trainable scalars ($b(x)$ is a fixed residual function such as SiLU). By using low-order splines (typically $k = 3$) on a small grid ($G = 3$–5), the architecture achieves a balance between expressivity and interpretability (Knottenbelt et al., 2024).
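A minimal sketch of one such edge activation, built on `scipy.interpolate.BSpline`; the knot layout, coefficient initialization, and SiLU residual term are assumptions chosen to mirror the description above, not the paper's exact implementation:

```python
import numpy as np
from scipy.interpolate import BSpline

def silu(x):
    return x / (1.0 + np.exp(-x))

def make_edge_activation(grid_size=5, degree=3, lo=-1.0, hi=1.0, rng=None):
    """Build phi(x) = w_b * silu(x) + w_s * spline(x) on [lo, hi]."""
    rng = np.random.default_rng(rng)
    n_coef = grid_size + degree               # number of spline coefficients
    # Open uniform knot vector: repeated end knots plus interior grid points
    knots = np.concatenate([
        np.full(degree, lo),
        np.linspace(lo, hi, grid_size + 1),
        np.full(degree, hi),
    ])
    c = rng.normal(scale=0.1, size=n_coef)    # trainable in a real model
    spline = BSpline(knots, c, degree, extrapolate=True)
    w_b, w_s = 1.0, 1.0                       # trainable scalars in practice
    return lambda x: w_b * silu(x) + w_s * spline(x)

phi = make_edge_activation(rng=0)
y = phi(np.linspace(-1, 1, 7))  # evaluate the activation on a small grid
```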
3. Training Objective, Regularization, and Feature Selection
CoxKAN is trained to minimize the regularized negative partial Cox log-likelihood

$$\mathcal{L} = \mathcal{L}_{\text{Cox}} + \lambda\,\mathcal{R},$$

where

$$\mathcal{L}_{\text{Cox}} = -\frac{1}{N_e} \sum_{i:\,\delta_i = 1} \left[\hat{\theta}(\mathbf{x}_i) - \log \sum_{j \in \mathcal{R}(t_i)} e^{\hat{\theta}(\mathbf{x}_j)}\right]$$

and $\mathcal{R}(t_i)$ is the risk set for event time $t_i$ (the subjects still under observation at $t_i$), $\delta_i$ indicates an observed event, and $N_e$ is the number of observed events.
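The partial likelihood above can be computed in a numerically stable way by sorting subjects by descending follow-up time, so each risk-set log-sum becomes a running `logaddexp`. A minimal sketch, assuming no tied event times:

```python
import numpy as np

def neg_partial_log_likelihood(theta, time, event):
    """theta: (N,) log-risk scores; time: (N,) follow-up times;
    event: (N,) 1 if the event was observed, 0 if censored."""
    order = np.argsort(-time)               # sort by descending time
    theta, event = theta[order], event[order]
    # Running log-sum-exp: subjects with longer follow-up precede each
    # event in this order, so entry i is log sum over the risk set of i.
    log_cumsum = np.logaddexp.accumulate(theta)
    ll = np.sum((theta - log_cumsum)[event == 1])
    return -ll / max(event.sum(), 1)        # average over observed events

theta = np.array([0.5, -0.2, 1.0, 0.0])
time = np.array([4.0, 3.0, 2.0, 1.0])
event = np.array([1, 0, 1, 1])
loss = neg_partial_log_likelihood(theta, time, event)
```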
The regularizer combines:
- $L_1$-norm of activation magnitudes (encourages edge/neuron sparsity)
- Entropy of activation magnitudes (promotes focused sparse connectivity)
- $L_1$-norm on spline coefficients (encourages function simplicity)
Optimization is performed using the Adam algorithm with early stopping based on validation concordance index (C-Index). After training, a threshold parameter prunes low-activation edges and neurons, resulting in automatic feature selection and topology simplification (Knottenbelt et al., 2024).
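The pruning step can be sketched as thresholding each edge's mean absolute activation over the training data; a feature is retained only if at least one of its outgoing edges survives. Array shapes and the threshold value here are illustrative:

```python
import numpy as np

def prune_edges(edge_activations, threshold=0.01):
    """edge_activations: (n_samples, n_inputs, n_hidden) array of per-edge
    activation values; returns a boolean keep-mask and selected features."""
    magnitude = np.mean(np.abs(edge_activations), axis=0)  # (n_inputs, n_hidden)
    keep = magnitude >= threshold
    # A feature is selected iff at least one of its outgoing edges survives
    selected_features = np.where(keep.any(axis=1))[0]
    return keep, selected_features

acts = np.zeros((100, 3, 2))
acts[:, 0, 0] = 1.0          # only feature 0 carries signal in this toy case
keep, feats = prune_edges(acts, threshold=0.01)
```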
4. Symbolic Formula Extraction and Interpretability
After pruning, each univariate function is fitted against a small library of symbolic templates (e.g., linear, polynomial, exponential, and logarithmic forms of the type $y = a\,g(bx + c) + d$). The best-fitting template is chosen by maximizing the coefficient of determination $R^2$ over the empirical activations. If no template matches ($R^2$ below a threshold), symbolic regression tools such as PySR are invoked to recover a closed-form expression.
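This template-fitting step can be sketched with `scipy.optimize.curve_fit`: fit each candidate form $y = a\,g(bx+c)+d$ to an activation's (input, output) samples and keep the one with the highest $R^2$. The template library below is a small illustrative subset, not the paper's full library:

```python
import numpy as np
from scipy.optimize import curve_fit

TEMPLATES = {
    "linear": lambda x, a, b, c, d: a * (b * x + c) + d,
    "square": lambda x, a, b, c, d: a * (b * x + c) ** 2 + d,
    "exp":    lambda x, a, b, c, d: a * np.exp(b * x + c) + d,
}

def fit_best_template(x, y):
    """Return the template name and R^2 of the best fit to samples (x, y)."""
    best_name, best_r2 = None, -np.inf
    ss_tot = np.sum((y - y.mean()) ** 2)
    for name, f in TEMPLATES.items():
        try:
            popt, _ = curve_fit(f, x, y, p0=[1.0, 1.0, 0.0, 0.0], maxfev=5000)
        except (RuntimeError, ValueError):
            continue                       # skip templates that fail to fit
        r2 = 1.0 - np.sum((y - f(x, *popt)) ** 2) / ss_tot
        if np.isfinite(r2) and r2 > best_r2:
            best_name, best_r2 = name, r2
    return best_name, best_r2

x = np.linspace(-2, 2, 50)
y = 3.0 * x ** 2 + 1.0          # a quadratic activation, noise-free
name, r2 = fit_best_template(x, y)
```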
The final risk score is the sum of these explicitly discovered symbolic curves. This approach provides direct insight into both the overall hazard model and the effect of individual covariates or interactions, differentiating CoxKAN from “black-box” neural competitors (Knottenbelt et al., 2024).
5. Empirical Evaluation and Benchmarking
CoxKAN was evaluated on four synthetic datasets (where ground-truth hazard formulas were known) and nine real-world datasets (comprising five standard clinical and four high-dimensional genomics cohorts). Performance was measured by the Harrell C-Index and, when applicable, the Integrated Brier Score.
| Dataset Type | Comparator Models | CoxKAN Performance |
|---|---|---|
| Synthetic | CoxPH, DeepSurv | Matches/exceeds the true-hazard (oracle) model in 3/4 cases |
| Clinical | CoxPH, DeepSurv | Outperforms CoxPH, matches/exceeds DeepSurv on 4/5 |
| Genomics (TCGA) | CoxPH+Lasso, DeepSurv | Competitive with CoxPH+Lasso; beats DeepSurv on 2/4 |
On synthetic benchmarks, CoxKAN exactly recovered the generating hazard function when expressible by the model. On clinical datasets, CoxKAN symbolic models achieved higher or comparable C-Index versus CoxPH and DeepSurv, with non-overlapping confidence intervals in several cases. In high-dimensional genomics, CoxKAN remained robust where unregularized CoxPH failed due to multicollinearity (Knottenbelt et al., 2024).
6. Discovery of Nonlinear Interactions and Biological Plausibility
CoxKAN demonstrated a unique capacity to discover and symbolize previously unrecognized nonlinear and interaction effects among covariates. For instance, in the SUPPORT dataset, the learned interaction subnetworks between age and metastatic cancer status revealed biologically plausible, cohort-specific risk trajectories. In the GBSG breast cancer dataset, CoxKAN rediscovered nonlinear “sweet-spot” biomarker effects, and in high-dimensional glioma genomics data, it uncovered clear genetic prognostic signatures matching known molecular pathology (Knottenbelt et al., 2024).
7. Practical Implementation and Usage Workflow
CoxKAN’s practical usage involves:
- Selecting the KAN architecture and regularization strength.
- Training the network using the regularized Cox partial-likelihood with early stopping.
- Pruning low-activation edges to yield a minimal feature set.
- Running symbolic fitting or symbolic regression on the remaining activations to produce a final, human-readable hazard model.
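The steps above can be sketched end to end in miniature by substituting a linear log-risk $\hat{\theta}(\mathbf{x}) = \mathbf{x}^\top \mathbf{w}$ for the KAN: train on the $L_1$-regularized negative partial likelihood, then prune small weights to obtain the selected features. All constants, the numerical gradient, and the synthetic data are illustrative simplifications:

```python
import numpy as np

def neg_pll(w, X, time, event, lam):
    """Regularized negative Cox partial log-likelihood for theta = X @ w."""
    order = np.argsort(-time)
    theta = (X @ w)[order]
    ev = event[order]
    log_cumsum = np.logaddexp.accumulate(theta)   # risk-set log-sum-exp
    nll = -np.sum((theta - log_cumsum)[ev == 1]) / max(ev.sum(), 1)
    return nll + lam * np.sum(np.abs(w))          # L1 sparsity penalty

def num_grad(f, w, eps=1e-5):
    """Central-difference gradient (a real model would use autodiff)."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w); e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
N = 200
X = rng.normal(size=(N, 3))
risk = 1.5 * X[:, 0]                      # only feature 0 drives the hazard
time = rng.exponential(scale=np.exp(-risk))
event = np.ones(N, dtype=int)             # no censoring, for simplicity

w = np.zeros(3)
loss_fn = lambda w: neg_pll(w, X, time, event, lam=0.05)
for _ in range(400):                      # plain gradient descent
    w -= 0.2 * num_grad(loss_fn, w)

keep = np.abs(w) > 0.1                    # pruning: drop near-zero weights
```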
This enables practitioners to derive a sparse, accurate, and interpretable survival model that aligns with regulatory and scientific requirements for transparency in biomedical applications (Knottenbelt et al., 2024).
This summary synthesizes results and methodologies as presented in "CoxKAN: Kolmogorov-Arnold Networks for Interpretable, High-Performance Survival Analysis" (Knottenbelt et al., 2024).