The paper introduces the Time-Constant KAN Integrated Network (TCKIN), a novel model designed to predict mortality risk in sepsis patients in intensive care units (ICUs). TCKIN integrates temporal data, constant data, and diagnostic International Classification of Diseases (ICD) codes to improve the accuracy of sepsis mortality risk predictions. The paper validates the model on the Medical Information Mart for Intensive Care III (MIMIC-III) and Medical Information Mart for Intensive Care IV (MIMIC-IV) datasets, reporting better accuracy, sensitivity, and specificity than existing machine learning and deep learning methods.
The central contributions of the paper are:
- Integration of diagnostic ICD codes with the Clinical Classifications Software (CCS) medical ontology using graph networks to leverage coding systems fully.
- Fusion of temporal data, handled via the Gated Recurrent Unit with Decay (GRU-D), with constant data, analyzed by Kolmogorov-Arnold Networks (KAN), to improve predictive capabilities.
- Demonstration of superior performance in prediction accuracy and robustness through the integration of multiple data sources and processing techniques.
The paper uses the MIMIC-III and MIMIC-IV datasets, which include detailed health records from ICUs at Beth Israel Deaconess Medical Center. Preprocessing selects sepsis patients according to the Sepsis-3 definition and excludes those under 16 years of age, those with corrupted data, and those with ICU stays shorter than 24 hours. Three kinds of data were extracted: constant data (including demographic information), temporal data (physiological signs and laboratory test results), and diagnostic ICD codes. The ICD codes were converted into CCS code sequences to simplify complex diagnostic information.
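The exclusion criteria above can be sketched as a simple filter. This is a minimal illustration, not the paper's pipeline: the `IcuStay` record and its field names are hypothetical and do not correspond to actual MIMIC table columns.

```python
from dataclasses import dataclass

# Hypothetical record layout; MIMIC's real tables use different column names.
@dataclass
class IcuStay:
    stay_id: int
    age_years: float
    icu_hours: float
    meets_sepsis3: bool
    data_corrupted: bool

def select_cohort(stays):
    """Keep Sepsis-3 patients aged 16+ with intact data and an ICU stay of at least 24 hours."""
    return [
        s for s in stays
        if s.meets_sepsis3
        and s.age_years >= 16
        and not s.data_corrupted
        and s.icu_hours >= 24
    ]

stays = [
    IcuStay(1, 67.0, 80.5, True, False),   # kept
    IcuStay(2, 15.0, 40.0, True, False),   # excluded: under 16
    IcuStay(3, 54.0, 12.0, True, False),   # excluded: stay under 24 h
    IcuStay(4, 71.0, 96.0, False, False),  # excluded: does not meet Sepsis-3
    IcuStay(5, 48.0, 30.0, True, True),    # excluded: corrupted data
]
cohort_ids = [s.stay_id for s in select_cohort(stays)]
```

Only the first stay passes all four criteria in this toy input.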
The TCKIN model architecture comprises three primary components:
- A GRU-D model processes temporal data to generate hidden representations. The formulation uses the following quantities:
* $X$ is the temporal data
* $x_t$ is the temporal data at time $t$
* $\delta_t$ is the time interval
* $D$ is the number of features
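The GRU-D network manages missing values using a mask $m_t$ and time intervals $\delta_t$. The decay mechanism adjusts imputation values based on these intervals. In the standard GRU-D formulation, the decay factor is:
$\gamma_t = \exp\left(-\max\left(0, W_\gamma \delta_t + b_\gamma\right)\right)$
* $\gamma_t$ is the decay factor
* $W_\gamma$ is the weight matrix
* $\delta_t$ is the time interval
* $b_\gamma$ is the bias term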
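The gate computation uses the following quantities:
* $i_t$ is the input gate
* $\sigma$ is the sigmoid activation function
* $W$ is the weight matrix applied to the input
* $x_t$ is the input feature vector at time step $t$
* $m_t$ is the mask
* $U$ is the weight matrix applied to the hidden state
* $\gamma_t$ is the decay factor
* $h_{t-1}$ is the hidden state at time $t-1$
* $b$ is the bias term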
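The decay mechanism can be illustrated with a short NumPy sketch. It assumes the original GRU-D formulation, in which a missing value is imputed as a decayed blend of the last observation and the feature's empirical mean; the summary does not spell out this imputation rule, so treat it as an assumption. All numeric values and the diagonal decay weights `w` are made up for illustration.

```python
import numpy as np

def decay(delta, w, b):
    """GRU-D decay factor: gamma = exp(-max(0, w * delta + b)), elementwise in (0, 1]."""
    return np.exp(-np.maximum(0.0, w * delta + b))

def impute(x, mask, x_last, x_mean, gamma):
    """Use observed values where mask == 1; otherwise decay the last observation toward the mean."""
    return mask * x + (1.0 - mask) * (gamma * x_last + (1.0 - gamma) * x_mean)

# Two features at one time step: feature 0 is observed, feature 1 has been missing for 3 hours.
x      = np.array([7.40, 0.0])    # current reading (placeholder where missing)
mask   = np.array([1.0, 0.0])     # 1 = observed, 0 = missing
delta  = np.array([0.0, 3.0])     # time since the last observation
x_last = np.array([7.38, 40.0])   # last observed values
x_mean = np.array([7.39, 30.0])   # empirical feature means
w      = np.array([0.5, 0.5])     # hypothetical decay weights (learned in practice)
b      = np.array([0.0, 0.0])

gamma = decay(delta, w, b)
x_hat = impute(x, mask, x_last, x_mean, gamma)
```

The longer a feature has gone unobserved, the smaller its decay factor and the closer the imputed value moves to the feature mean, while observed features pass through unchanged.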
- An attention mechanism analyzes ICD diagnostic codes and CCS codes to capture complex relationships and semantic information. The similarity between codes is calculated to determine their relative importance, and attention weights are assigned accordingly:
$h_{\text{icd}} = \text{ScaleDotProductAttention}(Q, K, V) = \text{softmax}\left(\frac{Q K^T}{\sqrt{d_k}}\right) V$
* $h_{\text{icd}}$ is the ICD hidden state
* $Q$ is the query
* $K$ is the key
* $V$ is the value
* $d_k$ is the dimensionality of the keys
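The attention formula can be sketched in a few lines of NumPy. The example below is illustrative only: it uses the same random embeddings as query, key, and value (plain self-attention), whereas the paper's model may apply learned projections to the ICD and CCS code embeddings first.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    """Compute softmax(Q K^T / sqrt(d_k)) V; return the output and the attention weights."""
    d_k = k.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d_k))
    return weights @ v, weights

rng = np.random.default_rng(0)
codes = rng.normal(size=(4, 8))   # 4 hypothetical code embeddings of width 8
h_icd, attn = scaled_dot_product_attention(codes, codes, codes)
```

Each row of `attn` sums to one, so every output row is a weighted average of the value vectors, with weights reflecting the similarity between codes.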
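- A KAN network processes constant data to extract high-level features. The KAN network uses learnable B-spline activation functions on the edges, enhancing flexibility and adaptability. Each layer in KAN is expressed as a function matrix $\Phi_l = \{\phi_{l,j,i}\}$; in the standard KAN notation, the activation at each node is:
$x_{l+1,j} = \sum_{i=1}^{n_l} \tilde{x}_{l,j,i} = \sum_{i=1}^{n_l} \phi_{l,j,i}(x_{l,i})$
* $x_{l+1,j}$ is the activation value at node $j$ of layer $l+1$
* $\tilde{x}_{l,j,i}$ represents the post-activation value from input node $i$ to output node $j$ in layer $l$
* $\phi_{l,j,i}$ includes trainable parameters
* $x_{l,i}$ is the pre-activation value from input node $i$ to output node $j$ in layer $l$
* $n_l$ is the number of nodes in layer $l$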
The operational flow within the network involves pre-activation, post-activation, and node activation. The hidden features from the three components are concatenated and processed through a final KAN network to predict sepsis mortality risk.
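The pre-activation, post-activation, and node-activation flow, followed by the concatenation step, can be sketched as follows. This is a simplified illustration under several assumptions: it uses degree-1 B-splines (hat functions) on a uniform grid rather than the higher-order splines typical of KAN implementations, omits any residual base activation, and applies a sigmoid to the final output to obtain a probability. The branch feature widths and the `KANLayer` class are hypothetical.

```python
import numpy as np

def hat_basis(x, knots):
    """Degree-1 B-spline (hat) basis evaluated at x; returns shape (..., num_knots)."""
    h = knots[1] - knots[0]   # uniform knot spacing
    return np.maximum(0.0, 1.0 - np.abs(x[..., None] - knots) / h)

class KANLayer:
    """Each edge (input i -> output j) carries its own learnable univariate function."""

    def __init__(self, n_in, n_out, num_knots=8, lo=-1.0, hi=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.knots = np.linspace(lo, hi, num_knots)
        # coef[j, i, k]: weight of basis function k on the edge from input i to output j
        self.coef = rng.normal(scale=0.1, size=(n_out, n_in, num_knots))

    def __call__(self, x):
        basis = hat_basis(x, self.knots)                     # (batch, n_in, num_knots)
        post = np.einsum("bik,jik->bji", basis, self.coef)   # post-activations per edge
        return post.sum(axis=2)                              # node activation: sum over inputs

# Hypothetical hidden features from the three branches, fused by concatenation.
rng = np.random.default_rng(1)
h_temporal, h_icd, h_const = (rng.uniform(-1, 1, size=(5, d)) for d in (6, 4, 3))
fused = np.concatenate([h_temporal, h_icd, h_const], axis=1)   # shape (5, 13)

head = KANLayer(n_in=fused.shape[1], n_out=1)
risk = 1.0 / (1.0 + np.exp(-head(fused)))   # sigmoid maps the output to a probability
```

Because the nonlinearity lives on the edges, the node itself only sums its incoming post-activations, which is the main structural difference from an MLP layer.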
The model was implemented and trained using TensorFlow, with the Adam optimizer and a learning rate decay strategy. Oversampling techniques were used to address the imbalance between positive and negative samples. Five-fold cross-validation was adopted to validate the model's stability and generalizability. Performance metrics included sensitivity, specificity, area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), and the Brier Score (BS).
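Two of the ingredients above, oversampling and the evaluation metrics, can be sketched in NumPy. The paper does not specify which oversampling technique was used, so simple random duplication of minority-class rows is assumed here; the labels and probabilities are toy values, and the 0.5 decision threshold is also an assumption.

```python
import numpy as np

def random_oversample(x, y, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are the same size."""
    rng = np.random.default_rng(seed)
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.concatenate([majority, minority, extra])
    return x[idx], y[idx]

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return float(np.mean((y_prob - y_true) ** 2))

def sensitivity_specificity(y_true, y_prob, threshold=0.5):
    """True positive rate and true negative rate at a fixed decision threshold."""
    pred = y_prob >= threshold
    tp = np.sum(pred & (y_true == 1))
    fn = np.sum(~pred & (y_true == 1))
    tn = np.sum(~pred & (y_true == 0))
    fp = np.sum(pred & (y_true == 0))
    return tp / (tp + fn), tn / (tn + fp)

y = np.array([0, 0, 0, 0, 0, 0, 1, 1])              # imbalanced toy labels
x = np.arange(len(y), dtype=float).reshape(-1, 1)
x_bal, y_bal = random_oversample(x, y)

y_prob = np.array([0.1, 0.2, 0.1, 0.6, 0.3, 0.2, 0.8, 0.4])
bs = brier_score(y, y_prob)
sens, spec = sensitivity_specificity(y, y_prob)
```

In a cross-validation setting, oversampling should be applied only to the training portion of each fold so that duplicated samples never appear in the held-out data.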
The TCKIN model was compared with seven established baseline models: XGBoost, SVM, Random Forest, LGBM, LSTM, IseeU, and GRU-D. On the MIMIC-IV dataset, the TCKIN model achieved AUROC and AUPRC values of 0.8807 and 0.5470, respectively. The paper also conducted parameter sensitivity experiments focusing on the learning rate and batch size. Ablation experiments removed or replaced key modules to assess their impact on overall model performance. Replacing GRU-D with a standard GRU module reduced the AUROC from 0.8807 to 0.8547, and replacing the KAN modules with multilayer perceptrons (MLP) reduced it from 0.8807 to 0.8693.
The paper identifies key temporal features, including pH value, alanine aminotransferase, red blood cell count, and monocyte count, and constant features, including age, race, weight, and type of admission, as significant influencers of the model’s predictive accuracy. Certain ICD codes related to severe conditions like diabetes and malignancies are also strongly associated with increased mortality risk.
The paper notes limitations, including the datasets originating from a single medical center and opportunities for enhancement in processing specific types of patient data. Future research should consider including a broader array of features, such as genetic markers and imaging data.