Predicting Survivability of Cancer Patients with Metastatic Patterns Using Explainable AI (2504.06306v1)

Published 7 Apr 2025 in q-bio.QM and cs.AI

Abstract: Cancer remains a leading global health challenge and a major cause of mortality. This study leverages ML to predict the survivability of cancer patients with metastatic patterns using the comprehensive MSK-MET dataset, which includes genomic and clinical data from 25,775 patients across 27 cancer types. We evaluated five ML models (XGBoost, Naïve Bayes, Decision Tree, Logistic Regression, and Random Forest) using hyperparameter tuning and grid search. XGBoost emerged as the best performer with an area under the curve (AUC) of 0.82. To enhance model interpretability, SHapley Additive exPlanations (SHAP) were applied, revealing key predictors such as metastatic site count, tumor mutation burden, fraction of genome altered, and organ-specific metastases. Further survival analysis using Kaplan-Meier curves, Cox Proportional Hazards models, and XGBoost Survival Analysis identified significant predictors of patient outcomes, offering actionable insights for clinicians. These findings could aid in personalized prognosis and treatment planning, ultimately improving patient care.

Authors (3)

Polycarp Nalela
Deepthi Rao
Praveen Rao

Summary

Predicting Survivability of Cancer Patients with Metastatic Patterns Using Explainable AI

The research paper titled "Predicting Survivability of Cancer Patients with Metastatic Patterns Using Explainable AI" examines the application of ML techniques to predict the survival outcomes of cancer patients based on genomic and clinical data. Using the MSK-MET dataset, which comprises data from 25,775 patients across 27 cancer types, the paper evaluates five ML models: XGBoost, Naïve Bayes, Decision Tree, Logistic Regression, and Random Forest. XGBoost performs best, achieving an area under the curve (AUC) of 0.82.

Methodology and Findings

The methodology comprises data preprocessing, stratified sampling, model selection, and hyperparameter optimization through grid search. The XGBoost model, in particular, benefits from its efficiency in handling the complex, high-dimensional datasets typical of cancer research. After training, SHapley Additive exPlanations (SHAP) are applied to make the model interpretable, identifying key predictors such as metastatic site count, tumor mutation burden, fraction of genome altered, and organ-specific metastases.
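SHAP builds on Shapley values from cooperative game theory. The sketch below computes exact Shapley values for a single prediction of a small toy model; it is only meant to illustrate the attribution idea, not the paper's pipeline. The `toy_risk` function, its coefficients, the patient values, and the all-zero baseline are hypothetical, and replacing absent features with a baseline value is one common simplification rather than the method confirmed by the source.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for input x relative to a baseline input.

    Features outside a coalition are set to their baseline value
    (an assumed, simplified way to "remove" a feature).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_risk(features):
    """Hypothetical risk score with one interaction term (not the paper's model)."""
    site_count, tmb, fga = features
    return 0.5 * site_count + 0.1 * tmb + 2.0 * fga + 0.2 * site_count * fga


# Hypothetical patient: metastatic site count, tumor mutation burden,
# fraction of genome altered.
patient = [3.0, 10.0, 0.4]
baseline = [0.0, 0.0, 0.0]

phi = shapley_values(toy_risk, patient, baseline)
for name, value in zip(["site_count", "tmb", "fga"], phi):
    print(f"{name}: {value:.3f}")
```

The attributions sum to the difference between the patient's score and the baseline score, and the interaction term is split evenly between the two features involved. Exact enumeration grows exponentially with the number of features, which is why practical SHAP tooling relies on faster model-specific algorithms.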

Among the models assessed, XGBoost achieved the highest accuracy (0.74) and AUC, underscoring its capacity to capture intricate relationships in large biomedical datasets. Ensemble methods improved classifier performance only slightly, suggesting that combining algorithms yields modest gains over XGBoost alone.
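Accuracy and AUC measure different things, which is why the same model can report 0.74 for one and 0.82 for the other. The following sketch computes both on made-up labels and scores. It assumes that AUC refers to the area under the ROC curve (the usual reading, though the summary does not say so explicitly), and the 0.5 decision threshold is likewise an assumption.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of correct labels after thresholding the scores."""
    predictions = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(predictions, y_true)) / len(y_true)


def roc_auc(y_true, y_score):
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, t in zip(y_score, y_true) if t == 1]
    negatives = [s for s, t in zip(y_score, y_true) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = deceased, 0 = alive) and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.6, 0.55, 0.4, 0.3, 0.1]

print("accuracy:", round(accuracy(y_true, y_score), 3))
print("AUC:", round(roc_auc(y_true, y_score), 3))
```

In this toy data, 4 of 6 thresholded predictions are correct, while 7 of the 9 positive-negative pairs are ranked correctly. AUC depends only on the ranking of the scores, whereas accuracy depends on the chosen threshold.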

Survival Analysis

Beyond model performance, the paper conducts a survival analysis using Kaplan-Meier curves, Cox Proportional Hazards models, and XGBoost Survival Analysis. This analysis shows that metastatic site count, tumor mutation burden, and fraction of genome altered are significant predictors of patient outcomes. The Kaplan-Meier curves show a considerable gap in survival probabilities between metastatic and non-metastatic patients.
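For readers unfamiliar with Kaplan-Meier curves, the estimator can be sketched in a few lines. The follow-up times and event flags below are invented for illustration and are not drawn from MSK-MET. At each observed death time, the running survival probability is multiplied by the fraction of at-risk patients who survive that time; censored patients leave the risk set without lowering the curve.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    events[i] is 1 if death was observed at durations[i], 0 if the
    patient was censored (still alive at last follow-up).
    """
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up times in months; 0 marks a censored patient.
durations = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(durations, events)
for time, probability in curve:
    print(f"month {time}: S(t) = {probability:.3f}")
```

Comparing two groups, such as metastatic and non-metastatic patients, amounts to running the estimator once per group and plotting the resulting step functions together.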

The Cox Proportional Hazards model reaffirms these findings by quantifying the impact of each predictor on survival. The XGBoost Survival model refines these predictions, reaching a concordance index of 0.70, higher than that of the standard Cox model, which suggests that it better captures nonlinearities in survival data.
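The concordance index used to compare the two survival models can be read as the fraction of comparable patient pairs that a model orders correctly. Below is a minimal sketch of Harrell's C-index on invented data. It assumes higher risk scores mean shorter expected survival and, for simplicity, skips pairs with tied follow-up times; the source does not describe the exact variant the authors used.

```python
def concordance_index(durations, events, risk_scores):
    """Harrell's C-index for right-censored data (simplified sketch).

    A pair (i, j) is comparable when patient i has an observed event
    strictly before patient j's follow-up time. The pair is concordant
    when the model gives patient i the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times (months), event flags, and model risk scores.
durations = [5, 8, 12, 15, 20]
events = [1, 1, 0, 1, 1]
risk_scores = [0.9, 0.4, 0.7, 0.5, 0.1]

c_index = concordance_index(durations, events, risk_scores)
print("C-index:", c_index)  # 6 of 8 comparable pairs are concordant -> 0.75
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the reported 0.70 indicates that the model orders most comparable pairs correctly.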

Implications for Research and Practice

In practical terms, the findings could support more granular, personalized cancer prognosis and treatment planning, and thereby improve clinical decision-making. By integrating large genomic datasets with interpretable ML models, the approach could give clinicians more accurate and transparent predictive tools, potentially reducing healthcare costs and enhancing patient care.

On a theoretical level, this paper contributes to the growing body of work on explainable AI in biomedicine, showcasing how ML models can be both powerful and interpretable. This intersection of AI and clinical analytics holds great promise for the development of targeted therapeutic strategies, leveraging the predictive accuracy of ML to address complex clinical challenges.

The paper points to future directions in AI-driven cancer research, particularly extending explainable models to additional data modalities and refining them through prospective clinical validation. As the field matures, the integration of AI in oncology is likely to move toward more robust, adaptive systems that learn continuously from new data to optimize patient outcomes.
