Relative Survival Analysis Using Bayesian Decision Tree Ensembles (2411.01435v1)
Abstract: In cancer epidemiology, the relative survival framework is used to quantify the hazard associated with cancer by comparing the all-cause mortality hazard in cancer patients to that of the general population. This framework assumes that an individual's hazard function is the sum of a known population hazard and an excess hazard associated with the cancer. Several estimands derived from the excess hazard, including the net survival, are used to inform decisions and to assess the effectiveness of interventions on cancer management. In this paper, we introduce a Bayesian machine learning approach, based on Bayesian additive regression trees (BART), for estimating the excess hazard and identifying vulnerable subgroups with a higher excess risk. We first develop a proportional hazards extension of the BART model to the relative survival setting, and then extend this model to non-proportional hazards. We develop tools for model interpretation and posterior summarization, and then present an application to colon cancer data from England that highlights the insights our methodology offers when paired with state-of-the-art data linkage methods. This application demonstrates how these methods can be used to identify drivers of inequalities in cancer survival through variable importance quantification.
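The additive decomposition described in the abstract can be illustrated with a minimal sketch. It assumes hazards that are piecewise constant on a regular time grid, and it uses the standard definition of net survival as the exponential of minus the cumulative excess hazard. The function names and the numeric values are hypothetical and are not taken from the paper; this is not the BART model itself, only the bookkeeping that the relative survival framework implies.

```python
import math
from typing import List


def overall_hazard(population_hazard: float, excess_hazard: float) -> float:
    """Relative survival assumption: all-cause hazard = population hazard + excess hazard."""
    return population_hazard + excess_hazard


def net_survival(excess_hazards: List[float], dt: float) -> List[float]:
    """Net survival at the end of each interval of width dt.

    Assumes the excess hazard is constant within each interval (an
    illustrative simplification), so the cumulative excess hazard is a
    running sum of hazard * dt and net survival is exp(-cumulative).
    """
    cumulative = 0.0
    curve = []
    for h in excess_hazards:
        cumulative += h * dt
        curve.append(math.exp(-cumulative))
    return curve


if __name__ == "__main__":
    # Hypothetical yearly hazards: known population hazard and an excess
    # hazard that decreases with time since diagnosis.
    population = [0.02, 0.02, 0.03]
    excess = [0.30, 0.15, 0.05]
    total = [overall_hazard(p, e) for p, e in zip(population, excess)]
    print("all-cause hazard:", total)
    print("net survival:", net_survival(excess, dt=1.0))
```

In this sketch the population hazard is treated as known, as the framework requires (in practice it is typically read from life tables), so only the excess hazard would need to be estimated from data.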