
Experimental Comparison of Ensemble Methods and Time-to-Event Analysis Models Through Integrated Brier Score and Concordance Index (2403.07460v1)

Published 12 Mar 2024 in cs.LG

Abstract: Time-to-event analysis is a branch of statistics that has grown in popularity over the last decades owing to its many fields of application, such as predictive maintenance, customer churn prediction and population lifetime estimation. In this paper, we review and compare the performance of several prediction models for time-to-event analysis. These consist of semi-parametric and parametric statistical models, in addition to machine learning approaches. Our study is carried out on three datasets and evaluated with two different scores (the integrated Brier score and the concordance index). Moreover, we show how ensemble methods, which surprisingly have not yet been studied much in time-to-event analysis, can improve prediction accuracy and make prediction performance more robust. We conclude the analysis with a simulation experiment in which we evaluate the factors influencing the performance ranking of the methods under both scores.
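To make one of the two scores concrete, the following is a minimal sketch of Harrell's concordance index for right-censored data, written in plain Python. The function name `concordance_index` and its argument names are illustrative assumptions, and this is not necessarily the estimator variant used in the paper. A pair of subjects is treated as comparable when the subject with the shorter time had an observed event; the pair is concordant when that subject also received the higher predicted risk.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C-index (illustrative sketch, hypothetical helper).

    times:       observed time (event or censoring) per subject
    events:      True if the event was observed, False if censored
    risk_scores: predicted risk; higher means the event is expected sooner
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A censored subject cannot be the "earlier" member of a pair.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    # Tied predictions count as half-concordant.
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration): the third subject is censored.
toy_times = [1.0, 2.0, 3.0, 4.0]
toy_events = [True, True, False, True]
perfect = concordance_index(toy_times, toy_events, [4.0, 3.0, 2.0, 1.0])
reversed_ = concordance_index(toy_times, toy_events, [1.0, 2.0, 3.0, 4.0])
constant = concordance_index(toy_times, toy_events, [1.0, 1.0, 1.0, 1.0])
```

A value of 1 means every comparable pair is ranked correctly, 0.5 corresponds to uninformative predictions, and 0 to a fully reversed ranking. The concordance index measures ranking quality only, whereas the integrated Brier score also reflects the calibration of predicted survival probabilities over time.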
