PyDTS: A Python Package for Discrete-Time Survival (Regularized) Regression with Competing Risks (2204.05731v5)
Abstract: Time-to-event analysis (survival analysis) is used when the response of interest is the time until a pre-specified event occurs. Time-to-event data are sometimes discrete, either because time itself is discrete or because failure times are grouped into intervals or measurements are rounded. In addition, an individual's failure may be one of several distinct failure types, known as competing risks (events). Most methods and software packages for survival regression analysis assume that time is measured on a continuous scale. It is well known that naively applying standard continuous-time models to discrete-time data may yield biased estimators of the discrete-time model parameters. This paper introduces PyDTS, a Python package for simulating, estimating, and evaluating semi-parametric competing-risks models for discrete-time survival data. The package implements a fast estimation procedure that accommodates regularized regression methods such as LASSO and elastic net, among others. A simulation study showcases the flexibility and accuracy of the package. The utility of the package is demonstrated by analysing the Medical Information Mart for Intensive Care (MIMIC)-IV dataset to predict hospitalization length of stay.
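To make the discrete-time competing-risks setting concrete, the following is a minimal standard-library sketch and not the PyDTS API. It assumes no covariates, two event types, and constant true hazards, and it uses the simple nonparametric estimate of each cause-specific hazard (events of type j at time t divided by the number at risk at t) rather than the semi-parametric regression the package implements. The function names `simulate`, `cause_specific_hazards`, and `cumulative_incidence` are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter


def simulate(n, hazards, max_time, censor_prob, seed=0):
    """Simulate (time, event) pairs on the grid 1..max_time.

    hazards[j-1] is the assumed constant hazard of event type j;
    event 0 marks a censored observation (illustrative setup only).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        for t in range(1, max_time + 1):
            u = rng.random()
            cum, event = 0.0, 0
            # At each discrete time at most one event type can occur.
            for j, h in enumerate(hazards, start=1):
                cum += h
                if u < cum:
                    event = j
                    break
            if event:
                data.append((t, event))
                break
            # Event-free through t: possibly censored at t.
            if t == max_time or rng.random() < censor_prob:
                data.append((t, 0))
                break
    return data


def cause_specific_hazards(data, n_causes):
    """Estimate h_j(t) = (# type-j events at t) / (# at risk at t)."""
    max_t = max(t for t, _ in data)
    exits = Counter(t for t, _ in data)
    events = Counter((t, j) for t, j in data if j)
    at_risk = len(data)
    hazards = {}
    for t in range(1, max_t + 1):
        hazards[t] = [events[(t, j)] / at_risk for j in range(1, n_causes + 1)]
        at_risk -= exits[t]  # events and censorings both leave the risk set
    return hazards


def cumulative_incidence(hazards, cause):
    """CIF_j(t) = sum over s <= t of h_j(s) * S(s-1), S = overall survival."""
    surv, cif, out = 1.0, 0.0, {}
    for t in sorted(hazards):
        cif += surv * hazards[t][cause - 1]
        surv *= 1.0 - sum(hazards[t])
        out[t] = cif
    return out


sim = simulate(n=20000, hazards=(0.05, 0.10), max_time=10, censor_prob=0.05)
est = cause_specific_hazards(sim, n_causes=2)
cif_1 = cumulative_incidence(est, cause=1)
```

With a large simulated sample, the entries of `est` should lie close to the assumed hazards of 0.05 and 0.10 at each time point. Treating the same grouped times as continuous would ignore the heavy ties, which is the source of the bias the abstract refers to.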