Penalized Deep Partially Linear Cox Models with Application to CT Scans of Lung Cancer Patients (2303.05341v3)
Abstract: Lung cancer is a leading cause of cancer mortality globally, highlighting the importance of understanding its mortality risks to design effective patient-centered therapies. The National Lung Screening Trial (NLST) employed computed tomography texture analysis, which provides objective measurements of texture patterns on CT scans, to quantify the mortality risks of lung cancer patients. Partially linear Cox models have gained popularity for survival analysis by dissecting the hazard function into parametric and nonparametric components, allowing for the effective incorporation of both well-established risk factors (such as age and clinical variables) and emerging risk factors (e.g., image features) within a unified framework. However, when the dimension of parametric components exceeds the sample size, the task of model fitting becomes formidable, while nonparametric modeling grapples with the curse of dimensionality. We propose a novel Penalized Deep Partially Linear Cox Model (Penalized DPLC), which incorporates the SCAD penalty to select important texture features and employs a deep neural network to estimate the nonparametric component of the model. We prove the convergence and asymptotic properties of the estimator and compare it to other methods through extensive simulation studies, evaluating its performance in risk prediction and feature selection. The proposed method is applied to the NLST study dataset to uncover the effects of key clinical and imaging risk factors on patients' survival. Our findings provide valuable insights into the relationship between these factors and survival outcomes.
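The objective described in the abstract can be illustrated with a minimal NumPy sketch: a Cox negative log partial likelihood evaluated at the risk score x'β + g(z), plus a SCAD penalty on the linear coefficients β. This is an illustration under stated assumptions, not the authors' implementation. The function names, the averaging by sample size, the absence of a tie correction, and the use of a precomputed array `g_z` in place of the deep neural network's output are all assumptions made here. The SCAD form and the default a = 3.7 follow the standard definition of Fan and Li (2001).

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """Elementwise SCAD penalty (standard Fan-Li form; a=3.7 is the usual default)."""
    b = np.abs(np.asarray(beta, dtype=float))
    linear = lam * b                                            # |b| <= lam
    quadratic = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))  # lam < |b| <= a*lam
    constant = lam**2 * (a + 1) / 2                             # |b| > a*lam
    return np.where(b <= lam, linear, np.where(b <= a * lam, quadratic, constant))

def neg_log_partial_likelihood(eta, time, event):
    """Average Cox negative log partial likelihood (no tie correction; an assumption)."""
    eta = np.asarray(eta, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    # at_risk[i, j] is True when subject j is still under observation at time[i]
    at_risk = time[None, :] >= time[:, None]
    # log-sum-exp over each risk set, shifted by the max for numerical stability
    shift = eta.max()
    log_risk = shift + np.log((at_risk * np.exp(eta - shift)[None, :]).sum(axis=1))
    return -np.sum(event * (eta - log_risk)) / len(eta)

def penalized_objective(beta, g_z, x, time, event, lam):
    """Partial likelihood loss at eta = x @ beta + g(z), plus SCAD on beta.

    g_z is a hypothetical stand-in for the network output g(z) per subject.
    """
    eta = np.asarray(x, dtype=float) @ np.asarray(beta, dtype=float) + np.asarray(g_z, dtype=float)
    return neg_log_partial_likelihood(eta, time, event) + scad_penalty(beta, lam).sum()

# Usage on synthetic data (values are illustrative only)
rng = np.random.default_rng(0)
n, p = 50, 5
x = rng.normal(size=(n, p))
g_z = rng.normal(scale=0.1, size=n)       # placeholder for a network's output
time = rng.exponential(size=n)
event = rng.integers(0, 2, size=n)
beta = np.array([1.0, 0.0, -0.5, 0.0, 0.0])
print(penalized_objective(beta, g_z, x, time, event, lam=0.1))
```

Unlike the lasso, the SCAD penalty flattens to a constant for large coefficients, so strong signals are not shrunk further while small coefficients are still pushed toward zero. In a full implementation the network parameters and β would be optimized jointly; that step is omitted here.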
Authors:
- Yuming Sun
- Jian Kang
- Chinmay Haridas
- Nicholas R. Mayne
- Alexandra L. Potter
- Chi-Fu Jeffrey Yang
- David C. Christiani
- Yi Li