Survival Kernets: Scalable and Interpretable Deep Kernel Survival Analysis with an Accuracy Guarantee (2206.10477v5)
Abstract: Kernel survival analysis models estimate individual survival distributions with the help of a kernel function, which measures the similarity between any two data points. Such a kernel function can be learned using deep kernel survival models. In this paper, we present a new deep kernel survival model called a survival kernet, which scales to large datasets in a manner that is amenable to both model interpretation and theoretical analysis. Specifically, the training data are partitioned into clusters based on a recently developed training set compression scheme for classification and regression called kernel netting, which we extend to the survival analysis setting. At test time, each data point is represented as a weighted combination of these clusters, and each such cluster can be visualized. For a special case of survival kernets, we establish a finite-sample error bound on predicted survival distributions that is, up to a log factor, optimal. Whereas scalability at test time is achieved using the aforementioned kernel netting compression strategy, scalability during training is achieved by a warm-start procedure based on tree ensembles such as XGBoost and a heuristic approach to accelerating neural architecture search. On four standard survival analysis datasets of varying sizes (up to roughly 3 million data points), we show that survival kernets are highly competitive with the various baselines tested, in terms of time-dependent concordance index. Our code is available at: https://github.com/georgehc/survival-kernets
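The test-time prediction step described in the abstract can be sketched as follows. This is a minimal illustration of the weighted conditional Kaplan-Meier idea, assuming hypothetical inputs (cluster exemplar embeddings, per-cluster death and at-risk counts at the distinct event times) and a fixed Gaussian kernel on embeddings; the actual model learns the kernel with a neural net, and this sketch is not the authors' implementation.

```python
import numpy as np

def kernel_weights(test_embedding, cluster_exemplars):
    """Similarity of a test point's embedding to each cluster exemplar.

    Gaussian kernel is a hypothetical stand-in for the learned kernel.
    """
    dists = np.linalg.norm(cluster_exemplars - test_embedding, axis=1)
    return np.exp(-dists ** 2)

def predicted_survival(test_embedding, cluster_exemplars, deaths, at_risk):
    """Predicted survival curve as a weighted combination of clusters.

    deaths[c, j]  : number of deaths in cluster c at the j-th sorted
                    distinct event time
    at_risk[c, j] : number still at risk in cluster c just before that time

    Returns S(t_j | x) for each distinct event time t_j, via a
    kernel-weighted (conditional) Kaplan-Meier product.
    """
    w = kernel_weights(test_embedding, cluster_exemplars)
    d = w @ deaths    # kernel-weighted death counts per event time
    n = w @ at_risk   # kernel-weighted at-risk counts per event time
    hazards = np.divide(d, n, out=np.zeros_like(d), where=n > 0)
    return np.cumprod(1.0 - hazards)
```

Because each test point's prediction depends only on its kernel weights over a small number of clusters, the weights themselves indicate which (visualizable) clusters drive a given survival curve.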