Survival Kernets: Scalable and Interpretable Deep Kernel Survival Analysis with an Accuracy Guarantee (2206.10477v5)

Published 21 Jun 2022 in cs.LG and stat.ML

Abstract: Kernel survival analysis models estimate individual survival distributions with the help of a kernel function, which measures the similarity between any two data points. Such a kernel function can be learned using deep kernel survival models. In this paper, we present a new deep kernel survival model called a survival kernet, which scales to large datasets in a manner that is amenable to model interpretation and also theoretical analysis. Specifically, the training data are partitioned into clusters based on a recently developed training set compression scheme for classification and regression called kernel netting that we extend to the survival analysis setting. At test time, each data point is represented as a weighted combination of these clusters, and each such cluster can be visualized. For a special case of survival kernets, we establish a finite-sample error bound on predicted survival distributions that is, up to a log factor, optimal. Whereas scalability at test time is achieved using the aforementioned kernel netting compression strategy, scalability during training is achieved by a warm-start procedure based on tree ensembles such as XGBoost and a heuristic approach to accelerating neural architecture search. On four standard survival analysis datasets of varying sizes (up to roughly 3 million data points), we show that survival kernets are highly competitive compared to various baselines tested in terms of time-dependent concordance index. Our code is available at: https://github.com/georgehc/survival-kernets
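To make concrete what "estimating individual survival distributions with the help of a kernel function" means, the following is a minimal sketch of the classical kernel-weighted (Beran) conditional Kaplan-Meier estimator that kernel survival analysis builds on. This is not the paper's survival kernet model: the Gaussian kernel, the `bandwidth` parameter, and the function name are illustrative assumptions, whereas survival kernets instead learn the kernel with a neural net and compress the training set into clusters.

```python
import numpy as np

def beran_survival(x_query, X_train, times, events, bandwidth=1.0):
    """Kernel-weighted (Beran) conditional Kaplan-Meier sketch.

    Each training point receives a Gaussian-kernel weight based on its
    similarity to the query point; the Kaplan-Meier product limit is
    then computed with these weights instead of unit counts.
    """
    # Kernel weights: similarity of each training point to the query.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    w = np.exp(-(dists / bandwidth) ** 2)

    # Sort by observed time so the product-limit runs in time order.
    order = np.argsort(times)
    t_sorted = times[order]
    d_sorted = events[order]   # 1 = event observed, 0 = censored
    w_sorted = w[order]

    # Total kernel weight still at risk at each sorted time
    # (reverse cumulative sum of the sorted weights).
    at_risk = np.cumsum(w_sorted[::-1])[::-1]

    # Product-limit factors; censored points contribute a factor of 1.
    factors = np.where(d_sorted == 1, 1.0 - w_sorted / at_risk, 1.0)
    survival = np.cumprod(factors)
    return t_sorted, survival
```

With all training points equidistant from the query (equal weights) and no censoring, this reduces to the standard Kaplan-Meier curve; the survival kernet replaces the raw training points with a compressed set of clusters, so the weighted sum at test time runs over cluster representatives rather than all data points.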
