
Clustering Survival Data using a Mixture of Non-parametric Experts (2405.15934v1)

Published 24 May 2024 in cs.LG and stat.ML

Abstract: Survival analysis aims to predict the timing of future events across various fields, from medical outcomes to customer churn. However, the integration of clustering into survival analysis, particularly for precision medicine, remains underexplored. This study introduces SurvMixClust, a novel algorithm for survival analysis that integrates clustering with survival function prediction within a unified framework. SurvMixClust learns latent representations for clustering while also predicting individual survival functions using a mixture of non-parametric experts. Our evaluations on five public datasets show that SurvMixClust creates balanced clusters with distinct survival curves, outperforms clustering baselines, and competes with non-clustering survival models in predictive accuracy, as measured by the time-dependent c-index and log-rank metrics.
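The abstract describes two ingredients: predicting an individual survival function as a mixture of non-parametric experts, and checking that clusters have distinct survival curves with a log-rank statistic. The sketch below illustrates both under stated assumptions. It is not the SurvMixClust algorithm. It assumes each expert is a plain Kaplan-Meier curve fitted on one cluster, and that an individual's mixing weights pi_k(x) are already given. The paper instead learns the cluster assignments and latent representations, which this sketch omits. The function names (`kaplan_meier`, `survival_at`, `mixture_survival`, `logrank_statistic`) and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

Curve = List[Tuple[float, float]]  # (event time, S(t)) pairs, sorted by time


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> Curve:
    """Kaplan-Meier estimate; events[i] is 1 if observed, 0 if right-censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve: Curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group tied times: deaths and censorings at t leave the risk set together.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= (at_risk - deaths) / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def survival_at(curve: Curve, t: float) -> float:
    """Evaluate the right-continuous step function S(t); S(t) = 1 before the first event."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


def mixture_survival(weights: Sequence[float], experts: Sequence[Curve], t: float) -> float:
    """S(t | x) = sum_k pi_k(x) * S_k(t).

    Assumption: pi_k(x) is supplied as `weights`; learning it is out of scope here.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing weights must sum to 1")
    return sum(w * survival_at(c, t) for w, c in zip(weights, experts))


def logrank_statistic(times_a, events_a, times_b, events_b) -> float:
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e = variance = 0.0
    for t in sorted({s for s, e, _ in pooled if e == 1}):
        n = sum(1 for s, _, _ in pooled if s >= t)  # total at risk at t
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)  # group a at risk
        d = sum(1 for s, e, _ in pooled if s == t and e == 1)  # total deaths at t
        d_a = sum(1 for s, e, g in pooled if s == t and e == 1 and g == 0)
        o_minus_e += d_a - d * n_a / n  # observed minus expected deaths in group a
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / variance if variance > 0 else 0.0


# Toy data (made up for illustration): two hypothetical clusters.
expert_a = kaplan_meier([1, 2, 3, 4], [1, 1, 0, 1])
expert_b = kaplan_meier([5, 6, 8, 9], [1, 0, 1, 1])
print(mixture_survival([0.7, 0.3], [expert_a, expert_b], t=2.5))  # approximately 0.65
print(logrank_statistic([1, 2, 3, 4], [1, 1, 0, 1], [5, 6, 8, 9], [1, 0, 1, 1]))
```

Fixed weights keep the example short. In a model that predicts individual survival functions, the weights would depend on the covariates x. Comparing more than two clusters at once would call for a multi-group log-rank test rather than this pairwise version.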

