Identifying User Survival Types via Clustering of Censored Social Network Data (1703.03401v1)
Abstract: The goal of cluster analysis in survival data is to identify clusters that are decidedly associated with the survival outcome. Previous research has explored this problem primarily in the medical domain with relatively small datasets, but the need for such a clustering methodology could arise in other domains with large datasets, such as social networks. Concretely, we wish to identify different survival classes in a social network by clustering the users based on their lifespan in the network. In this paper, we propose a decision tree based algorithm that uses a global normalization of $p$-values to identify clusters with significantly different survival distributions. We evaluate the clusters from our model with the help of a simple survival prediction task and show that our model outperforms other competing methods.