Conditional Hierarchical Bayesian Tucker Decomposition for Genetic Data Analysis (1911.12426v7)
Published 27 Nov 2019 in cs.LG, stat.ME, and stat.ML
Abstract: We analyze large, multi-dimensional, sparse count data sets, finding groups without supervision to provide unique insights into genetic data. We create gene and biological pathway groups based on patients' variants to find common risk factors for four common types of cancer (breast, lung, prostate, and colorectal) and autism spectrum disorder. To accomplish this, we extend latent Dirichlet allocation to multiple dimensions and design distinct methods for hierarchical topic modeling. We find that our conditional hierarchical Bayesian Tucker decomposition models are more coherent than baseline models.
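To make the idea of extending latent Dirichlet allocation to multiple dimensions more concrete, the following is a minimal sketch of a generic Bayesian Tucker generative model for a three-way count tensor. It is not the paper's method: the conditional and hierarchical components are omitted, and the function name `sample_tucker_counts`, the mode sizes, the ranks, and the symmetric Dirichlet parameter `alpha` are all assumptions chosen for illustration.

```python
import numpy as np

def sample_tucker_counts(shape, ranks, total_count, alpha=0.5, seed=0):
    """Sample a count tensor from a Dirichlet-multinomial Tucker model.

    Hypothetical illustration: three modes (for example patient, gene,
    pathway), each with its own number of latent groups.
    """
    rng = np.random.default_rng(seed)

    # Core tensor: one joint distribution over combinations of latent groups.
    core = rng.dirichlet(np.full(int(np.prod(ranks)), alpha)).reshape(ranks)

    # One factor matrix per mode; each column is a distribution over the
    # items of that mode (the analogue of a topic in LDA).
    factors = [
        rng.dirichlet(np.full(n_items, alpha), size=n_groups).T
        for n_items, n_groups in zip(shape, ranks)
    ]

    # Tucker reconstruction: cell probabilities of the full tensor.
    probs = np.einsum("abc,ia,jb,kc->ijk", core, *factors)
    probs = probs / probs.sum()  # guard against floating-point drift

    # Draw sparse counts from the resulting multinomial.
    counts = rng.multinomial(total_count, probs.ravel()).reshape(shape)
    return core, factors, probs, counts

# Assumed toy sizes: 6 x 5 x 4 items with 2, 3, and 2 latent groups.
core, factors, probs, counts = sample_tucker_counts(
    shape=(6, 5, 4), ranks=(2, 3, 2), total_count=1000
)
```

Because the core and every factor column are probability distributions, the reconstructed tensor is itself a distribution over cells, which is what lets the decomposition be read as a multi-way topic model.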