Correlation between Multivariate Datasets, from Inter-Graph Distance computed using Graphical Models Learnt With Uncertainties

Published 31 Oct 2017 in stat.ME and stat.AP | (1710.11292v3)

Abstract: We present a method for simultaneous Bayesian learning of the correlation matrix and the graphical model of a multivariate dataset, along with the uncertainties in each. The distance between the learnt graphical models of a pair of datasets is then computed using a new metric that approximates an uncertainty-normalised Hellinger distance between the posterior probabilities of the graphical models given the respective datasets; the correlation between the pair of datasets is computed as the corresponding affinity measure. We achieve a closed-form likelihood of the between-columns correlation matrix by marginalising over the between-row matrices. This between-columns correlation matrix is updated first, given the data; the graph is then updated, given the partial correlation matrix computed from the updated correlation matrix. This allows for learning of the 95$\%$ Highest Probability Density credible regions of the correlation matrix and of the graphical model of the data. The difference made to the learnt graphical model by acknowledging measurement noise is demonstrated on a small simulated dataset, while the large human disease-symptom network, with $>8,000$ nodes, is learnt using real data. Data on the vino-chemical attributes of Portuguese red and white wine samples are employed to learn the with-uncertainty graphical model of each dataset and, subsequently, the distance between these learnt graphical models.
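Two standard building blocks named in the abstract can be illustrated with a minimal sketch: the partial correlation matrix obtained from a correlation matrix, and the plain Hellinger distance between two discrete probability distributions. This is not the paper's method. The function names `partial_correlation` and `hellinger` and the example matrix are hypothetical, the Bayesian learning of the matrix and graph is omitted, and the uncertainty normalisation of the paper's metric is not reproduced here.

```python
import numpy as np


def partial_correlation(corr):
    """Partial correlation matrix from a correlation matrix.

    Uses the textbook identity rho_ij = -P_ij / sqrt(P_ii * P_jj),
    where P is the inverse (precision) of the correlation matrix.
    """
    precision = np.linalg.inv(np.asarray(corr, dtype=float))
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return partial


def hellinger(p, q):
    """Plain Hellinger distance between two discrete distributions.

    Inputs are normalised to sum to one; the result lies in [0, 1].
    This is the standard definition, not the uncertainty-normalised
    variant described in the abstract.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


# Illustrative (made-up) correlation matrix for a chain X1 - X2 - X3:
# X1 and X3 are correlated only through X2, so their partial
# correlation given X2 should vanish.
example_corr = np.array([
    [1.00, 0.50, 0.25],
    [0.50, 1.00, 0.50],
    [0.25, 0.50, 1.00],
])
example_partial = partial_correlation(example_corr)
```

In this toy example the marginal correlation between the first and third variables is non-zero while their partial correlation is zero, which is why a graph built on partial correlations would carry no edge between them.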