Distributed Compression of Graphical Data (1802.07446v3)
Abstract: In contrast to time series, graphical data is data indexed by the vertices and edges of a graph. Modern applications such as the internet, social networks, genomics and proteomics generate graphical data, often at large scale. This scale makes it necessary to compress such data for storage and subsequent processing. Since the data may have several components available at different locations, it is also important to study distributed compression of graphical data. In this paper, we derive a rate region for this problem that is a counterpart of the Slepian-Wolf theorem. We characterize the rate region when the statistical description of the distributed graphical data can be modeled in one of two ways: as a member of a sequence of marked sparse Erdős-Rényi ensembles, or as a member of a sequence of marked configuration model ensembles. Our results are stated in terms of a generalization of the notion of entropy introduced by Bordenave and Caputo in the study of local weak limits of sparse graphs. Furthermore, we generalize this result to Erdős-Rényi and configuration model ensembles with more than two sources.
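
For orientation, the classical Slepian-Wolf theorem that the abstract takes as its reference point can be sketched as follows. The notation is an assumption introduced here, not taken from the paper: two correlated i.i.d. sources $X_1$ and $X_2$ are encoded separately at rates $R_1$ and $R_2$ and decoded jointly, and $H$ denotes Shannon entropy. Lossless recovery is achievable exactly when

```latex
\begin{align*}
R_1 &\ge H(X_1 \mid X_2), \\
R_2 &\ge H(X_2 \mid X_1), \\
R_1 + R_2 &\ge H(X_1, X_2).
\end{align*}
```

According to the abstract, the rate region for graphical data is expressed through a generalization of the Bordenave-Caputo entropy rather than Shannon entropy; its exact form is not reproduced here.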