Community detection for binary graphical models in high dimension (2411.15627v2)
Abstract: Let $N$ components be partitioned into two communities, denoted ${\cal P}_+$ and ${\cal P}_-$, possibly of different sizes. Assume that they are connected via a directed and weighted Erd\H{o}s-R\'enyi random graph (DWER) with unknown parameter $p \in (0, 1)$. The weights assigned to the existing connections are of mean-field type, scaling as $N^{-1}$. At each time unit, we observe the state of each component: either it sends a signal to its successors in the directed graph, or it remains silent. In this paper, we show that it is possible to recover the communities ${\cal P}_+$ and ${\cal P}_-$ based only on the activity of the $N$ components observed over $T$ time units. More specifically, we propose a simple algorithm whose probability of {\it exact recovery} converges to $1$ as long as $(N/T^{1/2})\log(NT) \to 0$ as $T$ and $N$ diverge. Interestingly, this algorithm does not require any prior knowledge of the other model parameters (e.g., the edge probability $p$). The key step in our analysis is to derive an asymptotic approximation, as $N$ diverges, of the one-unit time-lagged covariance matrix associated with the states of the $N$ components. This approximation relies on a study of the solutions of a Stein-type matrix equation satisfied by the simultaneous (0-lagged) covariance matrix of the states. This study is challenging, especially because the simultaneous covariance matrix is random: it depends on the underlying DWER random graph.
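The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea (recovering communities from the one-unit time-lagged covariance), under loudly stated assumptions. It assumes that components in ${\cal P}_+$ send positive (excitatory) weights $\theta/N$ and components in ${\cal P}_-$ negative (inhibitory) weights $-\theta/N$ along each DWER edge (no self-loops), and that component $i$ fires at time $t+1$ with probability $0.5 + \sum_j W_{ij} X_{t,j}$. The names `simulate`, `lag1_covariance`, `recover_communities` and the parameter `theta` are hypothetical. In this linear toy model the lagged covariance equals $W\Sigma$, with $\Sigma$ the simultaneous covariance, so the column sum for component $j$ is dominated by $\mathrm{sign}(j)\,\theta\,\mathrm{outdeg}(j)\,\Sigma_{jj}/N$; the sketch labels each component by the sign of that column sum. This is not necessarily the authors' algorithm.

```python
import random


def simulate(n_plus, n_minus, p, theta, T, seed=0):
    """Toy DWER dynamics (assumption, not the paper's exact model).

    Components 0..n_plus-1 form P+ (excitatory), the rest form P- (inhibitory).
    W[i][j] = sign[j] * theta / N if the directed edge j -> i exists (prob. p).
    Requires theta <= 0.5 so that firing probabilities stay in [0, 1].
    """
    rng = random.Random(seed)
    N = n_plus + n_minus
    sign = [1] * n_plus + [-1] * n_minus
    W = [[sign[j] * theta / N if i != j and rng.random() < p else 0.0
          for j in range(N)] for i in range(N)]
    states = [[int(rng.random() < 0.5) for _ in range(N)]]
    for _ in range(T - 1):
        prev = states[-1]
        active = [j for j in range(N) if prev[j]]
        # |sum of incoming weights| < theta, so the probability lies in (0.5 - theta, 0.5 + theta).
        states.append([int(rng.random() < 0.5 + sum(W[i][j] for j in active))
                       for i in range(N)])
    return states, sign


def lag1_covariance(states):
    """Empirical Cov(X_{t+1,i}, X_{t,j}) for a list of binary state vectors."""
    N, n = len(states[0]), len(states) - 1
    past, future = states[:-1], states[1:]
    mean_past = [sum(s[j] for s in past) / n for j in range(N)]
    mean_future = [sum(s[i] for s in future) / n for i in range(N)]
    counts = [[0] * N for _ in range(N)]
    for s_prev, s_next in zip(past, future):
        on_prev = [j for j in range(N) if s_prev[j]]
        for i in range(N):
            if s_next[i]:
                for j in on_prev:
                    counts[i][j] += 1  # joint event X_{t+1,i} = X_{t,j} = 1
    return [[counts[i][j] / n - mean_future[i] * mean_past[j]
             for j in range(N)] for i in range(N)]


def recover_communities(C):
    """Label component j as +1 (P+) or -1 (P-) by the sign of column sum j of C."""
    N = len(C)
    return [1 if sum(C[i][j] for i in range(N)) > 0 else -1 for j in range(N)]


if __name__ == "__main__":
    states, sign = simulate(n_plus=6, n_minus=4, p=0.8, theta=0.4, T=20000, seed=1)
    labels = recover_communities(lag1_covariance(states))
    print("exact recovery:", labels == sign)
```

With these illustrative parameters the per-column signal is large relative to the sampling noise of the lagged covariance, so the sign rule should usually recover the labels; for weak couplings, sparse graphs, or short observation windows it can fail, and the paper's exact-recovery guarantee should not be assumed to carry over to this toy rule.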