
Geometric Hidden Community Model

Updated 31 January 2026
  • GHCM is a probabilistic generative framework that models spatial networks via latent community labels and continuous geometric features, bridging characteristics of random geometric graphs (RGG) and stochastic block models (SBM).
  • It employs motif-based clustering and connectivity thresholds to achieve near-optimal community recovery with efficient, nearly-linear algorithms in sparse regimes.
  • The model generalizes traditional approaches by incorporating distance-dependent connection kernels, offering rigorous information-theoretic recovery thresholds and robust empirical performance.

A Geometric Hidden Community Model (GHCM) is a probabilistic generative framework for spatially embedded networks in which community structure manifests through both discrete latent group labels and continuous geometric features. GHCM models networks where edge formation is modulated by latent spatial proximity and community membership, generalizing classical random geometric graphs (RGG) and stochastic block models (SBM). Community detection in GHCM leverages both motif-based properties (e.g., triangle counts) and information-theoretic thresholds to characterize the fundamental limits of recovery in regimes where traditional SBM approaches do not suffice. This is particularly true in the sparse-graph regime, where edge dependencies induced by geometry lead to high motif counts and spatial transitivity.

1. Model Formulation

A prototypical GHCM specifies a set of $n$ nodes partitioned into $k$ hidden communities $\{V_1,\ldots,V_k\}$, with each node $i$ assigned an i.i.d. latent coordinate $X_i \in [0,1]^d$, or embedded in more general spaces such as a $d$-dimensional torus or the unit sphere. Edge formation is governed by geometric proximity modulated by community labels: for a pair $(i,j)$, an edge exists with probability $f_{c_1,c_2}(\|X_i - X_j\|)$, where $c_1, c_2$ are the (hidden) communities of $i, j$ and $f_{c_1,c_2}$ is a connection kernel depending on both the label pair and the distance. A canonical instantiation uses the step kernel

$$f_{c_1,c_2}(t) = \mathbf{1}\{t \le r_{c_1,c_2}\},$$

which connects a pair deterministically whenever its distance falls below a label-dependent radius, typically with within-community radius $r_{\mathrm{in}}$ exceeding cross-community radius $r_{\mathrm{out}}$.
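As a concrete illustration of this generative process, the following sketch samples a step-kernel GHCM on the unit circle. The function name, the uniform label assignment, and the parameter names are illustrative assumptions, not taken from the cited papers:

```python
import random

def sample_ghcm(n, k, r_in, r_out, seed=0):
    """Sample a step-kernel GHCM on the circle [0, 1).

    Each node receives a uniform community label in {0, ..., k-1} and a
    uniform latent coordinate; a pair is joined iff its geodesic circle
    distance is at most r_in (same community) or r_out (different ones).
    """
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in range(n)]
    coords = [rng.random() for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            d = abs(coords[i] - coords[j])
            d = min(d, 1.0 - d)  # geodesic distance on the circle
            radius = r_in if labels[i] == labels[j] else r_out
            if d <= radius:
                edges.add((i, j))
    return labels, coords, edges
```

With $r_{\mathrm{out}} = 0$ the sampler degenerates to disjoint within-community RGGs; with $r_{\mathrm{in}} = r_{\mathrm{out}}$ it is a plain RGG with no community signal.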

2. Connectivity Thresholds in Geometric and Block Structures

Community detectability and recovery in GHCMs are intimately linked to the connectivity properties of random geometric and annulus graphs. In a random annulus graph $\mathrm{RAG}(n, [r_1, r_2])$ on a sphere, an edge is present between $i$ and $j$ iff $r_1 \le \|X_i - X_j\| \le r_2$. The transition from disconnected to connected graphs (percolation) underpins the possibility of global community recovery. On the circle ($d = 1$), the critical regime is $r_1 = b \log n / n$ and $r_2 = a \log n / n$. The graph is connected w.h.p. if $a - b > 1$ and $a > 2b$; otherwise, it is disconnected w.h.p. (Galhotra et al., 2022). In higher dimensions, with radii scaling as $\Theta((\log n / n)^{1/d})$, the isolation and connectivity thresholds are governed by analogous volume comparisons for the annulus, and connectivity again requires explicit constant-factor conditions on the inner and outer radii (Galhotra et al., 2022).
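The annulus construction and the connectivity check it feeds into can be sketched directly. This is a minimal illustration, assuming the circle ($d = 1$) case; the function names are hypothetical:

```python
import random
from collections import deque

def random_annulus_graph(n, r1, r2, seed=0):
    """RAG on the circle [0, 1): i ~ j iff r1 <= geodesic distance <= r2."""
    rng = random.Random(seed)
    coords = [rng.random() for _ in range(n)]
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            d = abs(coords[i] - coords[j])
            d = min(d, 1.0 - d)
            if r1 <= d <= r2:
                adj[i].append(j)
                adj[j].append(i)
    return coords, adj

def is_connected(adj):
    """Breadth-first search reachability from node 0."""
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)
```

Sweeping $r_1 = b \log n / n$ and $r_2 = a \log n / n$ over a grid of $(a, b)$ and averaging `is_connected` over many seeds reproduces the threshold behavior described above empirically.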

3. Information-Theoretic and Algorithmic Recovery Thresholds

GHCM admits rigorous information-theoretic sharp thresholds for exact recovery, formulated by evaluating whether sufficient information exists to break the (global relabeling) symmetry in the presence of edge sparsity and geometric correlations. In the classical step-kernel GHCM on the circle ($d = 1$), with within- and cross-community radii scaling as $a \log n / n$ and $b \log n / n$, exact recovery is impossible when the pair $(a, b)$ falls below an explicit threshold curve (Galhotra et al., 2022, Galhotra et al., 2017), with analogous results holding in higher dimensions under corresponding parameter scalings.

For the general distance-dependent pairwise observation setup, the sharp recovery threshold is

$$\lambda \, \nu_d \, D_+ = 1,$$

where $\lambda$ is the Poisson process intensity, $\nu_d$ is the unit-ball volume in $\mathbb{R}^d$, and $D_+$ is the Chernoff--Hellinger divergence integrated over spatial distance and label mixture (Gaudio et al., 22 Jan 2025, Gaudio et al., 24 Jan 2026). Above threshold ($\lambda \nu_d D_+ > 1$), there exist linear-time or polynomial-time algorithms achieving exact recovery; below it, no estimator surpasses chance. The divergence takes the form

$$D_+ = \max_{t \in [0,1]} \int\!\!\int \Big( t\, p(y \mid r) + (1 - t)\, q(y \mid r) - p(y \mid r)^{t}\, q(y \mid r)^{1 - t} \Big)\, dy \, \mu(dr),$$

with $p(\cdot \mid r), q(\cdot \mid r)$ the conditional observation densities for within- and cross-community pairs at distance $r$, and $\mu$ the distance density (Gaudio et al., 24 Jan 2026).
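For intuition about the divergence in this threshold, the inner Chernoff--Hellinger term can be evaluated numerically. The sketch below computes it for two discrete distributions at a fixed distance via a grid search over $t$; the function name and the discretization are assumptions for illustration, not the papers' construction:

```python
def ch_divergence(p, q, grid=1000):
    """Chernoff--Hellinger divergence between discrete distributions:
    D_+(p, q) = max_{t in [0,1]} sum_x ( t p(x) + (1-t) q(x)
                                         - p(x)^t q(x)^(1-t) ),
    approximated by a grid search over the exponent t."""
    best = 0.0
    for s in range(grid + 1):
        t = s / grid
        val = sum(t * pi + (1 - t) * qi - (pi ** t) * (qi ** (1 - t))
                  for pi, qi in zip(p, q))
        best = max(best, val)
    return best
```

The divergence vanishes exactly when the within- and cross-community observation distributions coincide, which is the regime where the sharp threshold forces recovery to fail.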

4. Recovery Algorithms and Computational Aspects

GHCMs admit provably close-to-optimal recovery algorithms in the sparse regime, harnessing geometric transitivity and motif abundance:

  • Triangle-based clustering: Algorithms count triangles for each edge $(i, j)$—the number of common neighbors of $i$ and $j$—and prune edges whose counts do not exceed statistically determined thresholds. Edges with triangle counts close to within-community expectations are retained. The final partition is extracted via connected components or union-find machinery. This scheme succeeds down to near-optimal separation between within- and cross-community radii (with analogous thresholds in higher dimensions), achieving nearly linear complexity in the number of nodes $n$ in the sparse regime (Galhotra et al., 2022, Galhotra et al., 2017).
  • Two-phase linear-time algorithms: Recent work (Gaudio et al., 22 Jan 2025, Gaudio et al., 24 Jan 2026) describes "seed-propagate-refine" meta-algorithms: (1) local MAP inference on small initial blocks, (2) label propagation across spatial blocks via likelihood ratios or motif aggregation, and (3) an exact labeling refinement phase using local MAP with the now-almost-correct labeling. Each edge is examined only $O(1)$ times, yielding an overall running time linear in the number of edges.
  • Spectral methods: In hybrid models (SBM plus geometric noise), standard spectral clustering on the adjacency matrix is robust if the SBM eigen-gap exceeds the geometric "noise" by a sufficient factor. Explicit eigenvalue separation and Davis-Kahan-type arguments guarantee that the second (or, more generally, the $k$ leading) eigenvectors retain significant alignment with the true community structure when the SBM signal eigenvalue dominates the spectral norm of the geometric perturbation (Peche et al., 2020).
  • Active learning and label queries: Motif-based edge pruning can be combined with querying a vanishing (sublinear) number of node labels. In regimes where indirect motif separation is insufficient, adaptively querying the labels of a few strategically chosen nodes (e.g., one per connected component of the pruned graph) suffices for exact recovery with $o(n)$ queries (Chien et al., 2019).
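The triangle-pruning step described in the first bullet can be sketched compactly; this is a minimal illustration with hypothetical names, omitting the statistically calibrated thresholds of the actual algorithms:

```python
class UnionFind:
    """Disjoint-set forest with path halving, for component extraction."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def triangle_prune_cluster(n, edges, threshold):
    """Keep an edge only if its endpoints share at least `threshold`
    common neighbors, then read communities off as connected components."""
    neighbors = [set() for _ in range(n)]
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    uf = UnionFind(n)
    for i, j in edges:
        if len(neighbors[i] & neighbors[j]) >= threshold:
            uf.union(i, j)
    return [uf.find(i) for i in range(n)]
```

Because spurious cross-community edges participate in few triangles, they are pruned, and the surviving components align with the hidden communities when the radii are sufficiently separated.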

5. Comparison to SBM and Other Random Graph Models

SBM and GHCM differ fundamentally in edge independence and motif structure:

  • SBM edges are independent given labels; thus, the expected number of triangles per edge in the sparse SBM is $o(1)$—triangle counting does not offer useful separation (Galhotra et al., 2022, Galhotra et al., 2017).
  • GHCMs induce correlated edge formation: spatially nearby nodes participate in many triangles, particularly within communities. Motif (triangle)-counting is therefore an effective and nearly optimal community recovery tool for GHCM, but not for SBM in the sparse regime.
  • GHCM unifies and generalizes random geometric graphs, block models, and latent space models; with step kernels, it reduces to RGG or SBM in special cases (Gaudio et al., 24 Jan 2026, Gaudio et al., 22 Jan 2025).
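The motif contrast in the first two bullets is easy to observe by simulation. The sketch below compares triangle counts in a sparse Erdős–Rényi graph (the label-free limit of the SBM) and an RGG on the circle at matched average degree; all names and parameters are illustrative:

```python
import random
from itertools import combinations

def count_triangles(adj):
    """Count triangles via common neighbors of each edge's endpoints."""
    total = 0
    for i in adj:
        for j in adj[i]:
            if j > i:
                total += len(adj[i] & adj[j])
    return total // 3  # each triangle is counted once per edge

def er_graph(n, p, rng):
    """Sparse Erdos-Renyi graph G(n, p)."""
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def rgg_circle(n, r, rng):
    """RGG on the circle [0, 1): i ~ j iff geodesic distance <= r."""
    xs = [rng.random() for _ in range(n)]
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        d = abs(xs[i] - xs[j])
        d = min(d, 1.0 - d)
        if d <= r:
            adj[i].add(j)
            adj[j].add(i)
    return adj
```

At the same average degree, the geometric graph's transitivity (neighbors of a node are spatially close to each other) produces far more triangles than the independent-edge graph, which is exactly the signal motif-based GHCM recovery exploits.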

6. Empirical Performance and Benchmarks

Empirical validation on real networks (Political-Blogs, DBLP collaboration graphs, LiveJournal) confirms that motif-based unsupervised GHCM recovery achieves 75–80% labeling accuracy (as measured against ground truth), outperforming spectral clustering and other SBM-inspired techniques (which achieve only 50–65%) (Galhotra et al., 2022, Galhotra et al., 2017). On synthetic datasets, experiments reveal sharp threshold behavior: below the predicted gap between within- and cross-community radii, algorithms fail; above it, recovery is perfect. Running times are near-linear in $n$ or linear in the number of edges, contrasting with the quadratic cost of spectral methods.

7. Extensions, Generalizations, and Open Problems

GHCM research has advanced to:

  • Recovery in arbitrary dimensions, inhomogeneous spatial domains, and with flexible distance-dependent kernels—both theory and methodology allow for more general geometric and weighted graphs (Avrachenkov et al., 2024, Eldan et al., 2020).
  • Incorporating percolation-theoretic arguments: threshold behavior links to continuum percolation and information flow (Kesten–Stigum thresholds) on branching processes with geometry (Eldan et al., 2020).
  • Bypassing "distinctness-of-distributions" assumptions when between-community and within-community observation distributions coincide for some pairs—data-driven block-propagation algorithms still achieve sharp recovery (Gaudio et al., 22 Jan 2025).
  • Active learning in GHCM indicates that sublinear label queries close the gap between unsupervised and information-theoretic recovery thresholds (Chien et al., 2019).
  • Open questions remain for the scalability of optimal recovery algorithms to a growing number of communities, adaptation to highly sparse regimes, and understanding criticality in more general geometries and connection rules.
