
Attributed SBMs

Updated 13 May 2026
  • Attributed SBMs are probabilistic models that jointly analyze network connections and multivariate node attributes to uncover latent community structures.
  • They extend classical SBMs by integrating continuous attribute data, improving community detection, link prediction, and attribute recovery.
  • Inference uses methods such as expectation–maximization, belief propagation, and approximate message passing, whose performance is governed by sharp phase transitions in detectability.

An attributed stochastic block model (SBM) is a probabilistic framework for modeling networks where each node is characterized not only by relational data (edges) but also by a multivariate attribute vector. These models extend the classical SBM—which only explains network connectivity in terms of latent community structure—by integrating node features, typically continuous or high-dimensional, into the generative process. Attributed SBMs aim to more accurately capture real-world network heterogeneity and enable enhanced inference for tasks such as community detection, link prediction, and attribute imputation. Multiple architectures for attributed SBMs have been developed, including generative models that treat attributes as conditional on community assignment, and neural-prior models in which community assignment is itself a function of node attributes (Stanley et al., 2018, Duranthon et al., 2023, Duranthon et al., 2024).

1. Model Variations for Attributed Stochastic Block Models

1.1 Classical Attributed SBM (Gaussian Mixture Augmentation)

In the formulation of "Stochastic Block Models with Multiple Continuous Attributes," the adjacency matrix $A \in \{0,1\}^{N \times N}$ is modeled jointly with an attribute matrix $X \in \mathbb{R}^{N \times p}$ and latent community assignments $Z$:

  • Connectivity:

$$\mathbb{P}(A_{ij} = 1 \mid z_i = c, z_j = d, B) = B_{cd}, \quad \text{i.e.,} \quad A_{ij} \mid z_i = c, z_j = d \sim \mathrm{Bernoulli}(B_{cd}),$$

where $B \in [0,1]^{K \times K}$ is the block connectivity matrix.

  • Attributes:

$$p(x_i \mid z_i = c) = \mathcal{N}(x_i \mid \mu_c, \Sigma_c)$$

Each community $c$ is parametrized by a mean vector $\mu_c$ and covariance $\Sigma_c$ in the attribute space.
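To make the generative process concrete, here is a minimal NumPy sketch that samples $(A, X, z)$ from this model. The community proportions `pi` are an assumption (no prior on $Z$ is specified above), edges are treated as undirected without self-loops, and the function name `sample_attributed_sbm` is hypothetical.

```python
import numpy as np

def sample_attributed_sbm(n, pi, B, mus, Sigmas, seed=0):
    """Draw (A, X, z) from the Gaussian-mixture attributed SBM (illustrative sketch).

    pi     : (K,) community proportions (assumed prior on z)
    B      : (K, K) symmetric block connectivity matrix, entries in [0, 1]
    mus    : (K, p) community mean vectors
    Sigmas : (K, p, p) community covariance matrices
    """
    rng = np.random.default_rng(seed)
    z = rng.choice(len(pi), size=n, p=pi)  # latent community of each node
    # Attributes: x_i ~ N(mu_{z_i}, Sigma_{z_i})
    X = np.stack([rng.multivariate_normal(mus[c], Sigmas[c]) for c in z])
    # Edges: A_ij ~ Bernoulli(B_{z_i z_j}) for i < j, mirrored, no self-loops
    P = B[z][:, z]
    upper = np.triu(rng.random((n, n)) < P, k=1)
    A = (upper | upper.T).astype(int)
    return A, X, z

# Example: two communities, assortative edges, separated 2-D attribute means
pi = np.array([0.5, 0.5])
B = np.array([[0.30, 0.05],
              [0.05, 0.30]])
mus = np.array([[0.0, 0.0],
                [2.0, 2.0]])
Sigmas = np.stack([np.eye(2), np.eye(2)])
A, X, z = sample_attributed_sbm(200, pi, B, mus, Sigmas)
```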

1.2 Neural-prior SBM (Generalized Linear Model Prior)

In the "Neural-prior SBM," the causal direction is reversed:

  • For each node $\mu = 1, \dots, N$, the attribute vector $F_\mu \in \mathbb{R}^{M}$ has i.i.d. standard normal entries.
  • Latent community labels are determined as:

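$$u_\mu = \mathrm{sign}\left(\frac{1}{\sqrt{M}} \sum_{i=1}^{M} F_{\mu i}\, w_i\right)$$

where $w \in \mathbb{R}^{M}$ is a latent weight vector drawn from a prior (Gaussian or Rademacher).

  • Edge formation:

$$A_{\mu\nu} \sim \begin{cases} \mathrm{Bernoulli}(p_{\mathrm{in}}) & \text{if } u_\mu = u_\nu, \\ \mathrm{Bernoulli}(p_{\mathrm{out}}) & \text{if } u_\mu \neq u_\nu, \end{cases}$$

where $p_{\mathrm{in}}$ and $p_{\mathrm{out}}$ are the within- and between-community edge probabilities.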
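A minimal NumPy sketch of sampling from the neural-prior SBM, assuming binary labels given by the sign of a linear function of each node's attributes (the sign-GLM rule) and, for illustration, hypothetical edge probabilities `p_in` and `p_out` for same- and different-community pairs. The function name and the $1/\sqrt{M}$ normalization are our assumptions; the normalization does not change the labels, since it only rescales the argument of the sign.

```python
import numpy as np

def sample_neural_prior_sbm(n, m, p_in, p_out, prior="gaussian", seed=0):
    """Draw (A, F, w, u) from a neural-prior SBM sketch with a sign-GLM label rule."""
    rng = np.random.default_rng(seed)
    F = rng.standard_normal((n, m))  # node attributes with i.i.d. N(0, 1) entries
    if prior == "gaussian":
        w = rng.standard_normal(m)
    elif prior == "rademacher":
        w = rng.choice([-1.0, 1.0], size=m)
    else:
        raise ValueError("prior must be 'gaussian' or 'rademacher'")
    # Labels are a deterministic function of the attributes (causal direction reversed).
    u = np.where(F @ w / np.sqrt(m) >= 0, 1, -1)
    # Edges: probability p_in within a community, p_out across communities.
    P = np.where(u[:, None] == u[None, :], p_in, p_out)
    upper = np.triu(rng.random((n, n)) < P, k=1)
    A = (upper | upper.T).astype(int)
    return A, F, w, u

# Example: N = 300 nodes, M = 100 features (node-to-feature ratio N/M = 3), Rademacher prior
A, F, w, u = sample_neural_prior_sbm(300, 100, p_in=0.10, p_out=0.02, prior="rademacher")
```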

1.3 Contextual SBM (CSBM)

A contextual SBM (CSBM) assumes attributes are drawn conditionally on latent community assignments. Here, $u_i \in \{-1, +1\}$ are the community labels, and attributes are generated as signal plus noise along an unknown (community) direction $v \in \mathbb{R}^{M}$:

$$x_i = \theta\, u_i\, v + \xi_i, \qquad \xi_i \sim \mathcal{N}(0, I_M),$$

where $\theta \ge 0$ sets the strength of the attribute signal.

Edges are generated as in the classical SBM (Duranthon et al., 2024).
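A matching sketch for the CSBM, assuming attributes of the form $x_i = \theta\,u_i\,v + \xi_i$ with standard Gaussian noise, a hidden direction `v` of roughly unit norm, and hypothetical edge probabilities `p_in`/`p_out`. The parameter `theta` and these scalings are illustrative choices, not necessarily the exact parameterization of Duranthon et al. (2024).

```python
import numpy as np

def sample_csbm(n, m, theta, p_in, p_out, seed=0):
    """Draw (A, X, u, v) from a CSBM sketch: x_i = theta * u_i * v + xi_i, xi_i ~ N(0, I_M)."""
    rng = np.random.default_rng(seed)
    u = rng.choice([-1, 1], size=n)            # balanced binary community labels
    v = rng.standard_normal(m) / np.sqrt(m)    # hidden direction, norm close to 1 (assumed scaling)
    X = theta * u[:, None] * v[None, :] + rng.standard_normal((n, m))
    # Edges follow the classical two-community SBM given the labels.
    P = np.where(u[:, None] == u[None, :], p_in, p_out)
    upper = np.triu(rng.random((n, n)) < P, k=1)
    A = (upper | upper.T).astype(int)
    return A, X, u, v

A, X, u, v = sample_csbm(n=200, m=50, theta=5.0, p_in=0.10, p_out=0.02)
# With a strong attribute signal, projecting onto the hidden direction recovers most labels.
agreement = np.mean(np.sign(X @ v) == u)
```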

2. Likelihood Formulation and Inference Schemes

The joint likelihood of connectivity, attributes, and assignments for the Gaussian mixture model is:

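$$p(A, X, Z \mid \pi, B, \{\mu_c, \Sigma_c\}) = \prod_{i=1}^{N} \pi_{z_i}\, \mathcal{N}(x_i \mid \mu_{z_i}, \Sigma_{z_i}) \prod_{i<j} B_{z_i z_j}^{A_{ij}} \left(1 - B_{z_i z_j}\right)^{1 - A_{ij}}$$

where $\pi_c$ is the prior probability that a node belongs to community $c$.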
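The joint log-likelihood can be evaluated directly as the sum of three pieces: the log prior of each assignment, the Gaussian attribute log-densities, and a Bernoulli log-likelihood over unordered node pairs. The sketch below assumes community proportions `pi` as the prior on $z$ and clips $B$ away from 0 and 1 to avoid $\log 0$; the function names are hypothetical.

```python
import numpy as np

def gaussian_logpdf(X, mu, Sigma):
    """Row-wise log N(x | mu, Sigma) for X of shape (n, p)."""
    diff = X - mu
    sol = np.linalg.solve(Sigma, diff.T).T  # Sigma^{-1} (x_i - mu) for each row
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (np.sum(diff * sol, axis=1) + logdet + X.shape[1] * np.log(2 * np.pi))

def log_joint(A, X, z, pi, B, mus, Sigmas, eps=1e-12):
    """log p(A, X, z) = community prior + attribute term + edge term (undirected, i < j)."""
    n = len(z)
    log_prior = np.sum(np.log(pi[z]))
    log_attr = sum(gaussian_logpdf(X[z == c], mus[c], Sigmas[c]).sum()
                   for c in range(len(pi)) if np.any(z == c))
    iu = np.triu_indices(n, k=1)                # each unordered pair once
    P = np.clip(B[z][:, z][iu], eps, 1 - eps)   # B_{z_i z_j}, clipped away from 0 and 1
    a = A[iu]
    log_edges = np.sum(a * np.log(P) + (1 - a) * np.log(1 - P))
    return log_prior + log_attr + log_edges

# Tiny check: two nodes in one community, joined by an edge, with B = 0.5.
A = np.array([[0, 1], [1, 0]])
X = np.zeros((2, 1))
z = np.array([0, 0])
ll = log_joint(A, X, z, pi=np.array([1.0]), B=np.array([[0.5]]),
               mus=np.zeros((1, 1)), Sigmas=np.eye(1)[None])
```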

The log-likelihood decomposes into connectivity and attribute terms. Parameter estimation is performed with expectation–maximization (EM):

  • E-Step: Compute the posterior responsibilities

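$$\tau_{ic} = \mathbb{P}(z_i = c \mid A, X)$$

evaluated under the current parameter estimates. Because the edge term couples the assignments of all node pairs, this posterior does not factorize over nodes and is in practice approximated, for example by a mean-field fixed point.

  • M-Step: Update the community proportions $\pi$, block matrix $B$, means $\mu_c$, and covariances $\Sigma_c$ using expected sufficient statistics under $\tau$ (Stanley et al., 2018).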
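A compact sketch of the EM loop for the Gaussian-mixture model, with responsibilities stored in `tau`. Because the exact E-step is intractable for an SBM, this sketch swaps in a mean-field (variational) fixed point; that substitution, the farthest-point initialization, and the small ridge `reg` added to each covariance are our assumptions and not necessarily the procedure of Stanley et al. (2018).

```python
import numpy as np

def _log_gauss(X, mu, Sigma):
    """Row-wise log N(x | mu, Sigma)."""
    diff = X - mu
    sol = np.linalg.solve(Sigma, diff.T).T
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (np.sum(diff * sol, axis=1) + logdet + X.shape[1] * np.log(2 * np.pi))

def variational_em(A, X, K, n_iter=50, n_fixed_point=5, reg=1e-6, eps=1e-10, seed=0):
    """Mean-field variational EM for the Gaussian-mixture attributed SBM (a sketch)."""
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    n, p = X.shape
    # Initialization (assumption): farthest-point centers on the attributes,
    # then hard nearest-center responsibilities.
    centers = [X[rng.integers(n)]]
    for _ in range(1, K):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    d2 = np.stack([np.sum((X - c) ** 2, axis=1) for c in centers], axis=1)
    tau = np.eye(K)[np.argmin(d2, axis=1)]
    non_edges = 1.0 - A - np.eye(n)  # indicator of absent edges, excluding i = j
    for _ in range(n_iter):
        # M-step: closed-form updates from expected sufficient statistics under tau.
        Nc = tau.sum(axis=0) + eps
        pi = Nc / Nc.sum()
        B = (tau.T @ A @ tau) / (np.outer(Nc, Nc) - tau.T @ tau + eps)
        B = np.clip(B, eps, 1 - eps)
        mus = (tau.T @ X) / Nc[:, None]
        Sigmas = np.empty((K, p, p))
        for c in range(K):
            diff = X - mus[c]
            Sigmas[c] = (tau[:, c, None] * diff).T @ diff / Nc[c] + reg * np.eye(p)
        # E-step: the exact posterior over z couples all nodes through the edges,
        # so iterate the mean-field fixed point for tau instead.
        log_attr = np.stack([_log_gauss(X, mus[c], Sigmas[c]) for c in range(K)], axis=1)
        for _ in range(n_fixed_point):
            log_edge = A @ tau @ np.log(B) + non_edges @ tau @ np.log(1 - B)
            logits = np.log(pi) + log_attr + log_edge
            logits -= logits.max(axis=1, keepdims=True)  # stabilize the softmax
            tau = np.exp(logits)
            tau /= tau.sum(axis=1, keepdims=True)
    return tau, pi, B, mus, Sigmas

# Example on synthetic data: two communities with well-separated attributes.
rng = np.random.default_rng(1)
z = np.repeat([0, 1], 100)
X = rng.standard_normal((200, 2)) + np.where(z[:, None] == 0, 0.0, 4.0)
B_true = np.array([[0.20, 0.02], [0.02, 0.20]])
upper = np.triu(rng.random((200, 200)) < B_true[z][:, z], k=1)
A = (upper | upper.T).astype(float)
tau, pi, B, mus, Sigmas = variational_em(A, X, K=2)
z_hat = tau.argmax(axis=1)
accuracy = max(np.mean(z_hat == z), np.mean(z_hat != z))  # up to label swap
```

Community indices are only identified up to permutation, so accuracy is measured after matching labels; with $K = 2$ that is a single swap.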

For the neural-prior SBM, inference is performed via a combination of belief propagation (on the SBM factor graph) and approximate message passing (AMP) for the neural/GLM part, enabling efficient estimation in the high-dimensional regime (Duranthon et al., 2023).

3. Information-theoretic and Algorithmic Phase Transitions

Analysis of detectability and recovery thresholds is central to understanding the fundamental limits of attributed SBM inference:

  • Detectability threshold: For the neural-prior SBM, partial recovery of communities is possible only above a critical threshold that depends jointly on the contrast between in-community and out-community edge probabilities and on the ratio $N/M$ of nodes to feature dimension (Duranthon et al., 2023, Duranthon et al., 2024).

  • Hard Phase: With binary (Rademacher) priors on the latent weight vector $w$, exact recovery is information-theoretically possible below the threshold required by the best known polynomial-time algorithms, leaving an algorithmically hard region between the information-theoretic and algorithmic thresholds.
  • Phase transitions in contextual/GLM SBMs: Both contextual and neural-prior SBMs exhibit sharp transitions governed by a "total SNR" that combines the graph signal and the attribute signal; a critical value of this quantity separates an uninformative regime from one in which detection succeeds.
  • As soon as any nonzero fraction of labels is observed (semi-supervised regime), these phase transitions vanish, and nontrivial inference becomes possible for all parameters (Duranthon et al., 2024).

4. Prediction, Generalization, and Benchmarks

Jointly modeling edges and attributes with attributed SBMs yields improved prediction and imputation capabilities:

  • Link prediction: The attributed SBM enables edge prediction for node pairs, using either attribute-based assignment or posterior community prediction. For example, in biological networks the attributed SBM achieved a higher AUC (0.71 on a microbiome graph, vs. 0.69 for Jaccard/Adamic–Adar baselines) (Stanley et al., 2018).
  • Collaborative filtering: For nodes whose attributes are missing, community assignments inferred from the network structure are used to impute the attributes as the mean vector of the assigned community. The attributed SBM achieves lower relative L2 errors than neighborhood-based methods on the empirical networks studied.
  • Generalization error for GCNs: The asymptotic generalization error of single-layer graph convolutional networks trained on attributed SBM data can be computed in the high-dimensional limit. GCNs are proven to be consistent, but their error decay constant is strictly smaller than the Bayes-optimal rate, even as the SNR increases.

Consistency is universal across convex loss functions, but the suboptimal rate persists even with infinite attribute SNR (Duranthon et al., 2024).
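Given responsibilities $\tau_{ic}$ and an estimated $B$, both prediction tasks reduce to a few lines. This sketch scores a node pair by the posterior-averaged edge probability $\sum_{c,d} \tau_{ic} B_{cd} \tau_{jd}$, imputes attributes from the most probable community's mean (the hard-assignment rule described above), and computes AUC by pairwise comparison of held-out edges against non-edges; all function names are hypothetical.

```python
import numpy as np

def edge_scores(tau, B, pairs):
    """Posterior-averaged edge probability sum_{c,d} tau_ic * B_cd * tau_jd for each (i, j)."""
    i, j = pairs[:, 0], pairs[:, 1]
    return np.einsum("nc,cd,nd->n", tau[i], B, tau[j])

def impute_attributes(tau, mus):
    """Impute each node's attributes as the mean vector of its most probable community."""
    return mus[tau.argmax(axis=1)]

def auc(pos_scores, neg_scores):
    """ROC AUC as the probability that a held-out edge outscores a non-edge (ties count 1/2)."""
    diff = pos_scores[:, None] - neg_scores[None, :]
    return np.mean((diff > 0) + 0.5 * (diff == 0))

# Toy example with hard responsibilities for four nodes in two communities.
tau = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
B = np.array([[0.9, 0.1], [0.1, 0.9]])
mus = np.array([[0.0, 0.0], [5.0, 5.0]])
pos = edge_scores(tau, B, np.array([[0, 1], [2, 3]]))  # held-out edges
neg = edge_scores(tau, B, np.array([[0, 2], [1, 3]]))  # non-edges
score = auc(pos, neg)
X_hat = impute_attributes(tau, mus)
```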

| Model/Architecture | Detectability threshold | Max generalization rate (Bayes-optimal = 1) |
|---|---|---|
| Attributed SBM (Gaussian) | Yes, shifted by attribute information | N/A (not GCN) |
| Neural-prior SBM | Sharp threshold in total SNR | $< 1$ for GCN; optimal = 1 |
| Contextual SBM (CSBM) | Sharp threshold in total SNR | $< 1$ for GCN; optimal = 1 |
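The GCN results concern a single-layer graph convolution trained with a convex loss on partially labeled nodes. A minimal sketch, assuming a linear layer, symmetric normalization $D^{-1/2}(A + I)D^{-1/2}$, and the square loss with an L2 penalty (ridge regression); these are our choices for illustration, not necessarily the exact architecture or loss analyzed by Duranthon et al. (2024). The synthetic data follow a CSBM-style model with hypothetical parameters.

```python
import numpy as np

def gcn_ridge(A, X, y, train_mask, lam=1.0):
    """Single-layer linear GCN, y_hat = sign(A_norm X w), with w fit by ridge on labeled nodes."""
    n = A.shape[0]
    A_hat = A + np.eye(n)                     # add self-loops
    d = A_hat.sum(axis=1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))  # D^{-1/2} (A + I) D^{-1/2}
    H = A_norm @ X                            # one round of neighborhood aggregation
    Ht, yt = H[train_mask], y[train_mask]
    # Square loss plus L2 penalty has the closed-form ridge solution below.
    w = np.linalg.solve(Ht.T @ Ht + lam * np.eye(X.shape[1]), Ht.T @ yt)
    return np.sign(H @ w), w

# Semi-supervised example on CSBM-style data: about half of the labels are observed.
rng = np.random.default_rng(0)
n, m, theta = 400, 20, 1.0
u = rng.choice([-1.0, 1.0], size=n)
v = rng.standard_normal(m) / np.sqrt(m)
X = theta * u[:, None] * v[None, :] + rng.standard_normal((n, m))
P = np.where(u[:, None] == u[None, :], 0.05, 0.01)
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper | upper.T).astype(float)
train = rng.random(n) < 0.5
y_hat, w = gcn_ridge(A, X, u, train)
test_accuracy = np.mean(y_hat[~train] == u[~train])
```

Replacing the ridge step with another convex loss (for example logistic) changes only how `w` is fit; the aggregation `A_norm @ X` stays the same.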

5. Empirical Evaluation and Domain Impact

Extensive synthetic and real-data experiments support the efficacy of attributed SBMs:

  • Community detection: On synthetic data, attributed SBMs achieve higher normalized mutual information (NMI) than both the classical SBM and k-means on the features alone. The addition of multivariate attributes shifts and smooths the phase transition of community detectability.
  • Bioinformatics applications: In microbiome similarity networks ($N = 121$) and protein-interaction graphs, the attributed SBM outperforms classical heuristics on link prediction and attribute recovery (Stanley et al., 2018).
  • Benchmarks for GNNs: The neural-prior SBM provides a tractable, analytically soluble benchmark for evaluating the fundamental limitations of graph neural network architectures in semi-supervised settings (Duranthon et al., 2023, Duranthon et al., 2024).
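NMI is the score behind the community-detection comparison above. A minimal NumPy sketch, assuming the arithmetic-mean normalization $I(U;V) / \tfrac{1}{2}(H(U) + H(V))$, which is the default `average_method` of scikit-learn's `normalized_mutual_info_score`; the normalization used in the cited experiments is an assumption here, and the function name `nmi` is hypothetical.

```python
import numpy as np

def nmi(labels_true, labels_pred):
    """Normalized mutual information I(U; V) / ((H(U) + H(V)) / 2), in nats."""
    _, a = np.unique(labels_true, return_inverse=True)
    _, b = np.unique(labels_pred, return_inverse=True)
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)                  # contingency table
    joint /= joint.sum()
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)  # marginals (all strictly positive)
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(pa, pb)[nz]))
    h_a, h_b = -np.sum(pa * np.log(pa)), -np.sum(pb * np.log(pb))
    denom = 0.5 * (h_a + h_b)
    return 1.0 if denom == 0 else mi / denom

same = nmi([0, 0, 1, 1], [1, 1, 0, 0])   # 1 (up to rounding): identical partitions up to relabeling
indep = nmi([0, 0, 1, 1], [0, 1, 0, 1])  # 0.0: independent partitions
```

NMI is invariant to relabeling communities, which is why it suits comparisons between inferred and planted partitions.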

6. Algorithmic and Theoretical Extensions

Potential model extensions include:

  • Non-Gaussian attribute models: Attributed SBMs may be generalized to use arbitrary attribute distributions and conditional generative models, including deep neural architectures as in the neural-prior framework.
  • Multi-class extensions: The binary classification setup of neural-prior SBMs extends to $K > 2$ communities by replacing the sign-GLM with multi-class neural mechanisms and updating the AMP equations accordingly (Duranthon et al., 2023).
  • Open problems: A rigorous proof of the asymptotic optimality of belief propagation and AMP for neural-prior and attributed SBM inference remains outstanding.

Theoretical insights from attributed SBM research have informed the phase-diagram understanding of community detectability, limitations of polynomial-time inference, and the design of new graph representation learning algorithms that aspire to approach the limits dictated by probabilistic generative models.

References (3)
