Sparse Signed Message Passing Network

Updated 10 January 2026
  • Sparse Signed Message Passing Network is a probabilistic semi-supervised learning architecture that models latent signed adjacencies for robust node classification on noisy, heterophilic graphs.
  • It employs a GCN encoder and an MLP edge decoder for variational posterior inference, using Monte Carlo sampling to capture predictive uncertainty and LASSO-based sparse neighbor selection to aggregate only informative neighbors.
  • The framework’s explicit sign-aware message aggregation and structural loss optimization yield consistent performance improvements over traditional GNNs in both noisy and heterophilic benchmarks.

The Sparse Signed Message Passing Network (SpaM, also referred to as SSMPN) is a probabilistic semi-supervised learning architecture introduced for robust node classification on graphs where both edge reliability and homophily assumptions are compromised. SSMPN employs Bayesian structural inference and sparse signed message aggregation, enabling principled robustness to edge noise and label-heterophily by explicitly modeling predictive uncertainty over signed graph structures (Choi et al., 3 Jan 2026).

1. Probabilistic Modeling of Signed Adjacency

The foundational construct of SSMPN is a latent signed adjacency matrix

$$Z \in \{-1, 0, +1\}^{n \times n},$$

encoding, for each ordered node pair $(i, j)$:

  • $z_{ij} = +1$ for supporting (homophilic) edges,
  • $z_{ij} = -1$ for opposing (heterophilic) edges,
  • $z_{ij} = 0$ for absent edges.

Observed adjacency $A_{\rm obs}$ arises from an unknown, noisy channel $p(A_{\rm obs} \mid Z)$. SSMPN specifies a factorized prior:

$$p(Z) = \prod_{(i,j) \in \mathcal E_{\rm obs}} p(z_{ij}), \qquad p(z_{ij} = s) = \pi_0^s,$$

fixing $z_{ij}=0$ for unobserved pairs $(i, j) \notin \mathcal E_{\rm obs}$.

Given $A_{\rm obs}$, node features $X$, and observed labels $Y_{\mathcal L}$, the ideal Bayesian posterior and the marginal Bayes-optimal predictive distribution are

$$p(Z|A_{\rm obs},X,Y_{\mathcal L}) \propto p(A_{\rm obs}|Z)\, p(Z)\, p(Y_{\mathcal L}|X,Z),$$

$$p^\star(y_i|A_{\rm obs}, X, Y_{\mathcal L}) = \mathbb{E}_{Z \sim p(\cdot|A_{\rm obs},X,Y_{\mathcal L})} \left[ p(y_i|X,Z) \right].$$

Due to computational intractability, the true posterior is approximated by a variational distribution $q_\phi(Z|A_{\rm obs}, X, Y_{\mathcal L})$, and node label predictions are obtained by sampling $K$ instantiations of $Z$:

$$\hat{p}_\theta(y_i|A_{\rm obs},X) \approx \frac{1}{K} \sum_{k=1}^{K} p_\theta(y_i|X, Z^{(k)}), \qquad Z^{(k)} \sim q_\phi.$$
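As a concrete illustration of this Monte Carlo marginalization, the following minimal Python sketch (with hypothetical helper names, not the authors' released code) averages class probabilities over $K$ sampled structures:

```python
import torch

def mc_predictive(classifier, sample_structure, X, K=8):
    """Monte Carlo estimate of p(y_i | A_obs, X): average class probabilities
    over K signed adjacencies drawn from the variational posterior.
    `classifier` and `sample_structure` are hypothetical callables."""
    probs = []
    for _ in range(K):
        Z_k = sample_structure()           # Z^(k) ~ q_phi(. | A_obs, X, Y_L)
        logits = classifier(X, Z_k)        # [n, num_classes]
        probs.append(torch.softmax(logits, dim=-1))
    return torch.stack(probs).mean(dim=0)  # approximates p_hat_theta(y_i | A_obs, X)
```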

2. Variational Posterior Inference and Structural Loss

The variational posterior $q_\phi(Z)$ is parameterized by a small GCN encoder and an MLP edge decoder:

$$H_\phi = {\rm GCN}_\phi(A_{\rm obs}, X, Y_{\mathcal L}), \qquad g_{ij} = {\rm MLP}_\phi\left([h_{\phi,i} \,\|\, h_{\phi,j}]\right)$$

so that

$$q_\phi(z_{ij}=s) = \frac{\exp(g_{ij}^s)}{\sum_{s' \in \{-1, 0, 1\}} \exp(g_{ij}^{s'})}.$$

Sampling $Z$ for gradient-based training is performed using the Gumbel–Softmax relaxation.
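A minimal PyTorch sketch of this posterior is shown below, assuming a dense normalized adjacency `A_norm`, a two-layer linear-GCN encoder, and no explicit label injection (all simplifications relative to the paper), followed by a straight-through Gumbel–Softmax sample of the edge signs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgePosterior(nn.Module):
    """Sketch of q_phi(z_ij): a small GCN encoder plus an MLP edge decoder
    that scores each observed edge over the three signs {-1, 0, +1}.
    Layer sizes and the label handling are illustrative assumptions."""

    def __init__(self, in_dim, hid_dim=64):
        super().__init__()
        self.gcn1 = nn.Linear(in_dim, hid_dim)
        self.gcn2 = nn.Linear(hid_dim, hid_dim)
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * hid_dim, hid_dim), nn.ReLU(), nn.Linear(hid_dim, 3))

    def forward(self, A_norm, X, edge_index):
        H = F.relu(A_norm @ self.gcn1(X))        # two propagation steps
        H = A_norm @ self.gcn2(H)
        src, dst = edge_index                    # observed edges (i, j)
        return self.edge_mlp(torch.cat([H[src], H[dst]], dim=-1))  # logits g_ij^s

def sample_signed_edges(edge_logits, tau=0.5):
    """Straight-through Gumbel-Softmax sample of z_ij over {-1, 0, +1}."""
    one_hot = F.gumbel_softmax(edge_logits, tau=tau, hard=True)      # [E, 3]
    signs = torch.tensor([-1.0, 0.0, 1.0], device=edge_logits.device)
    return one_hot @ signs                                           # per-edge z_ij
```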

The structural parameters $\phi$ are optimized by minimizing the structural loss, i.e., the negative ELBO:

$$\mathcal{L}_{\rm struct}(\phi) = {\rm KL}\left(q_\phi(Z) \,\|\, p(Z)\right) - \mathbb{E}_{Z\sim q_\phi}\left[\log p(A_{\rm obs}|Z)\right].$$

This regularizes structural inference, penalizing both divergence from the prior and misfit to the observed edges.

3. Sparse Signed Message Passing (S²Layer) Mechanism

Given a sampled $Z$, SSMPN employs the Sparse Signed Message Passing layer (S²Layer):

  • Input node states $H\in\mathbb{R}^{n \times d_{\rm in}}$ are projected to values $V = HW_v$.
  • For node $i$, its neighbors define a value dictionary $V_i = [v_j]_{j \in \mathcal N_i(Z)}$.
  • The target is $t_i = W_t h_i$.
  • Local aggregation weights $\alpha_i$ are obtained by solving a LASSO problem (approximated in the sketch after this list):

$$\alpha_i^{\ast} = \arg\min_{\alpha \in \mathbb{R}^{|\mathcal N_i|}} \| t_i - V_i \alpha \|_2^2 + \lambda \| \alpha \|_1$$
  • Coefficients are partitioned:

$$\alpha_{ij}^+ = \begin{cases}\alpha_{ij} & z_{ij}=+1 \\ 0 & \text{otherwise} \end{cases}, \qquad \alpha_{ij}^- = \begin{cases}\alpha_{ij} & z_{ij}=-1 \\ 0 & \text{otherwise} \end{cases}$$

  • Signed message aggregation is

$$\tilde{h}_i = W_o\left(\sum_{j \in \mathcal N_i^+} \alpha_{ij}^+ v_j - \gamma \sum_{j \in \mathcal N_i^-} |\alpha_{ij}^-|\, v_j \right) + b$$

with $\gamma \ge 0$ balancing negative-message attenuation.
  • Updated node states: $h_i' = \sigma(\tilde{h}_i)$.
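A schematic sketch of a single S²Layer follows. The per-node LASSO is approximated here with a few ISTA (proximal gradient) iterations, the per-node Python loop is written for clarity rather than speed, and all names and dimensions are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def lasso_ista(V, t, lam=0.1, iters=20):
    """Approximate LASSO solve of min_a ||t - V a||_2^2 + lam * ||a||_1
    via ISTA; a stand-in for whatever solver the paper actually uses."""
    a = torch.zeros(V.shape[1], device=V.device)
    step = 1.0 / (2.0 * V.pow(2).sum() + 1e-6)        # conservative step size
    for _ in range(iters):
        a = a - step * (2.0 * V.T @ (V @ a - t))      # gradient step
        a = torch.sign(a) * torch.clamp(a.abs() - step * lam, min=0.0)  # soft-threshold
    return a

class S2Layer(nn.Module):
    """One sparse signed message-passing layer (illustrative sketch)."""

    def __init__(self, d_in, d_out, gamma=1.0):
        super().__init__()
        self.Wv = nn.Linear(d_in, d_out, bias=False)   # value projection W_v
        self.Wt = nn.Linear(d_in, d_out, bias=False)   # target projection W_t
        self.Wo = nn.Linear(d_out, d_out)              # output map W_o (+ bias b)
        self.gamma = gamma                             # negative-message attenuation

    def forward(self, H, neighbors, signs):
        # neighbors[i]: LongTensor of neighbor indices under the sampled Z
        # signs[i]:     matching tensor of edge signs z_ij in {-1, +1}
        V, T = self.Wv(H), self.Wt(H)
        out = torch.zeros_like(V)
        for i in range(H.shape[0]):
            if neighbors[i].numel() == 0:
                continue
            Vi = V[neighbors[i]]                       # value dictionary [|N_i|, d_out]
            alpha = lasso_ista(Vi.T, T[i])             # sparse weights alpha_i
            pos = (signs[i] == 1).float() * alpha      # alpha_ij^+
            neg = (signs[i] == -1).float() * alpha     # alpha_ij^-
            out[i] = (pos.unsqueeze(-1) * Vi).sum(0) \
                     - self.gamma * (neg.abs().unsqueeze(-1) * Vi).sum(0)
        return F.relu(self.Wo(out))                    # h_i' = sigma(h_tilde_i)
```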

4. Network Composition, Classification, and Loss Functions

SSMPN stacks $L$ S²Layers:

$$H^{(0)} = X, \qquad H^{(\ell+1)} = \sigma\left({\rm S}^2{\rm Layer}_\theta(H^{(\ell)}, Z)\right)$$

The classification head computes logits:

$$\ell_i(Z;\theta)=W_c\,h_i^{(L)}+c, \qquad p_\theta(y_i|X,Z)=\mathrm{softmax}(\ell_i)$$

Monte Carlo marginalization over $K$ samples of $Z$ produces the predictive node class distribution.
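A compact sketch of the stacked network and linear classification head, reusing the hypothetical `S2Layer` from the previous sketch (sizes and names are again illustrative):

```python
import torch.nn as nn

class SpaMNet(nn.Module):
    """L stacked S²Layers followed by a linear head producing logits
    ell_i = W_c h_i^(L) + c; a sketch, not the reference implementation."""

    def __init__(self, d_in, d_hid, n_classes, n_layers=2):
        super().__init__()
        dims = [d_in] + [d_hid] * n_layers
        self.layers = nn.ModuleList(
            [S2Layer(dims[l], dims[l + 1]) for l in range(n_layers)])
        self.head = nn.Linear(d_hid, n_classes)

    def forward(self, X, neighbors, signs):
        H = X
        for layer in self.layers:
            H = layer(H, neighbors, signs)   # sign-aware propagation under Z
        return self.head(H)                  # per-node class logits
```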

Training is governed by

  • classification loss:

$$\mathcal{L}_{\rm cls} = -\frac{1}{|\mathcal{L}|} \sum_{i \in \mathcal{L}} \log\left[\frac{1}{K} \sum_{k=1}^{K} p_\theta(y_i|X, Z^{(k)})\right]$$

  • sparsity loss:

$$\mathcal{L}_{\rm sparse} = \frac{1}{n} \sum_{i=1}^n \mathbb{E}_{Z \sim q_\phi} \left[\| \alpha_i(Z) \|_1 \right] \approx \frac{1}{nK} \sum_{i=1}^n \sum_{k=1}^K \| \alpha_i(Z^{(k)}) \|_1$$

  • structural loss (negative ELBO) as above.

The total loss is

$$\mathcal{L}_{\rm total}(\theta, \phi) = \mathcal{L}_{\rm cls} + \lambda_{\rm sp}\, \mathcal{L}_{\rm sparse} + \lambda_{\rm st}\, \mathcal{L}_{\rm struct},$$

with typical values $\lambda_{\rm sp}=0.01$ and $\lambda_{\rm st}=0.1$.
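The sketch below combines the three terms with the stated default weights. The tensor layouts (stacked probability samples, padded coefficient tensors, a caller-supplied reconstruction log-likelihood) are assumptions made for illustration:

```python
import torch
import torch.nn.functional as F

def total_loss(prob_samples, y, labeled_idx, alpha_samples, edge_logits,
               prior_probs, recon_loglik, lam_sp=0.01, lam_st=0.1):
    """prob_samples: [K, n, C] class probabilities under each sampled Z^(k);
    alpha_samples: list of K padded [n, max_deg] coefficient tensors;
    edge_logits: [E, 3] posterior logits; prior_probs: [3] prior over signs;
    recon_loglik: scalar estimate of E_q[log p(A_obs | Z)] (hypothetical)."""
    # Classification loss: NLL of the Monte Carlo-averaged prediction.
    p_bar = prob_samples.mean(dim=0)                                    # [n, C]
    L_cls = F.nll_loss(torch.log(p_bar[labeled_idx] + 1e-9), y[labeled_idx])

    # Sparsity loss: mean L1 norm of aggregation weights over nodes and samples.
    L_sparse = torch.stack([a.abs().sum(dim=-1).mean() for a in alpha_samples]).mean()

    # Structural loss: KL(q_phi || p) minus the expected edge log-likelihood.
    q = F.softmax(edge_logits, dim=-1)
    kl = (q * (torch.log(q + 1e-9) - torch.log(prior_probs + 1e-9))).sum(-1).mean()
    L_struct = kl - recon_loglik

    return L_cls + lam_sp * L_sparse + lam_st * L_struct
```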

5. Algorithmic Workflow

A typical SSMPN training epoch follows these steps (a schematic sketch follows the list):

  1. Encode graph structure and node context with ${\rm GCN}_\phi(A_{\rm obs}, X, Y_{\mathcal L})$, yielding edge-type logits $g_{ij}$ and posterior sign probabilities $\pi^s_{ij}$.
  2. Sample $K$ signed adjacency matrices $Z^{(1)},\dots,Z^{(K)} \sim q_\phi$.
  3. For each $Z^{(k)}$, propagate $H^{(0)}=X$ through $L$ S²Layers to obtain $H^{(L)}$, compute class posteriors $p_\theta^{(k)}(y_i)$, and accumulate $\mathcal{L}_{\rm cls}$ and $\mathcal{L}_{\rm sparse}$.
  4. Compute $\mathcal{L}_{\rm struct}$ from the KL divergence and the expected log-likelihood under $q_\phi$.
  5. Backpropagate the total loss to update the parameters $(\theta, \phi)$.
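Wiring the earlier sketches together, one epoch could look roughly as follows. The `edges_to_neighbor_lists` helper is hypothetical, the optimizer is assumed to hold the parameters of both the posterior and the classifier, and the sparsity and edge-reconstruction terms are dropped for brevity:

```python
import torch
import torch.nn.functional as F

def train_epoch(posterior, model, optimizer, A_norm, X, y, labeled_idx,
                edge_index, K=4, lam_st=0.1):
    """One schematic SSMPN training epoch built from the sketches above."""
    edge_logits = posterior(A_norm, X, edge_index)                   # step 1: q_phi
    prob_samples = []
    for _ in range(K):                                               # step 2
        z = sample_signed_edges(edge_logits)                         # Z^(k) ~ q_phi
        neighbors, signs = edges_to_neighbor_lists(edge_index, z)    # hypothetical helper
        logits = model(X, neighbors, signs)                          # step 3: propagate
        prob_samples.append(torch.softmax(logits, dim=-1))
    p_bar = torch.stack(prob_samples).mean(dim=0)
    L_cls = F.nll_loss(torch.log(p_bar[labeled_idx] + 1e-9), y[labeled_idx])
    q = F.softmax(edge_logits, dim=-1)                               # step 4 (KL term only)
    kl = (q * torch.log(3.0 * q + 1e-9)).sum(-1).mean()              # KL vs. uniform prior
    loss = L_cls + lam_st * kl
    optimizer.zero_grad(); loss.backward(); optimizer.step()         # step 5
    return loss.item()
```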

6. Robustness to Structural Uncertainty and Heterophily

SSMPN directly addresses both edge uncertainty and heterophily through:

  • Posterior marginalization: Explicit integration over plausible signed graph structures curbs over-reliance on any single adjacency, with theory (Theorem 4.1) bounding the excess classification risk by $\|q_\phi - p\|_1$.
  • Signed message aggregation: Subtracting negative-neighbor messages, as in the signed aggregation formula, enables separation of heterophilic signals. Under the contextual stochastic block model (CSBM), the expected message-passing update enhances inter-cluster separation when $p_{\rm out} > p_{\rm in}$ and $W_- \succeq 0$ (Theorem 6.4).
  • Sparsity enforcement: LASSO-based neighbor selection within each S²Layer ensures that only the most informative neighbors contribute, mitigating oversmoothing, especially as network depth increases.

Ablation studies show that removing posterior marginalization ("NoPosterior") induces a ~7% drop in accuracy on the Texas dataset, and using fixed, hard edge signs (no structural uncertainty) results in a ~4% drop. Disabling sparse or sign-aware aggregation causes performance to degrade rapidly with depth.

7. Empirical Performance and Benchmarks

SSMPN was evaluated on nine established heterophilic benchmarks, including RomanEmpire, Minesweeper, AmazonRatings, Chameleon, Squirrel, Actor, Cornell, Texas, and Wisconsin, with global homophily $\mathcal{G}_h$ values as low as zero (high heterophily) and up to ~0.23.

On node classification tasks (mean accuracy over ten splits), SpaM achieved top scores on 8/9 benchmarks. Example results:

| Dataset     | SpaM Accuracy (%) | Best Baseline (%) |
|-------------|-------------------|-------------------|
| RomanEmpire | 75.0              | 70.3              |
| Texas       | 83.8              | 76.7              |
| Wisconsin   | 72.6              | ~65.8             |

Under structural perturbations—including random edge deletion (up to 60%), Gaussian feature noise, and adversarial edge flips—SpaM exhibited substantially more graceful degradation than GCN and GAT.

Efficiency and accuracy were retained on large-scale graphs (Penn94, arXiv-year, snap-patents); for instance, on arXiv-year, SSMPN attained 52.1% accuracy versus a best baseline of ~47.6%.

8. Synthesis and Significance

The Sparse Signed Message Passing Network integrates:

  1. Bayesian inference over signed graph edges,
  2. Monte Carlo posterior marginalization for uncertainty-aware prediction,
  3. LASSO-based sparse neighbor selection,
  4. Explicit sign-aware message aggregation.

This explicit modeling of structural uncertainty and signed relationships establishes provable robustness to noise and heterophily, while maintaining competitive performance in standard homophilic settings (Choi et al., 3 Jan 2026). These methodological contributions address critical limitations of established GNNs on non-homophilic and noisy graphs, providing a robust probabilistic framework for semi-supervised learning under structural uncertainty.
