
Generalized Motifs-Based Naïve Bayes Model

Updated 4 January 2026
  • The paper introduces a generalized Naïve Bayes framework that leverages multiple motif structures for accurate link sign prediction in signed graphs.
  • It employs dual architectures—GMMNB and FGMNB—using role-based log-likelihood estimators to achieve superior AUC and accuracy over embedding-based methods.
  • Motif coverage analysis highlights the practical significance of 3- and 4-node motifs in applications such as fraud detection and trust assessment.

A generalized multiple motifs-based Naïve Bayes model is a theoretically grounded probabilistic framework for predicting properties of complex networks—most notably, for link sign prediction in signed graphs. This approach systematically incorporates heterogeneous influences from local motif structures by quantifying differentiated roles of neighboring nodes or edges and aggregating information across multiple motif instances. Two principal architectures are used: a linear Naïve Bayes combination (GMMNB) that treats motif-derived scores as independent evidence, and a feature-driven ensemble method (FGMNB) that leverages machine learning to integrate high-dimensional motif features for enhanced predictive performance. The methodology provides both interpretable motif-level statistics and robust, empirically validated predictive accuracy, surpassing established embedding-based baselines in benchmark evaluations (Ran et al., 28 Dec 2025).

1. Motif Structures and Role Functions in Signed Networks

In an undirected signed graph G = (N, L, W), where N is the node set, L the set of links, and W assigns each edge a label + or -, a motif is defined as a small, connected subgraph whose arrangement of positive and negative edges is statistically overrepresented. For each candidate edge (A, B) whose sign is to be inferred, the algorithm identifies all motif instances S_i within a local window (typically covering 3- and 4-node configurations) that contain (A, B). This captures balance-theoretic and status-theoretic structural regularities.

The classical single-motif Naïve Bayes (SMNB) approach assumes uniform influence across all neighboring nodes in a motif. The generalized model instead distinguishes two roles:

  • Common Link (CL): the neighbor is directly linked to either A or B.
  • Common Node (CN): the neighbor is present in the motif but not directly linked to A or B.

For each motif type S_i and neighbor M occupying a consistent structural role, the algorithm computes separate role-based likelihood estimators:

R^{\mathrm{GSMNB-CL}}_{S_i(M)} = \frac{CLN_{+_{S_i(M)}} + 1}{CLN_{-_{S_i(M)}} + 1}, \qquad R^{\mathrm{GSMNB-CN}}_{S_i(M)} = \frac{CNN_{+_{S_i(M)}} + 1}{CNN_{-_{S_i(M)}} + 1}

where CLN_{+_{S_i(M)}} and CLN_{-_{S_i(M)}} are the counts of positive and negative labelings of (A, B) when M is in the CL role, and similarly for the CN role (Ran et al., 28 Dec 2025).
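The role-based estimators are count ratios with add-one (Laplace) smoothing. A minimal sketch, assuming hypothetical training counts (the function name `role_ratio` and the count values are illustrative, not from the paper):

```python
def role_ratio(pos_count: int, neg_count: int) -> float:
    """R = (positive labelings + 1) / (negative labelings + 1),
    with add-one smoothing so unseen configurations yield a neutral ratio of 1."""
    return (pos_count + 1) / (neg_count + 1)

# Hypothetical training counts for a neighbor M in motif type S_i:
cl_counts = {"pos": 40, "neg": 10}   # M observed in the Common Link (CL) role
cn_counts = {"pos": 25, "neg": 25}   # M observed in the Common Node (CN) role

r_cl = role_ratio(cl_counts["pos"], cl_counts["neg"])  # 41/11, evidence for +
r_cn = role_ratio(cn_counts["pos"], cn_counts["neg"])  # 26/26 = 1.0, neutral
```

A ratio above 1 is evidence toward a positive sign, below 1 toward a negative sign; the smoothing keeps the ratio finite when a role configuration never occurs in training.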

2. Single-Motif Naïve Bayes Prediction

Given the set S_i(A,B) of all motif instances around (A, B) for predictor S_i, the model forms the posterior-odds score as a product of the role-based likelihood ratios:

r_{AB}(S_i) \propto \prod_{M \in S_i(A,B)} R_{S_i(M)}

or, equivalently, in log form:

\tilde r_{AB}(S_i) = \sum_{M \in S_i(A,B)} \log R_{S_i(M)}

This additive log-likelihood form accumulates the evidence from all eligible motif instances involving (A, B) and its neighbors, weighted by each neighbor's structural role within the motif (Ran et al., 28 Dec 2025).
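The additive form above can be sketched directly; the ratio values here are hypothetical and `smnb_score` is an illustrative name, not the paper's code:

```python
import math

def smnb_score(ratios):
    """Additive log-likelihood score r~_AB(S_i): the sum of log role-based
    ratios over all instances of motif S_i that contain the edge (A, B)."""
    return sum(math.log(r) for r in ratios)

# Hypothetical role-based ratios for three motif instances around (A, B):
# two favor a positive sign (3.0, 2.0), one favors negative (0.5).
score = smnb_score([3.0, 0.5, 2.0])  # equals log(3 * 0.5 * 2) = log 3 > 0
```

A positive score predicts a positive sign; working in log space turns the product of ratios into a numerically stable sum.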

3. Extension to Multiple Motifs: GMMNB and FGMNB

The model integrates features from multiple motifs via two main strategies:

  • GMMNB (Generalized Multiple Motifs-based Naïve Bayes): each of the n distinct motif-derived predictors S_1, \dots, S_n produces an individual log-likelihood score \tilde r_{AB}(S_i), and these are linearly combined:

\tilde r_{AB}^{\mathrm{GMMNB}} = \sum_{i=1}^n |S_i(A,B)| \log a + \sum_{i=1}^n \sum_{M \in S_i(A,B)} \log R_{S_i(M)}

Here, a = f_-/f_+ corrects for class imbalance, with f_+ and f_- the fractions of positive and negative links (Ran et al., 28 Dec 2025).
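The GMMNB combination can be sketched as follows; the function name and the example ratios and class fractions are hypothetical:

```python
import math

def gmmnb_score(motif_ratios, f_pos, f_neg):
    """GMMNB score: per-motif log-likelihood sums plus the class-imbalance
    prior term |S_i(A,B)| * log a, where a = f_- / f_+.
    motif_ratios: one list per motif type, holding the role-based ratios
    R_{S_i(M)} for instances containing the candidate edge (A, B)."""
    log_a = math.log(f_neg / f_pos)
    score = 0.0
    for ratios in motif_ratios:
        score += len(ratios) * log_a                     # prior per instance
        score += sum(math.log(r) for r in ratios)        # motif evidence
    return score

# Hypothetical example: two motif types, 80% of training links positive.
s = gmmnb_score([[3.0, 2.0], [0.5]], f_pos=0.8, f_neg=0.2)
```

With imbalanced classes, log a is negative, so each motif instance contributes a prior pull toward the majority (positive) class being the default prediction only when the evidence terms outweigh it.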

  • FGMNB (Feature-driven Generalized Motif-based Naïve Bayes): a 9-dimensional vector x_{AB} = [\tilde r_{AB}(S_1), \dots, \tilde r_{AB}(S_9)] (motifs S_1 to S_9) is constructed for each candidate edge and passed to a machine-learning classifier (e.g., XGBoost) that can learn nonlinear feature interactions:

\mathcal{L}(\Theta) = \sum_{j=1}^N \ell\left(y_j, F(x_j)\right) + \sum_{k=1}^K \Omega(f_k)

where F(x) = \sum_{k=1}^K f_k(x) is an ensemble of regression trees and \ell is the logistic loss (Ran et al., 28 Dec 2025).
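The feature-construction step can be sketched as below; the function name is illustrative, and the ratio lists are hypothetical inputs (the paper feeds the resulting vector to XGBoost, which is omitted here):

```python
import math

def fgmnb_features(motif_ratio_lists):
    """Build the 9-dimensional FGMNB feature vector x_AB: one additive
    log-likelihood score r~_AB(S_i) per motif type S_1..S_9. Motif types
    with no instance around (A, B) contribute a zero feature."""
    assert len(motif_ratio_lists) == 9, "one ratio list per motif type S_1..S_9"
    return [sum(math.log(r) for r in ratios) for ratios in motif_ratio_lists]

# Hypothetical ratios; most motif types have no instance for this edge.
x_ab = fgmnb_features([[2.0, 2.0], [], [0.5], [], [], [1.0], [], [], []])
# x_ab is then passed to a gradient-boosted tree classifier (XGBoost in the paper).
```

Keeping the features as per-motif log scores preserves the Naïve Bayes interpretation of each dimension while letting the tree ensemble capture interactions between motif types.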

4. Algorithmic Workflow and Pseudocode

The FGMNB procedure applies the following protocol for sign prediction:

  • Preprocess the signed network (removing ambiguous edges, ensuring undirectedness).
  • For each train/test split:
    • Construct balanced train/test sets by edge sampling.
    • Extract (A,B)(A,B) pairs and enumerate all motif instances SiS_i they participate in.
    • Compute role-based log-likelihood ratios for each motif instance, forming xABx_{AB}.
    • Train XGBoost on the resulting feature-label pairs.
    • Apply the trained model to the test set and record performance.
  • Repeat across multiple random splits to estimate mean and variance of metrics.

At inference, the same motif scoring and feature construction is applied, followed by prediction via the XGBoost ensemble (Ran et al., 28 Dec 2025).

| Step | Operation | Output |
| --- | --- | --- |
| Preprocessing | Remove ambiguous/bidirectional edges, binarize labels | Cleaned graph |
| Motif extraction | Find all 3-/4-node motifs containing each (A, B) | Motif instances |
| Feature generation | Compute log-likelihood ratios from role-based counts | Feature vector |
| Training | Fit XGBoost on x_{AB} | Trained classifier |
| Prediction | Apply classifier to test instances | Sign prediction |
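The split-train-test loop above can be sketched generically; all names are illustrative, and the three callables are stand-ins for the paper's motif feature extraction, XGBoost fitting, and AUC computation:

```python
import random
import statistics

def evaluate(edges, labels, build_features, train, score, n_splits=10, seed=0):
    """Repeat random 80/20 splits: build motif features for each edge,
    fit a classifier on the training part, score the held-out part, and
    aggregate the metric across splits (mean and population std. dev.)."""
    rng = random.Random(seed)
    metrics = []
    for _ in range(n_splits):
        idx = list(range(len(edges)))
        rng.shuffle(idx)
        cut = int(0.8 * len(idx))
        tr, te = idx[:cut], idx[cut:]
        model = train([build_features(edges[i]) for i in tr],
                      [labels[i] for i in tr])
        metrics.append(score(model,
                             [build_features(edges[i]) for i in te],
                             [labels[i] for i in te]))
    return statistics.mean(metrics), statistics.pstdev(metrics)
```

Reporting mean and deviation over many random splits, as the paper does over 100 splits, guards against a single lucky partition inflating the result.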

5. Experimental Validation and Quantitative Performance

The approach was empirically evaluated on four large signed networks:

  • BitcoinAlpha (3,783 nodes, 14,124 edges, 91.6% positive)
  • BitcoinOTC (5,881 nodes, 21,492 edges, 86.4% positive)
  • Wiki-RfA (11,221 nodes, 171,761 edges, 77.4% positive)
  • Slashdot (82,140 nodes, 500,481 edges, 76.4% positive)

Balanced sampling (equal numbers of positive and negative links) ensures robust evaluation. Across 100 random splits, GMMNB outperforms all competing embedding-based baselines, and FGMNB further improves AUC and accuracy. For instance, on BitcoinAlpha, FGMNB achieves AUC 0.851 ± 0.02, outperforming DNE-SBP (0.722 ± 0.02) and SGCN (0.826 ± 0.01). Notably, FGMNB's margin over baseline exceeds +0.15 AUC on BitcoinAlpha and +0.08 on BitcoinOTC. Only on Slashdot does SE-SGformer achieve a higher AUC (0.974), but FGMNB generalizes better across networks (Ran et al., 28 Dec 2025).

| Method | BitcoinAlpha AUC | BitcoinOTC AUC | Wiki-RfA AUC | Slashdot AUC |
| --- | --- | --- | --- | --- |
| DNE-SBP | 0.722 ± 0.02 | 0.836 ± 0.01 | 0.760 ± 0.03 | 0.812 ± 0.02 |
| GMMNB | 0.802 ± 0.02 | 0.903 ± 0.03 | 0.819 ± 0.04 | 0.866 ± 0.02 |
| FGMNB | 0.851 ± 0.02 | 0.920 ± 0.01 | 0.853 ± 0.02 | 0.894 ± 0.01 |

6. Motif Selection and Coverage Analysis

"Motif coverage" is defined as the fraction of all links in L that participate in at least one instance of motif S:

\mathrm{Coverage}(S) = \frac{1}{|L|} \sum_{\ell \in L} I_S(\ell), \qquad I_S(\ell) = \begin{cases} 1, & \text{if } \ell \text{ appears in an instance of } S \\ 0, & \text{otherwise} \end{cases}

Empirically, AUC under GSMNB-CL increases monotonically with motif coverage: motifs with the highest coverage are typically the most predictive. On BitcoinAlpha, the 4-node motif S_4 achieves AUC 0.814; on BitcoinOTC, Wiki-RfA, and Slashdot, motif S_2 (bridge plus balanced triangle) is dominant. A plausible implication is that practical feature engineering for sign prediction should prioritize motif types that maximize both coverage and discriminative AUC. The results further support the sufficiency of 3-node motifs in trust-dense environments, while 4-node motifs are essential in balanced community structures (Ran et al., 28 Dec 2025).
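The coverage definition translates directly to code; the function name and the toy graph below are hypothetical:

```python
def motif_coverage(links, motif_instances):
    """Coverage(S): fraction of links appearing in at least one instance
    of motif S. motif_instances is an iterable of edge sets, one per
    enumerated instance of S in the graph."""
    covered = set()
    for instance_edges in motif_instances:
        covered.update(instance_edges)
    return len(covered & set(links)) / len(links)

# Hypothetical toy graph: 4 links; a single triangle motif instance
# covers 3 of them, so coverage is 3/4.
cov = motif_coverage(
    links=[(1, 2), (2, 3), (1, 3), (3, 4)],
    motif_instances=[{(1, 2), (2, 3), (1, 3)}],
)
# cov == 0.75
```

Ranking motif types by this quantity (together with their per-motif AUC) is the selection criterion the coverage analysis suggests.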

7. Implications and Practical Significance

The generalized multiple motifs-based Naïve Bayes model delivers four major contributions:

  1. A Naïve Bayes formulation with explicit modeling of heterogeneous neighbor influences via structurally-defined role functions.
  2. Integration of multiple motif-derived log-likelihood scores—either linearly (GMMNB) or via nonlinear feature-driven ensembles (FGMNB).
  3. Empirically validated superiority over embedding-based methods for sign prediction across large real-world signed graphs, with robust gains in both AUC and Accuracy.
  4. Actionable metrics for motif selection—coverage and discriminative power—that inform scalable feature engineering in network analysis and sign prediction tasks.

These findings provide a systematic and interpretable framework applicable to trust assessment, fraud detection, and other relational inference scenarios in complex networks (Ran et al., 28 Dec 2025).
