Contextual Graph Markov Model (CGMM)
- Contextual Graph Markov Model (CGMM) is a deep generative model that builds layered representations by diffusing frozen neighbor states to capture ℓ-hop contexts.
- It employs a layer-wise probabilistic approach with Expectation-Maximization training, ensuring cycle-free inference and scalable computation on cyclic, directed, or undirected graphs.
- The model aggregates hidden-state assignments into fixed-size graph fingerprints for downstream tasks like classification, demonstrating strong performance on benchmarks such as INEX2005, MUTAG, CPDB, and AIDS.
A Contextual Graph Markov Model (CGMM) is a deep, generative probabilistic model designed for the processing and analysis of general graphs, including structures that are cyclic, directed, or undirected. CGMMs construct a hierarchical stack of local graphical models, each layer encoding vertex and edge structure while diffusing neighborhood context efficiently through the graph. The resulting structure supports scalable learning and yields discriminative feature representations for downstream tasks such as graph classification. Two closely related lines of development are highlighted in recent literature: first, the deep generative CGMM framework itself (Bacciu et al., 2018), and second, Contextual Markov Networks (CMNs) and their structure-learning via marginal pseudo-likelihood (Pensar et al., 2021).
1. Layer-wise Probabilistic Architecture
CGMM builds a stack of layers, where each layer learns to encode vertex labels conditioned on the hidden states of neighboring vertices inferred in previous layers. The process is constructive:
- Layer 1 assigns a hidden state $Q_u^{(1)}$ to each vertex $u$, from which the observed label $y_u$ is emitted, independently of graph structure.
- Layer $\ell > 1$ uses the inferred states of neighbors from previous layers as "frozen" context, conditioning each new hidden state $Q_u^{(\ell)}$ on this context before generating $y_u$.
By stacking layers, the receptive field of each node grows to cover progressively larger $\ell$-hop neighborhoods. The key property is that within a single layer, dependencies are only on frozen states from lower layers, resulting in cycle-free (i.e., no inference loops) computation regardless of graph topology (Bacciu et al., 2018).
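The constructive, layer-wise process above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy graph, the random stand-in parameters, and the names `infer_layer`, `neighbors`, and `frozen_states` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy undirected graph: adjacency list and discrete vertex labels (illustrative).
neighbors = {0: [1], 1: [0, 2], 2: [1]}
labels = np.array([0, 1, 0])
C, M = 4, 2  # number of hidden states, size of the label alphabet

def infer_layer(frozen):
    """Assign each vertex a hidden state given frozen lower-layer states.

    `frozen` is a list of per-layer state arrays; layer 1 uses no context.
    Parameters are random stand-ins for what EM would learn per layer.
    """
    emission = rng.dirichlet(np.ones(M), size=C)    # P(y | Q = i), one row per state
    transition = rng.dirichlet(np.ones(C), size=C)  # row j: P(Q_u = . | neighbor state j)
    states = np.empty(len(labels), dtype=int)
    for u in range(len(labels)):
        if not frozen:
            # Layer 1: emission only, independent of graph structure.
            post = emission[:, labels[u]]
        else:
            # Deeper layers: average transition rows over frozen neighbor states.
            ctx = np.mean([transition[frozen[-1][v]] for v in neighbors[u]], axis=0)
            post = emission[:, labels[u]] * ctx
        states[u] = np.argmax(post)  # hard (MAP) state assignment, then frozen
    return states

frozen_states = []
for layer in range(3):  # stack three layers; each only reads frozen lower states
    frozen_states.append(infer_layer(frozen_states))
```

Because each layer conditions only on already-frozen assignments, the loop never revisits a layer: inference stays cycle-free even on cyclic graphs.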
2. Mathematical Formulation
Formally, the likelihood at layer $\ell$ for a training set $\mathcal{D}$ of graphs is:

$$\mathcal{L}^{(\ell)} = \prod_{g \in \mathcal{D}} \prod_{u \in \mathcal{V}_g} \sum_{i=1}^{C} P(y_u \mid Q_u = i)\, P\big(Q_u = i \mid \mathbf{q}_{\mathcal{N}(u)}\big),$$

where $\mathbf{q}_{\mathcal{N}(u)}$ encapsulates the frozen neighbor states from previous layers. Direct conditioning on entire neighbor configurations is combinatorial; CGMM resolves this with two latent "switch" variables—$L$ selects the context-providing previous layer, and $S$ selects a relevant edge-label class—resulting in the recursive decomposition:

$$P\big(Q_u = i \mid \mathbf{q}_{\mathcal{N}(u)}\big) = \sum_{\ell'=1}^{\bar{L}} P(L = \ell') \sum_{a=1}^{A} P(S = a) \sum_{j=1}^{C} P\big(Q_u = i \mid j, \ell', a\big)\, \frac{\big|\{v \in \mathcal{N}_a(u) : q_v^{(\ell')} = j\}\big|}{|\mathcal{N}_a(u)|},$$

where the final factor averages over the neighbors in the selected layer and edge class. The context-diffusion principle is implicit: new hidden states are assigned to maximize posterior probability conditioned on observed labels and frozen neighbor context.
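The switch-variable decomposition can be evaluated numerically as a doubly weighted mixture over layers and edge classes. The sketch below uses random stand-in parameters; the names `P_L`, `P_S`, `P_trans`, and `neighbor_hist`, and the toy dimensions, are illustrative assumptions rather than the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(1)
C, L_bar, A = 4, 2, 3  # hidden states, lower layers, edge-label classes

# Random stand-ins for the EM-trained multinomials (illustrative values).
P_L = rng.dirichlet(np.ones(L_bar))                      # layer switch P(L = l')
P_S = rng.dirichlet(np.ones(A))                          # edge-class switch P(S = a)
P_trans = rng.dirichlet(np.ones(C), size=(L_bar, A, C))  # row j: P(Q_u = . | j, l', a)

def state_posterior(neighbor_hist):
    """P(Q_u = i | frozen context) via the switch-variable decomposition.

    `neighbor_hist[lp, a]` is the empirical distribution (C-vector) of frozen
    states among u's edge-class-a neighbors at lower layer lp.
    """
    post = np.zeros(C)
    for lp in range(L_bar):
        for a in range(A):
            # Transition rows weighted by neighbor state frequencies,
            # then mixed by the two switch distributions.
            post += P_L[lp] * P_S[a] * (P_trans[lp, a].T @ neighbor_hist[lp, a])
    return post

hist = rng.dirichlet(np.ones(C), size=(L_bar, A))  # toy frozen-context statistics
p = state_posterior(hist)
```

Each inner term is a valid distribution over states, and the switch weights sum to one, so `p` is itself a proper distribution—this is what keeps the factorization tractable.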
3. Training and Scalability
Each layer is trained independently using the Expectation-Maximization (EM) algorithm. In the E-step, the responsibilities

$$P\big(Q_u = i, L = \ell', S = a \mid y_u, \mathbf{q}_{\mathcal{N}(u)}\big)$$

are computed for all latent variables, and in the M-step, the parameters of the multinomial distributions are updated from the expected counts. This layer-wise EM paradigm ensures no loopy inference is required within layers. The per-layer computational complexity for a graph $g$ is

$$O\big(|\mathcal{V}_g| \cdot \bar{L} \cdot A \cdot C^2\big),$$

where $|\mathcal{V}_g|$ is the number of vertices, $\bar{L}$ the number of lower layers, $A$ the number of edge-label classes, and $C$ the hidden-state count. Because $C$ and $A$ are typically small (tens), this yields high efficiency, with scalability linear in the number of vertices and small dependence on model parameters (Bacciu et al., 2018).
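As a minimal illustration of the per-layer EM updates: the structure-free first layer reduces to a plain multinomial mixture over vertex labels, so its E- and M-steps take the familiar form below. The toy data and dimensions are assumptions for the sketch, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(2)
C, M = 3, 5                        # hidden states, label alphabet size
y = rng.integers(0, M, size=200)   # toy vertex labels pooled over a dataset

# Layer 1 of CGMM is a multinomial mixture; EM sketch (illustrative).
prior = np.full(C, 1 / C)                     # P(Q = i)
emission = rng.dirichlet(np.ones(M), size=C)  # P(y | Q = i)

for _ in range(50):
    # E-step: responsibilities P(Q = i | y_u) for every vertex.
    resp = prior[:, None] * emission[:, y]    # shape (C, n)
    resp /= resp.sum(axis=0, keepdims=True)
    # M-step: re-estimate multinomial parameters from expected counts
    # (tiny epsilon guards against empty components in the toy run).
    prior = resp.mean(axis=1)
    counts = np.array([[resp[i, y == m].sum() for m in range(M)]
                       for i in range(C)]) + 1e-9
    emission = counts / counts.sum(axis=1, keepdims=True)
```

In the full model the E-step additionally carries the layer and edge-class switch variables, but the update pattern—closed-form responsibilities, then normalized expected counts—is the same, which is why no iterative message passing is needed inside a layer.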
4. Graph Fingerprints and Downstream Use
Upon completion of layer-wise training, the CGMM infers hard hidden-state assignments $q_u^{(\ell)}$ for each vertex at each layer. These assignments are aggregated into a fixed-size graph fingerprint by concatenating per-layer state histograms:

$$\Phi(g) = \big[\phi^{(1)}, \dots, \phi^{(L)}\big], \qquad \phi_i^{(\ell)} = \sum_{u \in \mathcal{V}_g} \mathbb{1}\big[q_u^{(\ell)} = i\big].$$

This vector summary enables the application of standard discriminative learners, such as SVMs with RBF or string kernels, to predict graph labels or perform regression (Bacciu et al., 2018).
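The fingerprint construction is a simple histogram concatenation; the sketch below normalizes each layer's histogram by graph size so that graphs of different sizes are comparable (a common choice, assumed here rather than taken from the paper).

```python
import numpy as np

def fingerprint(state_assignments, C):
    """Concatenate per-layer histograms of hard hidden-state assignments.

    `state_assignments` is a list of L arrays, one per layer, holding each
    vertex's inferred state in {0, ..., C-1}. The result has fixed size L * C
    regardless of how many vertices the graph has.
    """
    return np.concatenate([
        np.bincount(q, minlength=C) / len(q) for q in state_assignments
    ])

# Toy: a 5-vertex graph processed by a 3-layer model with C = 4 states.
states = [np.array([0, 1, 1, 3, 0]),
          np.array([2, 2, 1, 0, 0]),
          np.array([3, 3, 3, 1, 2])]
phi = fingerprint(states, C=4)  # 12-dimensional vector, ready for an SVM
```

Because the vector length depends only on the number of layers and states, any off-the-shelf kernel machine or linear model can consume it directly.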
5. Theoretical Properties and Expressiveness
CGMM is stationary in that a single shared parameter set is used across vertices of all input graphs. Its construction guarantees cycle-free inference, given the strict dependence on frozen previous-layer statistics. The depth of the model mediates the expressivity: an $L$-layer model captures up to $L$-hop contextual dependencies, and in the large-$L$ regime, CGMM can distinguish patterns in the same regime as $L$-layer message-passing GNNs, but within a generative modeling formalism. Training and inference both admit linear complexity in dataset size, controlled by small constants determined by $\bar{L}$, $A$, and $C$ (Bacciu et al., 2018).
6. Empirical Results
Benchmark evaluations demonstrate CGMM's efficacy for both tree and general graph classification tasks:
- On the INEX2005 tree classification benchmark, a 4-layer CGMM achieved approximately 96.7% test accuracy, compared to 97.0% for the PT kernel.
- On the MUTAG, CPDB, and AIDS graph benchmarks, CGMM (depth up to 8, with and without pooling) matched or outperformed prior models: 81.0% accuracy on CPDB, 84.2% on AIDS, and 91.2% on MUTAG.
- Depth ablation studies found that performance improves through layers 3–5, then saturates or slightly drops at very large depths, indicating saturated contextual propagation and possible overfitting (Bacciu et al., 2018).
7. Contextual Graph Markov Models in Structure Learning
Contextual Markov Networks (CMNs), a related framework, generalize Markov networks by encoding context-specific independences (CSIs), where pairwise dependencies can be switched off in certain neighbor-assignment contexts. A CMN is formally represented as a pair $(G, \mathcal{C})$: a graph $G$ together with a set $\mathcal{C}$ of edge-contexts.
Structure learning in CMNs leverages marginal pseudo-likelihood (MPL) scoring, supporting efficient model selection without the chordality assumption. The CMN MPL is a product over context-merged neighborhoods,

$$\mathrm{MPL}(\mathbf{x} \mid G, \mathcal{C}) = \prod_{j=1}^{d} \prod_{l} \frac{\Gamma(r_j \alpha)}{\Gamma(r_j \alpha + n_{jl})} \prod_{k=1}^{r_j} \frac{\Gamma(\alpha + n_{jlk})}{\Gamma(\alpha)},$$

where $j$ indexes variables, $l$ the merged configurations of variable $j$'s context-reduced Markov blanket, $k$ the $r_j$ outcome values of $X_j$, $n_{jlk}$ the corresponding data counts (with $n_{jl} = \sum_k n_{jlk}$), and $\alpha$ the Dirichlet hyperparameter of the closed-form marginalization. A two-level greedy hill-climbing algorithm alternates between updating the graph and optimizing the local contexts (edge assignments), penalizing overly rich context sets via a prior hyperparameter. This framework is proven statistically consistent: maximizing the MPL recovers the true CMN structure asymptotically. Empirical experiments on synthetic and real-world discrete datasets confirm that CMNs achieve the highest likelihood and out-of-sample predictive accuracy, using fewer parameters than stratified graph models and outperforming ordinary Markov networks (Pensar et al., 2021).
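The Dirichlet marginalization underlying the MPL score has a simple closed form per context-merged configuration, and greedy structure search compares such scores before and after merging contexts. The sketch below uses a uniform Dirichlet prior; the function name `log_marginal` and the toy counts are illustrative assumptions, not Pensar et al.'s exact notation.

```python
import numpy as np
from math import lgamma

def log_marginal(counts, alpha=1.0):
    """Closed-form Dirichlet-multinomial log-marginal likelihood of the
    outcome counts observed under one context-merged configuration.

    This is the per-configuration building block of an MPL-style score;
    the full score multiplies such terms over variables and configurations.
    """
    counts = np.asarray(counts, dtype=float)
    r = len(counts)  # number of outcome values
    val = lgamma(r * alpha) - lgamma(r * alpha + counts.sum())
    val += sum(lgamma(alpha + n) - lgamma(alpha) for n in counts)
    return val

# Greedy context optimization compares keeping two contexts separate
# against merging them into one (toy binary counts):
score_split = log_marginal([10, 2]) + log_marginal([3, 9])
score_merged = log_marginal([13, 11])
keep_context = score_split > score_merged  # split wins when the contexts differ
```

Here the two contexts induce clearly different outcome distributions, so the split score exceeds the merged score and the context-specific independence is retained; with similar counts the merged model would win, which is how the score implicitly penalizes overly rich context sets.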