MoEGCL: Mixture Ego-Graph Contrastive Learning
- The paper introduces MoEGCL, a method that constructs 5 distinct ego-graph views per node and uses contrastive losses to maximize mutual information.
- MoEGCL employs shared GCN encoders and a mixture-of-experts gating mechanism to fuse multi-view embeddings, enhancing local structure capture and feature expressivity.
- Empirical findings show MoEGCL outperforms existing methods in node classification, link prediction, and multi-view clustering, validating its innovative design.
Ego Graph Contrastive Learning (EGCL) encompasses a class of methods that employ node-centered ego-graph structures to realize contrastive representation learning in graph-based problems. Recent implementations, such as the MoEGCL frameworks (Li et al., 2023, Zhu et al., 8 Nov 2025), have formalized multi-subgraph sampling and mutual information maximization protocols to boost the expressivity and utility of node or sample representations for tasks ranging from self-supervised node classification to multi-view clustering.
1. Formal Definitions of Ego-Graphs and Their Role in Representation Learning
"Editor’s term": Ego-graph—refers to a node-centric subgraph capturing the local topology and semantics around a central node. In MoEGCL (Li et al., 2023), for an input graph with nodes and node features , each node is associated with ego-graphs providing different structural perspectives:
- Basic (core) subgraph: isolates .
- 1-hop neighborhood: , capturing immediate neighbors.
- Intimate subgraph: top- nodes by Personalized PageRank similarity.
- Communal subgraph: all nodes in the same cluster as , with cluster membership determined via differentiable K-means.
- Full subgraph: the entire graph, where an embedding mixture preserves individuality.
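A minimal construction sketch of these five views, assuming an undirected NetworkX graph, per-node PPR scores, and precomputed cluster assignments; the function and parameter names (`ego_graph_views`, `top_k`, `ppr_alpha`) are illustrative rather than taken from the paper.

```python
# Sketch of the five ego-graph views per node (illustrative, not the authors' code).
import networkx as nx


def ego_graph_views(G, node, cluster_of, top_k=10, ppr_alpha=0.85):
    """Return the five ego-graph views of `node` as subgraphs of `G`."""
    # 1. Basic (core) subgraph: the central node alone.
    core = {node}

    # 2. 1-hop neighborhood: the node plus its immediate neighbors.
    one_hop = {node} | set(G.neighbors(node))

    # 3. Intimate subgraph: top-k nodes ranked by PPR similarity to `node`.
    ppr = nx.pagerank(G, alpha=ppr_alpha, personalization={node: 1.0})
    intimate = {n for n, _ in sorted(ppr.items(), key=lambda x: -x[1])[:top_k]}

    # 4. Communal subgraph: all nodes sharing `node`'s cluster label
    #    (MoEGCL obtains the assignment from differentiable K-means).
    communal = {n for n in G.nodes if cluster_of[n] == cluster_of[node]}

    # 5. Full subgraph: the entire graph.
    full = set(G.nodes)

    return [G.subgraph(s) for s in (core, one_hop, intimate, communal, full)]
```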
In multi-view clustering scenarios (Zhu et al., 8 Nov 2025), the ego-graph of a sample in a given view is encoded as its adjacency (row) vector, extracted from a k-NN graph built on that view's learned features. This establishes a modular, instance-level graph context for each sample and view.
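A short sketch of this per-view construction, assuming each view's learned features are available as a NumPy array and using scikit-learn's kneighbors_graph; the function name and the symmetrization step are assumptions for illustration.

```python
# Per-view ego-graphs as rows of a k-NN adjacency matrix (illustrative sketch).
import numpy as np
from sklearn.neighbors import kneighbors_graph


def view_ego_adjacency(features_per_view, k=10):
    """Return one dense (n, n) adjacency per view; row i is the ego-graph
    adjacency vector of sample i in that view."""
    adjs = []
    for Z in features_per_view:                   # Z: (n, d_v) learned features
        A = kneighbors_graph(Z, n_neighbors=k,    # sparse 0/1 k-NN graph
                             mode="connectivity", include_self=True)
        A = A.toarray()
        adjs.append(np.maximum(A, A.T))           # symmetrize
    return adjs
```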
2. Construction and Fusion of Multiple Ego-Graphs
The fine-grained construction and fusion of ego-graphs are pivotal to EGCL's representational power:
- MoEGCL (Li et al., 2023): Constructs the five node-centered subgraphs per node, independently encodes each via a shared GCN, and pools the resulting embeddings.
- MoEGF (Zhu et al., 8 Nov 2025): For multi-view clustering, fuses per-view ego-graphs via a "mixture-of-experts" gating mechanism. Each sample's concatenated view embeddings are passed through an MLP to produce softmax gating weights over the views; the fused ego-graph adjacency vector of that sample is the gate-weighted sum of its per-view adjacency vectors, yielding a fused graph for downstream GCN encoding (see the sketch below).
This protocol bypasses coarse view-level fusion, enabling sample-level fusion that preserves heterogeneous local structures and relationships specific to each node or sample.
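A PyTorch sketch of this sample-level fusion under assumed tensor shapes; the class name, hidden size, and forward signature are placeholders, not the released implementation.

```python
# Mixture-of-experts fusion of per-view ego-graphs (illustrative sketch).
import torch
import torch.nn as nn


class EgoGraphFusion(nn.Module):
    def __init__(self, embed_dim, n_views, hidden=128):
        super().__init__()
        # Gating MLP: concatenated view embeddings -> one weight per view.
        self.gate = nn.Sequential(
            nn.Linear(n_views * embed_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_views),
        )

    def forward(self, view_embeds, view_adjs):
        # view_embeds: list of (n, d) tensors, one per view
        # view_adjs:   list of (n, n) ego-graph adjacency matrices, one per view
        h = torch.cat(view_embeds, dim=1)             # (n, V*d)
        w = torch.softmax(self.gate(h), dim=1)        # (n, V) per-sample gate weights
        A = torch.stack(view_adjs, dim=0)             # (V, n, n)
        # Fused adjacency row of sample i: gate-weighted sum of its per-view rows.
        return torch.einsum("nv,vnm->nm", w, A)       # (n, n) fused graph
```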
3. Contrastive Objectives and Mutual Information Maximization
Contrastive learning within EGCL frameworks aims to maximize the mutual information between different views of the same node or sample, while minimizing alignment with corruptions or negatives:
- MoEGCL (Li et al., 2023): Employs a readout function to pool each subgraph embedding alongside a corrupted (negative) counterpart. The contrastive losses implement either:
- Core-View (CV): Contrasts basic vs. all other subgraphs.
- Full-Graph (FG): Contrasts all pairs among the ego-graphs.
- Both utilize a bilinear discriminator and a binary cross-entropy loss (illustrated in the sketch below).
- EGCL module (Zhu et al., 8 Nov 2025): Projects both the fused GCN outputs and the view-specific features into a common latent space and measures agreement with cosine similarity; the EGCL loss contrasts fused and view-specific projections of the same sample against those of other samples and clusters.
This penalizes cases where fused and view-specific representations disagree for different clusters, thereby enforcing both instance-level and cluster-level alignment.
This suggests that EGCL frameworks structurally encourage embeddings to reflect both individual identity and shared cluster membership, enabled by ego-graph-aware loss formulations.
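Illustrative PyTorch sketches of both objectives: a DGI-style bilinear discriminator trained with binary cross-entropy, and an instance-level cosine-alignment term standing in for the EGCL loss (the paper's full loss additionally enforces cluster-level alignment). Shapes, names, and the InfoNCE-style formulation of the second term are assumptions.

```python
# Contrastive objectives (illustrative sketches, not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class BilinearContrast(nn.Module):
    """Bilinear discriminator scoring (embedding, summary) pairs, trained with
    binary cross-entropy against corrupted negatives."""

    def __init__(self, dim):
        super().__init__()
        self.disc = nn.Bilinear(dim, dim, 1)
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, h_pos, h_neg, summary):
        # h_pos, h_neg: (n, d) clean / corrupted embeddings; summary: (d,) readout
        s = summary.expand_as(h_pos)
        pos = self.disc(h_pos, s).squeeze(-1)         # should score high
        neg = self.disc(h_neg, s).squeeze(-1)         # should score low
        labels = torch.cat([torch.ones_like(pos), torch.zeros_like(neg)])
        return self.bce(torch.cat([pos, neg]), labels)


def egcl_instance_loss(z_fused, z_view, temperature=0.5):
    """Instance-level cosine alignment between fused and view-specific
    projections; positives are the two projections of the same sample."""
    z_fused = F.normalize(z_fused, dim=1)
    z_view = F.normalize(z_view, dim=1)
    sim = z_fused @ z_view.t() / temperature          # (n, n) cosine similarities
    targets = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, targets)
```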
4. Encoder Architectures and Model Implementation
EGCL models rely on GCN architectures for encoding subgraphs or fused graphs:
- MoEGCL (Li et al., 2023):
- Transductive tasks employ a single-layer GCN encoder (normalized-adjacency propagation of node features followed by a learned linear transform and nonlinearity).
- Inductive tasks use a residual GCN encoder, adding a skip connection from the transformed input features to the propagated output.
- MoEGCL (Zhu et al., 8 Nov 2025): After sample-level graph fusion, a two-layer GCN encodes the fused graph (see the sketch below).
In both cases, readout, projection, and gating MLPs are employed for view aggregation and alignment.
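A minimal sketch of these encoders over a dense (fused) adjacency matrix: a standard graph-convolution layer and a two-layer stack. This is an assumed reimplementation for illustration, not the released code.

```python
# GCN building blocks for encoding (fused) graphs (illustrative sketch).
import torch
import torch.nn as nn


def normalize_adj(A):
    """Symmetrically normalize an adjacency matrix after adding self-loops."""
    A_hat = A + torch.eye(A.size(0), device=A.device)
    d_inv_sqrt = torch.diag(A_hat.sum(dim=1).pow(-0.5))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt


class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, A_norm, H):
        return A_norm @ self.lin(H)                   # transform, then propagate


class TwoLayerGCN(nn.Module):
    """Two-layer GCN of the kind applied to the fused graph."""

    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.gc1 = GCNLayer(in_dim, hid_dim)
        self.gc2 = GCNLayer(hid_dim, out_dim)

    def forward(self, A, X):
        A_norm = normalize_adj(A)
        return self.gc2(A_norm, torch.relu(self.gc1(A_norm, X)))
```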
5. Training Algorithms, Hyperparameters, and Regularization
Training proceeds in epoch-wise cycles using Adam optimizers:
- MoEGCL (Li et al., 2023):
- Uses the five ego-graph views per node, with hyperparameters controlling the neighborhood range, the PPR ranking cutoff, the number of clusters, the PPR damping factor, and a mixing coefficient; early stopping and neighborhood sampling are applied for large-scale graphs.
- Corruption techniques include diffusion on the graph adjacency, feature shuffling, and graph swapping.
- MoEGCL (Zhu et al., 8 Nov 2025):
- Pre-trains MLP autoencoders per view, then fine-tunes the fusion and contrastive modules for 300 epochs (batch size 256, learning rate 3e-4, plus additional fixed hyperparameters).
- Employs a reconstruction loss for fidelity and the ego-graph contrastive loss for alignment; no adversarial or norm-based regularizers are used.
Clustering is performed post-training with k-means on projected fused representations.
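A hedged sketch of this fine-tuning-and-clustering procedure; the model's forward interface, loss callables, and data loader are placeholders, and only the optimizer, epoch count, batch size, and learning rate come from the description above.

```python
# Fine-tuning loop plus post-training k-means (illustrative sketch).
import torch
from sklearn.cluster import KMeans


def finetune_and_cluster(model, loader, recon_loss, egcl_loss,
                         n_clusters, epochs=300, lr=3e-4, device="cpu"):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in loader:                          # e.g. batches of 256 samples
            batch = batch.to(device)
            # Placeholder forward pass: projected fused embedding, per-view
            # embeddings, and reconstructions.
            z_fused, z_views, recon = model(batch)
            loss = recon_loss(recon, batch) + egcl_loss(z_fused, z_views)
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Clustering on the projected fused representations.
    with torch.no_grad():
        Z = torch.cat([model(b.to(device))[0] for b in loader]).cpu().numpy()
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z)
```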
6. Empirical Outcomes and Ablation Studies
EGCL and MoEGCL modules have demonstrated state-of-the-art performance across a range of tasks and datasets:
- Node classification (Li et al., 2023): On Cora, MNCSCL-CV reached 84.7% accuracy; MNCSCL-FG and MNCSCL-CV outperformed DGI, GMI, GIC, GRACE, MVGRL, and even matched supervised GCNs.
- Link prediction (Li et al., 2023): AUC/AP scores up to 94.8/94.2, surpassing GIC by 1.3–1.5 points.
- Multi-view clustering (Zhu et al., 8 Nov 2025): Achieved highest scores on Caltech5V, MNIST, LGG, WebKB, RGBD, and LandUse—e.g., MNIST ACC/NMI/PUR = 0.9920/0.9747/0.9920; Caltech5V ACC = 0.8207.
- Ablation (Zhu et al., 8 Nov 2025): Removing MoEGF, EGCL, MoE gating, or GCN causes significant drops (up to 24% in ACC), indicating the necessity of each component.
- A plausible implication is that the MoEGCL architecture's fine-grained fusion and contrastive alignment mechanisms are directly responsible for its empirical gains.
7. Significance, Implications, and Limitations
EGCL, as instantiated in MoEGCL, advances the paradigm of contrastive representation learning on graphs by:
- Enabling multiple ego-graph views per node or sample, rather than a single, ambiguously defined neighborhood.
- Allowing instance- and cluster-level contrastive objectives, yielding robust, transferable node/sample representations.
- Achieving fine-grained, sample-level information fusion critical for multi-view graph applications, markedly outperforming conventional view-level graph fusion.
- Avoiding adversarial or norm-based regularizers, relying instead on structural and semantic alignment via ego-graph contrastive losses.
Performance stability across hyperparameter choices is reported, and convergence occurs within 400 epochs in practice.
However, large-scale graphs and datasets require careful batching and neighborhood sampling to manage computational footprint. All gains depend crucially on methodical construction and fusion of ego-graphs, as demonstrated by extensive ablation.
In conclusion, Ego Graph Contrastive Learning establishes a foundation for sophisticated graph representation learning, leveraging multi-perspective local structures and contrastive mutual information objectives for superior performance in node-centric and clustering tasks, with continued research promising further generalizations and optimizations (Li et al., 2023, Zhu et al., 8 Nov 2025).