
MoEGCL: Mixture Ego-Graph Contrastive Learning

Updated 13 November 2025
  • The paper introduces MoEGCL, a method that constructs 5 distinct ego-graph views per node and uses contrastive losses to maximize mutual information.
  • MoEGCL employs shared GCN encoders and a mixture-of-experts gating mechanism to fuse multi-view embeddings, enhancing local structure capture and feature expressivity.
  • Empirical findings show MoEGCL outperforms existing methods in node classification, link prediction, and multi-view clustering, validating its innovative design.

Ego Graph Contrastive Learning (EGCL) encompasses a class of methods that employ node-centered ego-graph structures to realize contrastive representation learning in graph-based problems. Recent implementations, such as the MoEGCL frameworks (Li et al., 2023, Zhu et al., 8 Nov 2025), have formalized multi-subgraph sampling and mutual information maximization protocols to boost the expressivity and utility of node or sample representations for tasks ranging from self-supervised node classification to multi-view clustering.

1. Formal Definitions of Ego-Graphs and Their Role in Representation Learning

The term "ego-graph" (an editor's term) refers to a node-centric subgraph capturing the local topology and semantics around a central node. In MoEGCL (Li et al., 2023), for an input graph $\mathcal{G}=(X,A)$ with $N$ nodes and node features $X$, each node $v_i$ is associated with $K=5$ ego-graphs providing different structural perspectives (a construction sketch follows the list):

  • Basic (core) subgraph: $\mathrm{idx}=\{i\}$ isolates $v_i$.
  • 1-hop neighborhood: $\mathrm{idx}=\{j:\mathrm{dist}(v_i,v_j)\leq 1\}$, capturing immediate neighbors.
  • Intimate subgraph: the top-$l$ nodes ranked by Personalized PageRank (PPR) similarity.
  • Communal subgraph: all nodes in the same cluster as $v_i$, with cluster membership determined via differentiable K-means.
  • Full subgraph: the entire graph, where an embedding mixture $(1-\eta)\mathcal{R}(H)+\eta h_i$ preserves the individuality of $v_i$.
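
As a concrete illustration, the sketch below materializes the five index sets from a dense adjacency matrix A and feature matrix X, using a power-iteration approximation of Personalized PageRank and ordinary k-means as a stand-in for the paper's differentiable K-means. All function and parameter names are ours; the defaults follow the hyperparameters listed in Section 5.

```python
import numpy as np
from sklearn.cluster import KMeans

def ego_graph_indices(A, X, top_l=20, n_clusters=128, alpha=0.15, ppr_iters=50):
    """Return, for each node i, the five ego-graph index sets described above:
    core, 1-hop, intimate (top-l PPR), communal (same cluster), and full.
    Illustrative sketch only; not the authors' implementation."""
    N = A.shape[0]

    # Row-stochastic transition matrix for Personalized PageRank (PPR).
    deg = A.sum(axis=1, keepdims=True).clip(min=1.0)
    P = A / deg

    # PPR matrix via power iteration; row i approximates the PPR vector of node i.
    S = np.eye(N)
    for _ in range(ppr_iters):
        S = alpha * np.eye(N) + (1.0 - alpha) * S @ P

    # Plain k-means on node features as a stand-in for differentiable K-means.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)

    views = []
    for i in range(N):
        core = np.array([i])                              # basic (core) subgraph
        one_hop = np.union1d(core, np.nonzero(A[i])[0])   # node i plus its neighbors
        intimate = np.argsort(-S[i])[:top_l]              # top-l nodes by PPR score
        communal = np.nonzero(labels == labels[i])[0]     # nodes sharing i's cluster
        full = np.arange(N)                               # the entire graph
        views.append({"core": core, "one_hop": one_hop, "intimate": intimate,
                      "communal": communal, "full": full})
    return views
```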

In multi-view clustering scenarios (Zhu et al., 8 Nov 2025), for the $m$-th view, the ego-graph for sample $i$ is encoded in the adjacency vector $V^m_i$ extracted from a k-NN graph on learned features $z_i^m$. This establishes a modular instance-level graph context for each sample and view.
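
A minimal sketch of this per-view construction, assuming a cosine-similarity k-NN graph on the learned features (the neighbor count k and the function name are illustrative choices, not values from the paper):

```python
import numpy as np

def view_ego_adjacency(Z_m, k=10):
    """Build a k-NN graph on one view's learned features Z_m (shape N x d);
    row i of the returned matrix is the ego-graph adjacency vector V_i^m (sketch)."""
    Zn = Z_m / np.clip(np.linalg.norm(Z_m, axis=1, keepdims=True), 1e-12, None)
    sim = Zn @ Zn.T                         # cosine similarity between all samples
    np.fill_diagonal(sim, -np.inf)          # exclude self when picking neighbors

    N = Z_m.shape[0]
    V = np.zeros((N, N))
    nbrs = np.argsort(-sim, axis=1)[:, :k]  # indices of the k nearest neighbors
    V[np.arange(N)[:, None], nbrs] = 1.0    # mark directed k-NN edges
    return V
```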

2. Construction and Fusion of Multiple Ego-Graphs

The fine-grained construction and fusion of ego-graphs are pivotal to EGCL's representational power:

  • MoEGCL (Li et al., 2023): Constructs all $K=5$ node-centered subgraphs per node, independently encodes each via a shared GCN, and pools the embeddings.
  • MoEGF (Zhu et al., 8 Nov 2025): For multi-view clustering, fuses per-view ego-graphs via a "mixture-of-experts" gating mechanism. Concatenated view embeddings $\mathbf{z}_i = [z^1_i; z^2_i; \ldots; z^M_i]$ are passed through an MLP to produce softmax gating weights $\mathcal{C}^m_i$. The fused ego-graph adjacency vector for sample $i$ is $\mathbf{V}_i = \sum_{m=1}^M \mathcal{C}^m_i V^m_i$, yielding a fused graph for downstream GCN encoding (sketched below).

This protocol bypasses coarse view-level fusion, enabling sample-level fusion that preserves heterogeneous local structures and relationships specific to each node or sample.
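
A compact PyTorch sketch of this gating step follows; the hidden width, the assumption that all views share one embedding dimension, and the class name are ours rather than details from the paper.

```python
import torch
import torch.nn as nn

class EgoGraphGate(nn.Module):
    """Mixture-of-experts gate over M views: produces per-sample softmax weights
    C_i^m and fuses the per-view ego-graph adjacency vectors V^m (sketch)."""
    def __init__(self, d_view, n_views, hidden=256):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(d_view * n_views, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_views),
        )

    def forward(self, z_views, V_views):
        # z_views: list of M tensors of shape (N, d_view); V_views: list of M (N, N) tensors.
        z_cat = torch.cat(z_views, dim=1)               # z_i = [z_i^1; ...; z_i^M]
        gates = torch.softmax(self.gate(z_cat), dim=-1) # (N, M) gating weights C_i^m
        V = torch.stack(V_views, dim=0)                 # (M, N, N)
        # Row-wise fusion: V_i = sum_m C_i^m * V_i^m.
        fused = (gates.T.unsqueeze(-1) * V).sum(dim=0)  # (N, N) fused adjacency
        return fused, gates
```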

3. Contrastive Objectives and Mutual Information Maximization

Contrastive learning within EGCL frameworks aims to maximize the mutual information between different views of the same node or sample, while minimizing alignment with corruptions or negatives:

  • MoEGCL (Li et al., 2023): Employs a readout function $\mathcal{R}$ to pool each subgraph embedding $v^k_i$ and its negative counterpart $u^k_i$. The contrastive losses implement either:
    • Core-View (CV): Contrasts basic vs. all other subgraphs.
    • Full-Graph (FG): Contrasts all pairs among the $K$ ego-graphs.
    • Both utilize a bilinear discriminator $\mathcal{D}(x, y) = \sigma(x^\top W_d y)$ and a binary cross-entropy loss (a sketch appears at the end of this section).
  • EGCL module (Zhu et al., 8 Nov 2025): Projects both the fused GCN outputs $\hat{h}_i$ and the view-specific features $h_i^m$ into a common latent space, applying a cosine similarity metric. The EGCL loss is:

$$L_{\text{Egc}} = -\frac{1}{2N} \sum_{i=1}^N \sum_{m=1}^M \log \left( \frac{\exp\left(C(\hat{h}_i, h_i^m)/\tau\right)}{\sum_{j=1}^N \exp\left((1-S_{ij}) \cdot C(\hat{h}_i, h_j^m)/\tau\right) - \exp(1/\tau)} \right)$$

Here $C(\cdot,\cdot)$ denotes the cosine similarity in the shared latent space, $\tau$ is a temperature, and $S_{ij}$ encodes the cluster agreement between samples $i$ and $j$. The $(1-S_{ij})$ weighting suppresses same-cluster pairs in the denominator, so the loss pulls each sample's fused and view-specific representations together while pushing apart representations of samples from different clusters, thereby enforcing both instance-level and cluster-level alignment.
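
A PyTorch-style transcription of this loss is given below. The tensor names, the binary same-cluster matrix S, and the use of plain cosine similarity for $C(\cdot,\cdot)$ reflect our reading of the notation; it is offered as a sketch, not the authors' code.

```python
import math

import torch
import torch.nn.functional as F

def egcl_loss(h_hat, h_views, S, tau=0.5):
    """Ego-graph contrastive loss L_Egc, transcribed from the formula above (sketch).
    h_hat: (N, d) fused embeddings; h_views: list of M (N, d) view-specific embeddings;
    S: (N, N) cluster-agreement matrix (assumed binary: S[i, j] = 1 if i, j share a cluster)."""
    N = h_hat.shape[0]
    h_hat = F.normalize(h_hat, dim=1)
    loss = h_hat.new_zeros(())
    for h_m in h_views:
        h_m = F.normalize(h_m, dim=1)
        C = h_hat @ h_m.T                     # cosine similarities C(h_hat_i, h_j^m)
        pos = torch.exp(C.diagonal() / tau)   # numerator: same-sample (positive) pairs
        neg = torch.exp((1.0 - S) * C / tau).sum(dim=1) - math.exp(1.0 / tau)
        loss = loss - torch.log(pos / neg).sum()
    return loss / (2 * N)
```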

This suggests that EGCL frameworks structurally encourage embeddings to reflect both individual identity and shared cluster membership, enabled by ego-graph-aware loss formulations.
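
For the CV and FG objectives of MoEGCL (Li et al., 2023) described earlier, the bilinear discriminator and BCE contrast can be sketched as follows; pairing node embeddings from clean and corrupted graphs against a per-view readout summary follows the usual DGI-style setup and is an assumption on our part.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilinearContrast(nn.Module):
    """Bilinear discriminator D(x, y) = sigma(x^T W_d y) with a binary cross-entropy
    contrast between clean (positive) and corrupted (negative) embeddings (sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Parameter(torch.empty(dim, dim))
        nn.init.xavier_uniform_(self.W)

    def forward(self, summary, pos, neg):
        # summary: (N, d) readout of one ego-graph view; pos / neg: (N, d) embeddings
        # from the clean graph and a corrupted graph, respectively.
        logits_pos = (pos @ self.W * summary).sum(dim=1)   # x^T W_d y for positive pairs
        logits_neg = (neg @ self.W * summary).sum(dim=1)   # x^T W_d y for negative pairs
        logits = torch.cat([logits_pos, logits_neg])
        labels = torch.cat([torch.ones_like(logits_pos), torch.zeros_like(logits_neg)])
        return F.binary_cross_entropy_with_logits(logits, labels)
```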

4. Encoder Architectures and Model Implementation

EGCL models rely on GCN architectures for encoding subgraphs or fused graphs:

  • MoEGCL (Li et al., 2023):
    • Transductive tasks employ a single-layer GCN, $H = \mathrm{PReLU}(\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2}XW)$, typically with output dimension $F'=512$ (see the sketch at the end of this section).
    • Inductive tasks use a residual GCN: $H = \mathrm{PReLU}(\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2}XW + \hat{A}W_{\text{skip}})$.
  • MoEGCL (Zhu et al., 8 Nov 2025): After sample-level graph fusion, a two-layer GCN $\tilde{Z} = \tilde{D}^{-1/2} \tilde{S} \tilde{D}^{-1/2} (\tilde{D}^{-1/2} \tilde{S} \tilde{D}^{-1/2} Z W^0) W^1$ is used, where $\tilde{S}=I_N+S$.

In both cases, readout, projection, and gating MLPs are employed for view aggregation and alignment.
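
A minimal dense-tensor sketch of the single-layer transductive encoder; the output dimension follows the $F'=512$ reported above, and treating $\hat{A}$ as $A+I$ is the standard GCN convention, assumed here.

```python
import torch
import torch.nn as nn

class OneLayerGCN(nn.Module):
    """Single-layer GCN encoder H = PReLU(D_hat^{-1/2} A_hat D_hat^{-1/2} X W)
    used for transductive tasks (illustrative sketch with dense tensors)."""
    def __init__(self, in_dim, out_dim=512):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)   # the weight matrix W
        self.act = nn.PReLU()

    def forward(self, X, A):
        # A_hat = A + I; D_hat is the diagonal degree matrix of A_hat.
        A_hat = A + torch.eye(A.shape[0], device=A.device, dtype=A.dtype)
        d_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)
        A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
        return self.act(A_norm @ self.lin(X))               # PReLU(A_norm X W)
```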

5. Training Algorithms, Hyperparameters, and Regularization

Training proceeds in epoch-wise cycles using Adam optimizers:

  • MoEGCL (Li et al., 2023):
    • Uses $K=5$ views, $d=1$ (neighbor range), $l=20$ (PPR ranking), $C=128$ clusters, $\alpha=0.15$ (PPR damping), $\beta=10$, $\eta=0.01$ (mixing coefficient), early stopping, and neighborhood sampling for large-scale graphs.
    • Corruption techniques include diffusion on $A$, feature shuffling, and graph-swapping.
  • MoEGCL (Zhu et al., 8 Nov 2025):
    • Pre-trains MLP autoencoders per view, then fine-tunes the fusion and contrastive modules for 300 epochs (batch size 256, learning rate $3\times 10^{-4}$, $\tau=0.5$, $\lambda=1$, $d_\psi=512$, $d_\phi=128$).
    • Employs $L_{\text{Rec}}$ for reconstruction fidelity and $L_{\text{Egc}}$ for alignment (combined as sketched at the end of this section); no adversarial or norm-based regularizers are used.

Clustering is performed post-training with k-means on projected fused representations.
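
Putting these pieces together, the fine-tuning objective and the final clustering step can be sketched as below, reusing the egcl_loss function from the Section 3 sketch; the mean-squared-error form of $L_{\text{Rec}}$ is an assumption on our part.

```python
import torch.nn.functional as F
from sklearn.cluster import KMeans

def fine_tune_objective(x_views, x_recon_views, h_hat, h_views, S, lam=1.0, tau=0.5):
    """Overall fine-tuning objective L = L_Rec + lambda * L_Egc (sketch; the
    reconstruction term is assumed to be a per-view mean squared error)."""
    l_rec = sum(F.mse_loss(x_rec, x) for x, x_rec in zip(x_views, x_recon_views))
    return l_rec + lam * egcl_loss(h_hat, h_views, S, tau)   # egcl_loss: see Section 3 sketch

def cluster_fused(h_hat, n_clusters):
    """Post-training step: k-means on the projected fused representations."""
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(h_hat.detach().cpu().numpy())
```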

6. Empirical Outcomes and Ablation Studies

EGCL and MoEGCL modules have demonstrated state-of-the-art performance across a range of tasks and datasets:

  • Node classification (Li et al., 2023): On Cora, MNCSCL-CV reached 84.7% accuracy; MNCSCL-FG and MNCSCL-CV outperformed DGI, GMI, GIC, GRACE, and MVGRL, and even matched supervised GCNs.
  • Link prediction (Li et al., 2023): AUC/AP scores up to 94.8/94.2, surpassing GIC by 1.3–1.5 points.
  • Multi-view clustering (Zhu et al., 8 Nov 2025): Achieved the highest scores on Caltech5V, MNIST, LGG, WebKB, RGBD, and LandUse; e.g., MNIST ACC/NMI/PUR = 0.9920/0.9747/0.9920 and Caltech5V ACC = 0.8207.
  • Ablation (Zhu et al., 8 Nov 2025): Removing MoEGF, EGCL, MoE gating, or GCN causes significant drops (up to 24% in ACC), indicating the necessity of each component.
  • A plausible implication is that the MoEGCL architecture's fine-grained fusion and contrastive alignment mechanisms are directly responsible for its empirical gains.

7. Significance, Implications, and Limitations

EGCL, as instantiated in MoEGCL, advances the paradigm of contrastive representation learning on graphs by:

  • Enabling multiple ego-graph views per node or sample, rather than a single, potentially ambiguous neighborhood.
  • Allowing instance- and cluster-level contrastive objectives, yielding robust, transferable node/sample representations.
  • Achieving fine-grained, sample-level information fusion critical for multi-view graph applications, markedly outperforming conventional view-level graph fusion.
  • Avoiding adversarial or norm-based regularizers, relying instead on structural and semantic alignment via ego-graph contrastive losses.

Performance stability across hyperparameter choices is reported, and convergence occurs within 400 epochs in practice.

However, large-scale graphs and datasets require careful batching and neighborhood sampling to manage computational footprint. All gains depend crucially on methodical construction and fusion of ego-graphs, as demonstrated by extensive ablation.

In conclusion, Ego Graph Contrastive Learning establishes a foundation for sophisticated graph representation learning, leveraging multi-perspective local structures and contrastive mutual information objectives for superior performance in node-centric and clustering tasks, with continued research promising further generalizations and optimizations (Li et al., 2023, Zhu et al., 8 Nov 2025).
