Mixture of Ego-Graphs Fusion (MoEGF)

Updated 13 November 2025
  • MoEGF is a fine-grained graph fusion mechanism that adaptively combines per-sample ego-graphs via a Mixture-of-Experts approach.
  • It constructs KNN-based ego-graphs from multiple views and employs a gating MLP to generate a fused adjacency matrix for GNN processing.
  • Empirical results show significant accuracy improvements over traditional view-level fusion, underscoring its practical impact on multi-view clustering.

Mixture of Ego-Graphs Fusion (MoEGF) is a fine-grained graph fusion mechanism designed for multi-view clustering within the Mixture of Ego-Graphs Contrastive Representation Learning (MoEGCL) framework. Diverging from traditional view-level fusion, MoEGF aggregates per-sample (ego-graph) structures from multiple data views using a Mixture-of-Experts (MoE) paradigm. This design enables adaptive, sample-specific fusion of ego-graphs to produce a fused adjacency matrix for downstream graph neural network (GNN) processing, substantially enhancing clustering performance by capturing localized multi-view interactions (Zhu et al., 8 Nov 2025).

1. Mathematical Formulation of MoEGF

Given $M$ data views, each sample $i$'s representation in view $m$ is encoded as $z_i^m = f^m(x_i^m) \in \mathbb{R}^{d_\psi}$. For each view, a $k$-nearest-neighbor (KNN) adjacency matrix $S^m \in \{0,1\}^{N \times N}$ is built:

$$S_{ij}^m = \begin{cases} 1 & \text{if } j \in K_i^m \\ 0 & \text{otherwise} \end{cases}$$

where $K_i^m$ denotes the set of the $k$ nearest neighbors of sample $i$ in view $m$.

The sample's ego-graph in view $m$ is the binary vector $V_i^m := (S_{i1}^m, S_{i2}^m, \dots, S_{iN}^m) \in \{0,1\}^N$.
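
As a concrete illustration, the following PyTorch sketch builds the binary matrix $S^m$ (whose rows are the ego-graphs $V_i^m$) for a single view; the function name, the Euclidean distance metric, and the exclusion of self-loops are assumptions made for illustration, not details reported in the paper.

```python
import torch

def knn_ego_graphs(z_m: torch.Tensor, k: int) -> torch.Tensor:
    """Build the binary ego-graph matrix S^m for one view from embeddings z_m of shape (N, d_psi)."""
    z_m = z_m.detach()                             # the binary graph itself is not differentiated through
    n = z_m.size(0)
    dist = torch.cdist(z_m, z_m)                   # (N, N) pairwise Euclidean distances
    dist.fill_diagonal_(float("inf"))              # exclude each sample from its own neighbour set
    knn_idx = dist.topk(k, largest=False).indices  # (N, k) indices forming K_i^m
    S_m = torch.zeros(n, n, device=z_m.device)
    S_m.scatter_(1, knn_idx, 1.0)                  # S^m_{ij} = 1 for j in K_i^m, else 0
    return S_m
```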

The concatenated embedding $z_i := [z_i^1; z_i^2; \ldots; z_i^M] \in \mathbb{R}^{M d_\psi}$ serves as the gating input to a two-layer MLP, yielding softmax weights $C_i \in \mathbb{R}^M$:

$$C_i = \mathrm{softmax}\big(\mathrm{mlp}^{(1)}(z_i)\big)$$

The fused ego-graph vector for sample $i$ is then the convex combination:

$$V_i = \sum_{m=1}^M C_i^m V_i^m$$

The stacked set of $V_i$ forms the fused adjacency matrix $S \in \mathbb{R}^{N \times N}$.
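
A minimal sketch of the gating and fusion steps is given below, assuming a hidden width of 256 and ReLU activation for the gating MLP (the paper specifies a two-layer MLP with dropout $p = 0.1$ but not these details); `EgoGraphGate` is an illustrative name, not the authors' module.

```python
import torch
import torch.nn as nn

class EgoGraphGate(nn.Module):
    """Sample-wise Mixture-of-Experts gating over M view-specific ego-graphs (illustrative sketch)."""

    def __init__(self, m_views: int, d_psi: int, d_hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(                  # two-layer gating MLP, mlp^(1)
            nn.Linear(m_views * d_psi, d_hidden),
            nn.ReLU(),
            nn.Dropout(p=0.1),
            nn.Linear(d_hidden, m_views),
        )

    def forward(self, z_cat: torch.Tensor, ego_graphs: torch.Tensor) -> torch.Tensor:
        # z_cat:      (N, M * d_psi) concatenated per-view embeddings z_i
        # ego_graphs: (M, N, N) stacked binary ego-graph matrices S^m
        C = torch.softmax(self.mlp(z_cat), dim=-1)     # (N, M) sample-wise mixture weights C_i
        # Fused row V_i = sum_m C_i^m V_i^m, stacked into the fused adjacency S of shape (N, N).
        return torch.einsum("nm,mnj->nj", C, ego_graphs)
```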

To incorporate feature information, a two-layer GCN is applied:

$$\tilde{S} = I_N + S, \qquad \tilde{D}_{ii} = \sum_j \tilde{S}_{ij}$$

$$\tilde{Z} = \sigma\left( \tilde{D}^{-1/2} \tilde{S} \tilde{D}^{-1/2} \, \sigma\left( \tilde{D}^{-1/2} \tilde{S} \tilde{D}^{-1/2} Z W^0 \right) W^1 \right)$$

where $Z = [z_1; \dots; z_N]$ and $W^0, W^1$ are learnable parameters.
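
The two-layer GCN over the fused adjacency can be sketched as follows; using ReLU for $\sigma$ and omitting bias terms are assumptions.

```python
import torch
import torch.nn as nn

class FusedGCN(nn.Module):
    """Two-layer GCN over the fused adjacency S (minimal sketch of the formula above)."""

    def __init__(self, d_in: int, d_hidden: int, d_out: int):
        super().__init__()
        self.w0 = nn.Linear(d_in, d_hidden, bias=False)   # plays the role of W^0
        self.w1 = nn.Linear(d_hidden, d_out, bias=False)  # plays the role of W^1

    @staticmethod
    def normalize(S: torch.Tensor) -> torch.Tensor:
        S_tilde = S + torch.eye(S.size(0), device=S.device)         # S~ = I_N + S
        d_inv_sqrt = S_tilde.sum(dim=1).clamp(min=1e-12).rsqrt()    # diagonal of D~^{-1/2}
        return d_inv_sqrt[:, None] * S_tilde * d_inv_sqrt[None, :]  # D~^{-1/2} S~ D~^{-1/2}

    def forward(self, Z: torch.Tensor, S: torch.Tensor) -> torch.Tensor:
        A = self.normalize(S)
        H = torch.relu(A @ self.w0(Z))     # inner layer: sigma(A Z W^0), with sigma = ReLU (assumption)
        return torch.relu(A @ self.w1(H))  # outer layer gives the structure-aware embeddings Z~
```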

2. Algorithmic Implementation

A typical training epoch for MoEGF within MoEGCL, assuming minibatch size $B$, comprises the following steps (a sketch tying them together appears after the list):

  1. Encoding: For each sample and each view, compute $z_i^m = f^m(x_i^m)$; form the concatenation $z_i$.
  2. KNN Construction: For each view, build $V_i^m$ as row $i$ of $S^m$.
  3. Gating: Apply $\mathrm{mlp}^{(1)}$ and softmax to $z_i$, outputting $C_i$.
  4. Fusion: Compute $V_i$ as the weighted sum of the $V_i^m$ using $C_i$.
  5. Adjacency Assembly: Assemble $\{V_i\}$ into the fused adjacency $S_\text{batch}$.
  6. GCN Forward: Apply the two-layer GCN to obtain $\tilde{Z}_\text{batch}$.
  7. Projection Heads: Apply separate MLPs to $\tilde{z}_i$ (yielding $\hat{h}_i$) and to $z_i^m$ (yielding $h_i^m$).
  8. Loss Computation: Compute the autoencoder reconstruction loss $\mathcal{L}_\text{Rec}$ and the ego-graph contrastive loss $\mathcal{L}_\text{Egc}$.
  9. Optimization: Form the total loss $\mathcal{L} = \mathcal{L}_\text{Rec} + \lambda \mathcal{L}_\text{Egc}$, backpropagate, and update all parameters.
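
The sketch below implements one such training iteration, reusing the `knn_ego_graphs`, `EgoGraphGate`, and `FusedGCN` sketches from Section 1 and the contrastive-loss sketch from Section 5; the linear encoders, decoders, projection heads, and dimensions are illustrative placeholders rather than the published architecture.

```python
import torch
import torch.nn as nn

# Illustrative components and dimensions (placeholders, not the published architecture).
M, B, d_x, d_psi, d_g, k, lam = 3, 256, 100, 512, 128, 8, 1.0
encoders = nn.ModuleList([nn.Linear(d_x, d_psi) for _ in range(M)])
decoders = nn.ModuleList([nn.Linear(d_psi, d_x) for _ in range(M)])
gate = EgoGraphGate(M, d_psi)                  # Section 1 sketch
gcn = FusedGCN(M * d_psi, 256, d_g)            # Section 1 sketch
fused_head = nn.Linear(d_g, d_g)               # projection producing h_hat_i
view_heads = nn.ModuleList([nn.Linear(d_psi, d_g) for _ in range(M)])
params = (list(encoders.parameters()) + list(decoders.parameters()) +
          list(gate.parameters()) + list(gcn.parameters()) +
          list(fused_head.parameters()) + list(view_heads.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)

def train_step(x_views):                       # x_views: list of M tensors, each (B, d_x)
    z_views = [enc(x) for enc, x in zip(encoders, x_views)]        # step 1: encoding
    z_cat = torch.cat(z_views, dim=-1)                             # concatenated z_i
    ego = torch.stack([knn_ego_graphs(z, k) for z in z_views])     # step 2: (M, B, B)
    S_batch = gate(z_cat, ego)                                     # steps 3-5: gating + fusion
    Z_tilde = gcn(z_cat, S_batch)                                  # step 6: GCN forward
    h_hat = fused_head(Z_tilde)                                    # step 7: projection heads
    h_views = torch.stack([head(z) for head, z in zip(view_heads, z_views)])
    rec = sum(((dec(z) - x) ** 2).mean()                           # step 8: reconstruction loss
              for dec, z, x in zip(decoders, z_views, x_views))
    egc = ego_graph_contrastive_loss(h_hat, h_views, S_batch)      # Section 5 sketch
    loss = rec + lam * egc                                         # step 9: total loss and update
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```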

The dominant computational cost per batch is $O(B^2 M)$ for the fusion and $O(B\, d_\psi\, d_g)$ for the feature transformations.

3. Comparison to View-Level Fusion Paradigms

Traditional deep multi-view clustering approaches construct one graph per view and perform graph fusion at the view level, assigning global weights to views and yielding a single mixture shared by all samples. In contrast, MoEGF outputs sample-wise fusion coefficients $C_i$, enabling personalized graph structures per sample.
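
To make the distinction concrete, the toy snippet below contrasts global view-level weights with MoEGF-style per-sample mixture weights; the random tensors are placeholders rather than data from the paper.

```python
import torch

M, N = 3, 6
ego = torch.randint(0, 2, (M, N, N)).float()        # toy per-view ego-graph stacks

# View-level fusion: one global weight per view, shared by every sample.
w_global = torch.softmax(torch.randn(M), dim=0)     # (M,)
S_view_level = torch.einsum("m,mnj->nj", w_global, ego)

# MoEGF-style fusion: a separate weight vector C_i per sample.
C = torch.softmax(torch.randn(N, M), dim=-1)        # (N, M)
S_sample_level = torch.einsum("nm,mnj->nj", C, ego)

# Every row of S_view_level uses the same view mixture, whereas each row of
# S_sample_level can mix the views differently for its own ego-graph.
```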

Empirical results show substantial accuracy gains from this design. For example, removing MoEGF and instead concatenating the $z_i^m$ leads to absolute clustering accuracy (ACC) drops of 37.6% (Caltech5V), 6% (RGBD), and 41% (WebKB). MoEGF outperforms state-of-the-art multi-view clustering baselines by over 8% ACC on WebKB and by 4–7% on RGBD (Zhu et al., 8 Nov 2025).

4. Design Decisions, Hyperparameters, and Implementation Notes

Key implementation features and hyperparameter choices are summarized below:

| Component | Setting | Notes |
| --- | --- | --- |
| Number of Experts | $K = M$ | One per view |
| Gating Network ($\mathrm{mlp}^{(1)}$) | Two-layer MLP, softmax output, dropout $p = 0.1$ | Hidden dim not specified; $d_\psi = 512$ used |
| KNN Graph (per view) | $k$-nearest neighbors, $k \in [5, 10]$ typical | Binary adjacency |
| Embedding Dimensions | $d_\psi = 512$, $d_\phi = 128$ | |
| Batch Size | $b = 256$ | |
| Training Epochs | $T_p = 200$ (pre-train), $T_f = 300$ (fine-tune) | |
| MoEGF Mixture Weights | Dense softmax, no regularizer | |

Implementation is amenable to minibatch parallelism and scales as $O(B^2 M)$ with batch size $B$ and number of views $M$, dominated by fusion and GCN costs. KNN adjacency and gating can be precomputed or batched for efficiency.
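
As one way to realize this, the per-view KNN neighbour lists can be computed once (here with scikit-learn, an assumed tooling choice) and reused when assembling ego-graphs inside the training loop; if the graphs are built on learned embeddings rather than raw features, they would need periodic recomputation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def precompute_knn_indices(z_m: np.ndarray, k: int) -> np.ndarray:
    """Precompute k-NN neighbour indices for one view (N, k), excluding each point itself."""
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(z_m)
    _, idx = nbrs.kneighbors(z_m)   # (N, k+1); column 0 is typically the query point itself
    return idx[:, 1:]               # drop the self index
```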

5. Integration Within the MoEGCL Framework

MoEGF operates immediately after per-view autoencoder embedding. It delivers the fused adjacency $S$ to a two-layer GCN, producing structure-aware node embeddings $\tilde{Z}$. The subsequent Ego Graph Contrastive Learning (EGCL) module aligns fused and per-view representations via the loss

$$\mathcal{L}_{\mathrm{Egc}} = -\frac{1}{2N} \sum_{i=1}^N \sum_{m=1}^M \log \frac{ \exp\!\left( \cos( \hat{h}_i, h_i^m ) / \tau \right) }{ \sum_{j=1}^N \exp\!\left( (1 - S_{ij}) \cos( \hat{h}_i, h_j^m ) / \tau \right) - \exp(1/\tau) }$$

Gradients from $\mathcal{L}_{\mathrm{Egc}}$ propagate through the GNN layers and the MoEGF gating MLP, ensuring that the fused structure is optimized for cluster-aware representation learning.
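
A direct (unoptimized) rendering of this loss, under the assumption that $\cos$ is taken on L2-normalized projections and with an illustrative temperature $\tau = 0.5$, might look as follows; in practice a small epsilon may be needed to keep the denominator positive.

```python
import math
import torch
import torch.nn.functional as F

def ego_graph_contrastive_loss(h_hat: torch.Tensor,
                               h_views: torch.Tensor,
                               S: torch.Tensor,
                               tau: float = 0.5) -> torch.Tensor:
    """Sketch of L_Egc as written above (tau = 0.5 is an assumed temperature).

    h_hat:   (N, d)    projections of the fused, structure-aware embeddings
    h_views: (M, N, d) per-view projections h^m
    S:       (N, N)    fused adjacency, down-weighting graph neighbours in the denominator
    """
    N = h_hat.size(0)
    h_hat_n = F.normalize(h_hat, dim=-1)
    loss = h_hat.new_zeros(())
    for h_m in h_views:                                   # sum over views m
        cos = h_hat_n @ F.normalize(h_m, dim=-1).t()      # (N, N) cosine similarities
        pos = torch.exp(cos.diagonal() / tau)             # exp(cos(h_hat_i, h_i^m) / tau)
        neg = torch.exp((1.0 - S) * cos / tau).sum(dim=1) - math.exp(1.0 / tau)
        loss = loss - torch.log(pos / neg).sum()          # -log of the ratio, summed over i
    return loss / (2 * N)
```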

6. Relation to Other Mixture-of-Experts Graph Methods

MoEGF advances prior fusion mechanisms for multi-view graph data in deep clustering tasks by providing an alternative to global or view-level fusion. The approach is conceptually related to the class of Mixture-of-Experts graph methods, including MoG (Zhang et al., 23 May 2024), which extend MoE strategies to graph sparsification and subgraph selection via per-node adaptive fusion. MoEGF, however, is specifically tailored for sample-level ego-graph combination, with direct integration into a contrastive clustering framework. Both approaches share the use of per-node/per-sample gating and fusion, but differ in fusion domains (ego-adjacency in MoEGF, Grassmannian spectral fusion in MoG).

A plausible implication is that the fundamental Mixture-of-Experts paradigm, when applied locally to ego-centric structures, generalizes beyond clustering to other graph learning problems, including efficient sparsification, node classification, and adaptive edge selection.

7. Empirical Performance and Observed Impact

On benchmark datasets, MoEGF within MoEGCL results in pronounced accuracy improvements over both naive and coarse-grained fusion strategies, as evinced by substantial drops in ACC upon ablation. The empirical findings underscore the significance of fine-grained, per-sample fusion in capturing the mutual reinforcement and complementarity of multi-view graph signals. The method’s flexible, differentiable construction further allows direct end-to-end optimization with contrastive learning objectives, significantly advancing state-of-the-art performance in multi-view clustering settings (Zhu et al., 8 Nov 2025).
