Federated Temporal Graph Clustering

Updated 13 April 2026

Federated Temporal Graph Clustering (FTGC) is a decentralized method that clusters dynamic graph data by combining temporal aggregation with federated learning while preserving data privacy.
FTGC employs graph neural networks for spatial feature extraction and a temporal window mechanism to capture evolving graph structures in client data.
The framework utilizes federated averaging with parameter sparsification and quantization to ensure efficient communication and high clustering performance, as evidenced on datasets like DBLP and School.

Federated Temporal Graph Clustering (FTGC) is a decentralized approach for clustering dynamic graph data distributed across multiple clients, each holding a private sequence of temporal graph snapshots. FTGC addresses the challenges of temporal graph clustering under data privacy constraints, enabling collaborative discovery of evolving graph structures without centralizing raw data. The framework uses graph neural networks (GNNs) with a specialized temporal aggregation mechanism and a federated learning protocol, balancing clustering fidelity, temporal smoothness, communication efficiency, and privacy preservation (Zhou et al., 2024).

1. Formalization and Clustering Objective

FTGC operates over $K$ clients, each storing a temporal graph sequence:

$\Gamma_k = \bigl\{\,G_t^{(k)}=(V_t^{(k)},E_t^{(k)},X_t^{(k)})\bigr\}_{t=1}^T,$

with $V_t^{(k)}$ the node set, $E_t^{(k)}$ the edge set, and $X_t^{(k)} \in \mathbb{R}^{|V_t^{(k)}|\times d}$ the node features at time $t$. The clustering task is to compute, for each $t$, a soft assignment $F_t^{(k)} \in \mathbb{R}^{|V_t^{(k)}|\times C}$ into $C$ clusters, co-clustering nodes with strong temporal and spatial connectivity while keeping cluster assignments smooth over time.

The global objective aggregates local clustering losses, subject to model consistency enforced via federated aggregation:

$\min_{\{\theta_k\}_{k=1}^K} \frac{1}{K}\sum_{k=1}^K \mathcal{L}_k(\theta_k; \Gamma_k),$

where $\theta_k$ are client-local parameters. The core clustering subproblem per client $k$ is

$\min_{\{F_t^{(k)}\}_{t=1}^T} \sum_{t=1}^T \operatorname{Tr}\bigl(F_t^{(k)\top} L_t^{(k)} F_t^{(k)}\bigr) + \lambda \sum_{t=2}^T \bigl\|F_t^{(k)} - F_{t-1}^{(k)}\bigr\|_F^2,$

where $L_t^{(k)}$ is the graph Laplacian of $G_t^{(k)}$ and $\lambda$ regulates temporal smoothness.
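To make the objective in this section concrete, the following minimal sketch evaluates a Laplacian trace term plus a temporal smoothness penalty on a toy example. It assumes the unnormalized Laplacian $L = D - A$, a node set that stays fixed across snapshots, and hard one-hot assignments; the helper names (`laplacian`, `trace_cut`, `temporal_objective`) are illustrative and are not taken from the FTGC paper.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A for a symmetric adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def trace_cut(f, lap):
    """Spectral term Tr(F^T L F); small when connected nodes share a cluster."""
    lf = matmul(lap, f)
    return sum(f[i][c] * lf[i][c] for i in range(len(f)) for c in range(len(f[0])))


def smoothness(f_now, f_prev):
    """Squared Frobenius distance between consecutive assignment matrices."""
    return sum((a - b) ** 2 for ra, rb in zip(f_now, f_prev) for a, b in zip(ra, rb))


def temporal_objective(adjs, assignments, lam):
    """Per-snapshot spectral terms plus lam times the temporal smoothness penalty."""
    cut = sum(trace_cut(f, laplacian(a)) for a, f in zip(adjs, assignments))
    drift = sum(smoothness(assignments[t], assignments[t - 1])
                for t in range(1, len(assignments)))
    return cut + lam * drift


# Toy data: two snapshots of the same 4-node graph with edges 0-1 and 2-3.
adj = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
f_stable = [[1, 0], [1, 0], [0, 1], [0, 1]]   # clusters follow the two components
f_flipped = [[1, 0], [0, 1], [0, 1], [0, 1]]  # node 1 moved to the other cluster

stable_cost = temporal_objective([adj, adj], [f_stable, f_stable], lam=0.5)
flipped_cost = temporal_objective([adj, adj], [f_stable, f_flipped], lam=0.5)
```

In this toy case the stable assignment costs nothing, while moving node 1 in the second snapshot is charged twice: once for cutting the edge 0-1 and once, scaled by `lam`, for changing its assignment between snapshots.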

2. Temporal Aggregation and Embedding Construction

Each client computes node embeddings $Z_t^{(k)}$ by integrating spatial and temporal information. Spatial aggregation uses a graph convolutional approach:

$H_t^{(k)} = \sigma\bigl(\hat{A}_t^{(k)} X_t^{(k)} W_s\bigr),$

where $\hat{A}_t^{(k)} = \tilde{D}^{-1/2}\bigl(A_t^{(k)} + I\bigr)\tilde{D}^{-1/2}$ is the normalized adjacency matrix with self-loops, $W_s$ is a learnable weight matrix, and $\sigma$ is a nonlinear activation. Temporal aggregation leverages a temporal window of size $w$:

$\tilde{H}_t^{(k)} = \sum_{\tau=0}^{w-1} \alpha_\tau\, H_{t-\tau}^{(k)} W_\tau,$

with learnable attention weights $\alpha_\tau$ (softmax-normalized) and per-offset matrices $W_\tau$. The final temporal-spatial node embedding $Z_t^{(k)}$ combines the spatial embedding $H_t^{(k)}$ with the windowed aggregate $\tilde{H}_t^{(k)}$.

This mechanism captures both local graph structure and its temporal evolution, enabling the model to learn temporally coherent cluster representations.
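The two aggregation stages described in this section can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: it uses the common symmetric normalization with self-loops for the graph convolution, ReLU as the activation, a node set that is fixed across the window, and hypothetical names (`gcn_layer`, `temporal_aggregate`, `offset_weights`).

```python
import math


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj, x, w):
    """One graph-convolution layer with ReLU: relu(A_norm X W)."""
    h = matmul(matmul(normalized_adjacency(adj), x), w)
    return [[max(0.0, v) for v in row] for row in h]


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_aggregate(history, scores, offset_weights):
    """Attention-weighted sum over a window of spatial embeddings.

    history[tau] is the spatial embedding at time t - tau, scores[tau] its raw
    attention score, and offset_weights[tau] the matrix applied at that offset.
    """
    alphas = softmax(scores)
    n, d = len(history[0]), len(offset_weights[0][0])
    out = [[0.0] * d for _ in range(n)]
    for alpha, h, w in zip(alphas, history, offset_weights):
        hw = matmul(h, w)
        for i in range(n):
            for j in range(d):
                out[i][j] += alpha * hw[i][j]
    return out


# Toy usage: a 2-node graph with one edge and 1-dimensional features.
h_now = gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]])
h_prev = [[4.0], [0.0]]
window = temporal_aggregate([h_now, h_prev], [0.0, 0.0], [[[1.0]], [[1.0]]])
```

With equal attention scores and identity offset matrices, the aggregate reduces to the plain average of the embeddings in the window, which is a convenient sanity check when wiring up a real model.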

3. Federated Optimization and Training Process

FTGC employs a federated averaging (FedAvg) protocol augmented with model update compression for scalable and communication-efficient training. Training proceeds over $R$ rounds; in each round $r$:

Server broadcasts global model $\theta^{(r)}$.
Each client (in parallel):
- Receives $\theta^{(r)}$, initializes $\theta_k \leftarrow \theta^{(r)}$.
- Performs $E_{\mathrm{loc}}$ local epochs: computes temporal embeddings $Z_t^{(k)}$ for all $t$, evaluates and optimizes local loss $\mathcal{L}_k$.
- Computes update $\Delta_k = \theta_k - \theta^{(r)}$, sparsifies to top $p\%$ entries ( $\mathrm{Top}_p(\Delta_k)$ ), quantizes ( $Q(\cdot)$ ), and uploads $\tilde{\Delta}_k$.
Server aggregates updates:

$\theta^{(r+1)} = \theta^{(r)} + \frac{1}{K}\sum_{k=1}^K \tilde{\Delta}_k.$

Raw graph data $G_t^{(k)}$ and features $X_t^{(k)}$ remain on client devices throughout training; only compressed model updates are transmitted.
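The compression and aggregation steps of one round can be sketched as below. The source does not specify the exact sparsifier or quantizer, so this sketch assumes magnitude-based top-fraction selection and uniform symmetric quantization on a flat parameter vector; the function names (`top_fraction`, `quantize`, `compress`, `aggregate`) are hypothetical, and local training is replaced by hand-picked client parameters.

```python
def top_fraction(delta, fraction):
    """Keep the largest-magnitude entries of a flat update and zero the rest."""
    keep = max(1, int(round(fraction * len(delta))))
    ranked = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    kept = set(ranked[:keep])
    return [v if i in kept else 0.0 for i, v in enumerate(delta)]


def quantize(delta, bits=8):
    """Uniform symmetric quantization: snap to integer levels, then rescale."""
    scale = max(abs(v) for v in delta)
    if scale == 0.0:
        return list(delta)
    levels = 2 ** (bits - 1) - 1
    return [round(v / scale * levels) / levels * scale for v in delta]


def compress(theta_local, theta_global, fraction, bits=8):
    """Client side: form the delta, sparsify it, then quantize it."""
    delta = [a - b for a, b in zip(theta_local, theta_global)]
    return quantize(top_fraction(delta, fraction), bits)


def aggregate(theta_global, compressed_updates):
    """Server side: add the average of the compressed client updates."""
    k = len(compressed_updates)
    return [g + sum(u[i] for u in compressed_updates) / k
            for i, g in enumerate(theta_global)]


# Toy round with two clients and a 4-parameter model.
theta = [0.0, 0.0, 0.0, 0.0]
client_params = [[1.0, 0.1, 0.0, -0.2], [-1.0, 0.0, 0.0, 0.0]]
updates = [compress(p, theta, fraction=0.5, bits=8) for p in client_params]
new_theta = aggregate(theta, updates)
```

Only the compressed updates in `updates` would cross the network in this sketch; the client parameters and any data used to produce them stay local.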

4. Loss Function and Regularization

The per-client loss optimized during local training is composed of a clustering term and a temporal smoothness regularizer:

$\mathcal{L}_k(\theta_k) = \sum_{t=1}^T \operatorname{Tr}\bigl(F_t^{(k)\top} L_t^{(k)} F_t^{(k)}\bigr) + \lambda \sum_{t=2}^T \bigl\|F_t^{(k)} - F_{t-1}^{(k)}\bigr\|_F^2.$

Optionally, an $\ell_2$ penalty with weight $\mu$ may be applied to $\theta_k$:

$\mathcal{L}_k^{\mathrm{total}} = \mathcal{L}_k + \mu\,\|\theta_k\|_2^2.$

Global optimization minimizes the average total loss $\frac{1}{K}\sum_{k=1}^K \mathcal{L}_k^{\mathrm{total}}$ via the federated loop.

5. Experimental Protocol and Performance

Experiments are conducted on a range of real-world temporal graph datasets partitioned across $K$ clients:

DBLP (co-author network)
Brain (functional connectivity)
Patent (citation network)
School (contact network)

Key experimental hyperparameters include the temporal window size $w$, cluster count $C$ (dataset-dependent), local epochs $E_{\mathrm{loc}}$, communication rounds $R$, learning rate $\eta$, and compression sparsity $p$. Evaluation metrics encompass Clustering Accuracy (ACC), Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), and F1-score (F1).

Dataset	ACC [%]	NMI [%]	ARI [%]	F1 [%]
DBLP	49.50	38.00	23.50	46.00
Brain	45.00	51.00	31.00	45.00
Patent	51.00	26.00	19.50	39.50
School	99.80	99.50	99.40	99.80

In the federated setting with $K$ clients, FTGC is reported to match or outperform centralized methods such as TGC and TREND, without centralizing raw data.
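Because cluster indices are arbitrary, the ACC metric listed above is computed after finding the best one-to-one mapping between predicted and ground-truth labels. The sketch below illustrates that idea with a brute-force search over mappings, which is only practical for a handful of clusters; evaluations at scale typically solve the same matching with the Hungarian algorithm. The function name `clustering_accuracy` and the toy labels are illustrative.

```python
from itertools import permutations


def clustering_accuracy(true_labels, pred_labels):
    """Best accuracy over all one-to-one relabelings of the predicted clusters."""
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(pred_labels))
    # Pad with None so surplus predicted clusters can be left unmatched.
    targets = true_ids + [None] * max(0, len(pred_ids) - len(true_ids))
    best = 0
    for perm in permutations(targets, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        hits = sum(1 for t, p in zip(true_labels, pred_labels) if mapping[p] == t)
        best = max(best, hits)
    return best / len(true_labels)


# Toy labels: predictions use swapped ids for clusters 0 and 1, plus one error.
truth = [0, 0, 1, 1, 2, 2]
predicted = [1, 1, 0, 0, 2, 1]
acc = clustering_accuracy(truth, predicted)
```

Here five of the six nodes agree with the ground truth once predicted clusters 0 and 1 are swapped, so the label permutation alone does not count against the score.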

6. Communication Efficiency and Privacy Protection

Communication overhead is minimized via:

Transmission of only parameter deltas ( $\Delta_k$ ) instead of full model weights
Sparsification to transmit only the top $p\%$ of update entries
Quantization $Q(\cdot)$ to 8/16-bit precision

Clients perform multiple local updates before transmitting, reducing synchronization frequency. Data privacy is maintained since neither graph structures $G_t^{(k)}$ nor node features $X_t^{(k)}$ are uploaded. Additional protections, such as secure aggregation or differential privacy noise added to $\tilde{\Delta}_k$, can further strengthen privacy as needed.
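A rough estimate of the savings from combining sparsification and quantization is sketched below. It assumes each kept entry is sent as an (index, value) pair with a 32-bit index, and that the uncompressed baseline is a dense 32-bit float vector; the encoding, the function names, and the example numbers (one million parameters, 10% kept, 8-bit values) are assumptions for illustration, not settings reported for FTGC.

```python
def payload_bits(num_params, fraction, value_bits, index_bits=32):
    """Bits needed to send a sparsified, quantized update as (index, value) pairs."""
    kept = max(1, int(round(fraction * num_params)))
    return kept * (value_bits + index_bits)


def compression_ratio(num_params, fraction, value_bits, dense_bits=32, index_bits=32):
    """Size of a dense update divided by the size of the compressed one."""
    dense = num_params * dense_bits
    return dense / payload_bits(num_params, fraction, value_bits, index_bits)


# Hypothetical setting: 1,000,000 parameters, 10% of entries kept, 8-bit values.
bits_sent = payload_bits(1_000_000, 0.1, 8)
ratio = compression_ratio(1_000_000, 0.1, 8)
```

Under these assumptions the index overhead dominates the per-entry cost, which is why practical systems often encode indices more compactly (for example with delta or bitmap encodings) than this simple estimate does.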

FTGC establishes a robust framework for federated clustering of dynamic graphs, balancing synchronization efficiency, privacy, and clustering quality in a decentralized setting (Zhou et al., 2024).

References

Zhou et al. (2024). Federated Temporal Graph Clustering.
