Condensation-Concatenation Continual Learning (CCC)
- The paper introduces CCC, which condenses dynamic graph snapshots and selectively concatenates node embeddings to efficiently mitigate catastrophic forgetting.
- It employs a clustering-based condensation method that preserves semantic labels and topological patterns using KL divergence and adjacency reconstruction error minimization.
- Selective concatenation fuses historical context only in regions affected by recent topological changes, optimizing memory use and maintaining high classification accuracy.
Condensation-Concatenation-based Continual Learning (CCC) is a framework for continual learning in dynamic graph streams, targeting the challenges of catastrophic forgetting in Graph Neural Networks (GNNs) applied to temporally-evolving relational data. CCC integrates two core principles: condensation of each historical graph snapshot into highly compact summaries that retain semantic and topological properties, and selective concatenation of historical and current node representations restricted to regions of the graph influenced by recent topological changes. This approach achieves efficient memory use and mitigates the accuracy degradation arising from continual updates in non-stationary graph environments (Yan et al., 12 Dec 2025).
1. Motivation and Problem Setting
Dynamic graphs, as encountered in citation networks, social media, or transaction systems, exhibit continuous structural drift through node/edge additions, removals, and changing labels or features. Standard GNNs, when trained or fine-tuned sequentially on such evolving data, are vulnerable to catastrophic forgetting: performance on previously observed graph structures rapidly degrades after adaptation to recent changes.
Let $\{G_1, G_2, \dots, G_T\}$ denote a sequence of graph snapshots, where each $G_t = (V_t, E_t, X_t, Y_t)$ comprises the node set, edge set, node features, and labels at time $t$. The core challenge addressed by CCC is the preservation of knowledge accrued from historical snapshots without resorting to storage of full previous graphs, while simultaneously enabling quick adaptation to current structural shifts in $G_t$. The framework pursues two primary goals: (a) semantic and topological preservation of historical summaries for efficient replay, and (b) targeted fusion of historical context with current node representations, localized to the subgraphs most affected by new perturbations.
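A minimal sketch of the streaming setting just described, assuming dense numpy arrays; the container and field names are illustrative, not part of the paper's API:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GraphSnapshot:
    """One snapshot G_t = (V_t, E_t, X_t, Y_t) in the dynamic graph stream."""
    node_ids: np.ndarray   # shape (n_t,), global identifiers of the nodes in V_t
    edges: np.ndarray      # shape (m_t, 2), edge list E_t
    features: np.ndarray   # shape (n_t, d), node feature matrix X_t
    labels: np.ndarray     # shape (n_t,), class labels Y_t

def stream(snapshots):
    """Yield snapshots one at a time, as a continual learner observes them."""
    for t, g in enumerate(snapshots):
        yield t, g
```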
2. Condensation: Compacting Graph Memory
The condensation module is responsible for transforming each historical snapshot $G_{t-1}$ into a significantly smaller condensed graph $\tilde{G}_{t-1} = (\tilde{V}_{t-1}, \tilde{E}_{t-1}, \tilde{X}_{t-1}, \tilde{Y}_{t-1})$, such that $|\tilde{V}_{t-1}| \ll |V_{t-1}|$. The condensed graph is required to preserve (i) the label distribution and (ii) key topological patterns (e.g., community structure) of the original graph.
2.1 Objective and Algorithm
For the class label distribution $P$ in $G_{t-1}$ and the condensed distribution $\tilde{P}$, label preservation is enforced via KL divergence. Topological fidelity is enforced by minimizing the adjacency reconstruction error in Frobenius norm. The condensation objective is

$$\mathcal{L}_{\text{cond}} = \alpha\, \mathrm{KL}\!\left(P \,\|\, \tilde{P}\right) + \beta\, \bigl\| A_{t-1} - \hat{A}_{t-1} \bigr\|_F^2,$$

where $\hat{A}_{t-1}$ denotes the adjacency reconstructed from the condensed graph and $\alpha, \beta$ are trade-off hyperparameters.
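A minimal sketch of this objective, assuming integer class labels and an adjacency `A_recon` that has already been reconstructed from the condensed graph to the original's shape (how the two adjacencies are aligned is not specified in the source and is an assumption here):

```python
import numpy as np

def condensation_loss(y_orig, y_cond, A_orig, A_recon, alpha=1.0, beta=1.0, eps=1e-12):
    """Condensation objective sketch: KL divergence between class-label
    distributions plus squared Frobenius adjacency reconstruction error."""
    n_classes = int(max(y_orig.max(), y_cond.max())) + 1
    p = np.bincount(y_orig, minlength=n_classes) / len(y_orig)   # original label distribution
    q = np.bincount(y_cond, minlength=n_classes) / len(y_cond)   # condensed label distribution
    kl = float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))
    recon = float(np.linalg.norm(A_orig - A_recon, ord="fro") ** 2)
    return alpha * kl + beta * recon
```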
The condensation process is clustering-based: nodes are partitioned into class-balanced clusters, a small representative subset is extracted from each via k-means, and a new adjacency is constructed by thresholding pairwise cosine similarities at a threshold $\tau$ chosen to mirror the original average degree.
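A minimal sketch of this procedure, assuming dense feature matrices and using scikit-learn's KMeans; `nodes_per_class`, `tau`, and the per-cluster selection rule are illustrative simplifications of the class-balancing and degree-matching details:

```python
import numpy as np
from sklearn.cluster import KMeans

def condense_snapshot(X, y, nodes_per_class=10, tau=0.5, seed=0):
    """Clustering-based condensation sketch: per class, run k-means and keep the
    real node closest to each centroid, then rebuild an adjacency by thresholding
    pairwise cosine similarity at tau (in practice tau would be tuned so the
    condensed graph mirrors the original average degree)."""
    keep = []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        k = min(nodes_per_class, len(idx))
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X[idx])
        for j in range(k):
            members = idx[km.labels_ == j]
            if len(members) == 0:
                continue
            dists = np.linalg.norm(X[members] - km.cluster_centers_[j], axis=1)
            keep.append(int(members[np.argmin(dists)]))
    keep = np.array(sorted(set(keep)))
    X_c, y_c = X[keep], y[keep]
    # cosine-similarity adjacency of the condensed node set, thresholded at tau
    normed = X_c / (np.linalg.norm(X_c, axis=1, keepdims=True) + 1e-12)
    S = normed @ normed.T
    A_c = (S >= tau).astype(float)
    np.fill_diagonal(A_c, 0.0)
    return keep, X_c, y_c, A_c
```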
3. Concatenation: Selective Fusion of Representations
After condensation, previous graphs are processed by a dynamic GNN (e.g., EvolveGCN) to yield historical node embeddings $h_v^{\text{hist}}$ for the current node set. When the current snapshot $G_t$ arrives, current node representations $h_v^{\text{cur}}$ are computed on it.
However, structural changes are rarely global. CCC defines an "influence region" as the $k$-hop locality around the endpoints of changed edges:

$$\mathcal{R}_t = \bigl\{\, v \in V_t \;:\; \exists\, (u, w) \in \Delta E_t,\ \min\bigl(d(v, u), d(v, w)\bigr) \le k \,\bigr\},$$

where $(u, w)$ are the endpoints of edge modifications in $\Delta E_t$ and $d(\cdot,\cdot)$ is hop distance.
Node embeddings are fused by concatenation: $z_v = \bigl[\, h_v^{\text{cur}} \,\|\, m_v \cdot h_v^{\text{hist}} \,\bigr]$, with $m_v = 1$ if $v \in \mathcal{R}_t$ and $m_v = 0$ otherwise. Optionally, a learnable gate may control the influence of the historical embedding. The selection ensures that only regions impacted by changes access historical context, reducing noise in the learning signal.
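A minimal sketch of the influence region and the masked concatenation, assuming an adjacency-list dict and precomputed embedding matrices `h_cur` and `h_hist` whose rows align with `node_ids` (all names illustrative; the optional learnable gate is omitted):

```python
import numpy as np
from collections import deque

def influence_region(adj, changed_edges, k):
    """Return all nodes within k hops of any endpoint of a changed edge (BFS).
    `adj` maps a node id to an iterable of its neighbours in the current snapshot."""
    frontier = {u for edge in changed_edges for u in edge}
    depth = {u: 0 for u in frontier}
    queue = deque(frontier)
    while queue:
        u = queue.popleft()
        if depth[u] == k:
            continue
        for v in adj.get(u, ()):
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return set(depth)

def selective_concat(h_cur, h_hist, node_ids, region):
    """z_v = [h_cur_v || m_v * h_hist_v], with m_v = 1 iff v lies in the region."""
    mask = np.array([1.0 if v in region else 0.0 for v in node_ids])[:, None]
    return np.concatenate([h_cur, mask * h_hist], axis=1)
```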
4. Forgetting Measure and Learning Objective
Traditional forgetting metrics inadequately reflect knowledge loss caused by structural drift in dynamic graphs. CCC introduces a refined Forgetting Measure (FM) that quantifies the drop in accuracy, upon arrival of new graph data, on nodes that were previously classified correctly:

$$\mathrm{FM}_t = \frac{1}{|\mathcal{C}_{t-1}|} \sum_{v \in \mathcal{C}_{t-1}} \mathbb{1}\bigl[\hat{y}_v^{(t)} \neq y_v\bigr],$$

where $\mathcal{C}_{t-1}$ denotes the nodes classified correctly before the update and $\hat{y}_v^{(t)}$ the prediction after it.
The node classification objective at each time $t$ is the cross-entropy loss over the labeled nodes of $G_t$,

$$\mathcal{L}_{\text{cls}}^{(t)} = -\sum_{v \in V_t^{\text{lab}}} \sum_{c} y_{v,c} \log \hat{y}_{v,c},$$

and the total CCC objective augments it with the same classification loss replayed on the condensed historical summaries:

$$\mathcal{L}_{\text{CCC}}^{(t)} = \mathcal{L}_{\text{cls}}\bigl(G_t\bigr) + \lambda \sum_{\tau < t} \mathcal{L}_{\text{cls}}\bigl(\tilde{G}_{\tau}\bigr),$$

with $\lambda$ a replay-weighting hyperparameter.
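A minimal sketch of the forgetting measure as reconstructed above, assuming `prev_correct_ids` records the nodes classified correctly at the previous step and `id_to_row` maps node ids to rows of the current prediction arrays (all names illustrative):

```python
import numpy as np

def forgetting_measure(prev_correct_ids, preds_now, labels_now, id_to_row):
    """FM sketch: among nodes classified correctly before the update
    (and still present), the fraction now misclassified."""
    rows = [id_to_row[v] for v in prev_correct_ids if v in id_to_row]
    if not rows:
        return 0.0
    rows = np.asarray(rows)
    return float(np.mean(preds_now[rows] != labels_now[rows]))
```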
5. Empirical Evaluation
CCC is evaluated on four large-scale dynamic graph benchmarks:
- Arxiv: citation network (≈169K nodes, 1.2M edges, 40 classes, 128 features)
- Paper100M: citation stream (≈100M nodes, 1B edges, 10 classes, 128 features)
- DBLP: co-authorship over 20 years (≈500K nodes, 2M edges, 5 classes, 64 features)
- Elliptic: Bitcoin transactions (≈200K nodes, 234K edges, 2 classes, 94 features)
Each dataset is split into temporal tasks, with models compared on Performance Mean (PM) and Forgetting Measure (FM).
| Method | Arxiv PM | Arxiv FM | Paper100M PM | Paper100M FM | DBLP PM | DBLP FM | Elliptic PM | Elliptic FM |
|---|---|---|---|---|---|---|---|---|
| TWP | 77.59±0.51% | 4.07±0.10% | 73.91±0.18% | 6.84±0.04% | 49.44±2.81% | 7.06±1.57% | 98.14±0.07% | 0.19±0.01% |
| ContinualGNN | 43.48±0.74% | 31.48±0.10% | 36.50±0.44% | 28.92±2.54% | 60.69±0.13% | 7.90±3.26% | 93.12±0.30% | 1.35±0.34% |
| CCC | 77.67±0.06% | 2.90±0.16% | 74.10±0.50% | 5.65±0.09% | 54.26±0.72% | 4.90±0.27% | 97.74±0.07% | 0.12±0.00% |
CCC matches or surpasses established baselines in PM and consistently yields the lowest FM, empirically validating its design at scale and across heterogeneous graph domains.
6. Ablation Studies
Ablation experiments on Arxiv isolate the contributions of CCC's submodules:
| Variant | PM | FM |
|---|---|---|
| CCC w/o Condensation | 75.12% | 4.55% |
| CCC w/o Concatenation | 76.05% | 3.90% |
| CCC w/o Selective Region (all concat) | 77.20% | 3.50% |
| Full CCC | 77.67% | 2.90% |
Ablation results underscore that condensation is critical to limiting forgetting, while concatenation of historical embeddings further reduces accuracy degradation. Applying historical concatenation indiscriminately (all nodes) introduces noise, slightly degrading both PM and FM. This suggests the selective replay mechanism is essential for maximizing benefit from historical summaries.
7. Strengths, Limitations, and Future Extensions
CCC's primary strengths lie in its compact memory footprint, avoidance of full historical graph storage, and targeted replay which limits unnecessary noise. Its operation is agnostic to task boundaries, adapting naturally to streaming updates.
Limitations include sensitivity to the condensation hyperparameters ($\alpha$, $\beta$), the similarity threshold $\tau$, and the influence-region hop size $k$, all of which require careful tuning. Additional overhead is incurred in precomputing dynamic GNN embeddings over the condensed graphs.
Potential future directions include adaptive gradient-based condensation, attention-based instead of binary fusion gates, and extension to heterogeneous graphs with type-specific condensation processes.
In summary, Condensation-Concatenation-based Continual Learning (CCC) presents an efficient approach for mitigating catastrophic forgetting in dynamic-graph GNNs by marrying effective condensation of temporal data with selective, region-aware replay, achieving state-of-the-art results in both sustained accuracy and minimization of forgetting across diverse streaming graph benchmarks (Yan et al., 12 Dec 2025).