Condensation-Concatenation Continual Learning (CCC)
- The paper introduces CCC, which condenses dynamic graph snapshots and selectively concatenates node embeddings to efficiently mitigate catastrophic forgetting.
- It employs a clustering-based condensation method that preserves semantic labels and topological patterns using KL divergence and adjacency reconstruction error minimization.
- Selective concatenation fuses historical context only in regions affected by recent topological changes, optimizing memory use and maintaining high classification accuracy.
Condensation-Concatenation-based Continual Learning (CCC) is a framework for continual learning in dynamic graph streams, targeting the challenges of catastrophic forgetting in Graph Neural Networks (GNNs) applied to temporally-evolving relational data. CCC integrates two core principles: condensation of each historical graph snapshot into highly compact summaries that retain semantic and topological properties, and selective concatenation of historical and current node representations restricted to regions of the graph influenced by recent topological changes. This approach achieves efficient memory use and mitigates the accuracy degradation arising from continual updates in non-stationary graph environments (Yan et al., 12 Dec 2025).
1. Motivation and Problem Setting
Dynamic graphs, as encountered in citation networks, social media, or transaction systems, exhibit continuous structural drift through node/edge additions, removals, and changing labels or features. Standard GNNs, when trained or fine-tuned sequentially on such evolving data, are vulnerable to catastrophic forgetting: performance on previously observed graph structures rapidly degrades after adaptation to recent changes.
Let $\{G_1, G_2, \dots, G_T\}$ denote a sequence of graph snapshots, where each $G_t = (V_t, E_t, X_t, Y_t)$ comprises the node set, edge set, node features, and labels at time $t$. The core challenge addressed by CCC is the preservation of knowledge accrued from historical snapshots without resorting to storage of full previous graphs, while simultaneously enabling quick adaptation to current structural shifts in $G_t$. The framework pursues two primary goals: (a) semantic and topological preservation of historical summaries for efficient replay, and (b) targeted fusion of historical context with current node representations, localized to the subgraphs most affected by new perturbations.
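A minimal sketch of the streaming setting just described, assuming dense numpy arrays; the container and field names are illustrative, not part of the paper's API:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GraphSnapshot:
    """One snapshot G_t = (V_t, E_t, X_t, Y_t) in the dynamic graph stream."""
    node_ids: np.ndarray   # shape (n_t,), global identifiers of the nodes in V_t
    edges: np.ndarray      # shape (m_t, 2), edge list E_t
    features: np.ndarray   # shape (n_t, d), node feature matrix X_t
    labels: np.ndarray     # shape (n_t,), class labels Y_t

def stream(snapshots):
    """Yield snapshots one at a time, as a continual learner observes them."""
    for t, g in enumerate(snapshots):
        yield t, g
```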
2. Condensation: Compacting Graph Memory
The condensation module is responsible for transforming each historical snapshot $G_{t-1}$ into a significantly smaller condensed graph $\tilde{G}_{t-1} = (\tilde{V}_{t-1}, \tilde{E}_{t-1}, \tilde{X}_{t-1}, \tilde{Y}_{t-1})$, such that $|\tilde{V}_{t-1}| \ll |V_{t-1}|$. The condensed graph is required to preserve (i) the label distribution and (ii) key topological patterns (e.g., community structure) of the original graph.
2.1 Objective and Algorithm
For the class label distribution $P$ in $G_{t-1}$ and the condensed distribution $\tilde{P}$, label preservation is enforced via KL divergence. Topological fidelity is enforced by minimizing the adjacency reconstruction error in Frobenius norm. The condensation objective is

$$\mathcal{L}_{\text{cond}} = \alpha\, \mathrm{KL}\!\left(P \,\|\, \tilde{P}\right) + \beta\, \bigl\| A_{t-1} - \hat{A}_{t-1} \bigr\|_F^2,$$

where $\hat{A}_{t-1}$ denotes the adjacency reconstructed from the condensed graph and $\alpha, \beta$ are trade-off hyperparameters.
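A minimal sketch of this objective, assuming integer class labels and an adjacency `A_recon` that has already been reconstructed from the condensed graph to the original's shape (how the two adjacencies are aligned is not specified in the source and is an assumption here):

```python
import numpy as np

def condensation_loss(y_orig, y_cond, A_orig, A_recon, alpha=1.0, beta=1.0, eps=1e-12):
    """Condensation objective sketch: KL divergence between class-label
    distributions plus squared Frobenius adjacency reconstruction error."""
    n_classes = int(max(y_orig.max(), y_cond.max())) + 1
    p = np.bincount(y_orig, minlength=n_classes) / len(y_orig)   # original label distribution
    q = np.bincount(y_cond, minlength=n_classes) / len(y_cond)   # condensed label distribution
    kl = float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))
    recon = float(np.linalg.norm(A_orig - A_recon, ord="fro") ** 2)
    return alpha * kl + beta * recon
```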
The condensation process is clustering-based: nodes are partitioned into class-balanced clusters, a small representative subset is extracted from each via k-means, and a new adjacency is constructed by thresholding pairwise cosine similarities at a threshold $\tau$ chosen to mirror the original average degree.
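A minimal sketch of this procedure, assuming dense feature matrices and using scikit-learn's KMeans; `nodes_per_class`, `tau`, and the per-cluster selection rule are illustrative simplifications of the class-balancing and degree-matching details:

```python
import numpy as np
from sklearn.cluster import KMeans

def condense_snapshot(X, y, nodes_per_class=10, tau=0.5, seed=0):
    """Clustering-based condensation sketch: per class, run k-means and keep the
    real node closest to each centroid, then rebuild an adjacency by thresholding
    pairwise cosine similarity at tau (in practice tau would be tuned so the
    condensed graph mirrors the original average degree)."""
    keep = []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        k = min(nodes_per_class, len(idx))
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X[idx])
        for j in range(k):
            members = idx[km.labels_ == j]
            if len(members) == 0:
                continue
            dists = np.linalg.norm(X[members] - km.cluster_centers_[j], axis=1)
            keep.append(int(members[np.argmin(dists)]))
    keep = np.array(sorted(set(keep)))
    X_c, y_c = X[keep], y[keep]
    # cosine-similarity adjacency of the condensed node set, thresholded at tau
    normed = X_c / (np.linalg.norm(X_c, axis=1, keepdims=True) + 1e-12)
    S = normed @ normed.T
    A_c = (S >= tau).astype(float)
    np.fill_diagonal(A_c, 0.0)
    return keep, X_c, y_c, A_c
```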
3. Concatenation: Selective Fusion of Representations
After condensation, previous graphs are processed by a dynamic GNN (e.g., EvolveGCN) to yield historical node embeddings $h_v^{\text{hist}}$ for the current node set. When the current snapshot $G_t$ arrives, current node representations $h_v^{\text{cur}}$ are computed on it.
However, structural changes are rarely global. CCC defines an "influence region" as the $k$-hop locality around the endpoints of changed edges:

$$\mathcal{R}_t = \bigl\{\, v \in V_t \;:\; \exists\, (u, w) \in \Delta E_t,\ \min\bigl(d(v, u), d(v, w)\bigr) \le k \,\bigr\},$$

where $(u, w)$ are the endpoints of edge modifications in $\Delta E_t$ and $d(\cdot,\cdot)$ is hop distance.
Node embeddings are fused by concatenation: $z_v = \bigl[\, h_v^{\text{cur}} \,\|\, m_v \cdot h_v^{\text{hist}} \,\bigr]$, with $m_v = 1$ if $v \in \mathcal{R}_t$ and $m_v = 0$ otherwise. Optionally, a learnable gate may control the influence of the historical embedding. The selection ensures that only regions impacted by changes access historical context, reducing noise in the learning signal.
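A minimal sketch of the influence region and the masked concatenation, assuming an adjacency-list dict and precomputed embedding matrices `h_cur` and `h_hist` whose rows align with `node_ids` (all names illustrative; the optional learnable gate is omitted):

```python
import numpy as np
from collections import deque

def influence_region(adj, changed_edges, k):
    """Return all nodes within k hops of any endpoint of a changed edge (BFS).
    `adj` maps a node id to an iterable of its neighbours in the current snapshot."""
    frontier = {u for edge in changed_edges for u in edge}
    depth = {u: 0 for u in frontier}
    queue = deque(frontier)
    while queue:
        u = queue.popleft()
        if depth[u] == k:
            continue
        for v in adj.get(u, ()):
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return set(depth)

def selective_concat(h_cur, h_hist, node_ids, region):
    """z_v = [h_cur_v || m_v * h_hist_v], with m_v = 1 iff v lies in the region."""
    mask = np.array([1.0 if v in region else 0.0 for v in node_ids])[:, None]
    return np.concatenate([h_cur, mask * h_hist], axis=1)
```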
4. Forgetting Measure and Learning Objective
Traditional forgetting metrics inadequately reflect knowledge loss caused by structural drift in dynamic graphs. CCC introduces a refined Forgetting Measure (FM) that quantifies the drop in accuracy, upon arrival of new graph data, on nodes that were previously classified correctly:

$$\mathrm{FM}_t = \frac{1}{|\mathcal{C}_{t-1}|} \sum_{v \in \mathcal{C}_{t-1}} \mathbb{1}\bigl[\hat{y}_v^{(t)} \neq y_v\bigr],$$

where $\mathcal{C}_{t-1}$ denotes the nodes classified correctly before the update and $\hat{y}_v^{(t)}$ the prediction after it.
The node classification objective at each time $t$ is the cross-entropy loss over the labeled nodes of $G_t$,

$$\mathcal{L}_{\text{cls}}^{(t)} = -\sum_{v \in V_t^{\text{lab}}} \sum_{c} y_{v,c} \log \hat{y}_{v,c},$$

and the total CCC objective augments it with the same classification loss replayed on the condensed historical summaries:

$$\mathcal{L}_{\text{CCC}}^{(t)} = \mathcal{L}_{\text{cls}}\bigl(G_t\bigr) + \lambda \sum_{\tau < t} \mathcal{L}_{\text{cls}}\bigl(\tilde{G}_{\tau}\bigr),$$

with $\lambda$ a replay-weighting hyperparameter.
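A minimal sketch of the forgetting measure as reconstructed above, assuming `prev_correct_ids` records the nodes classified correctly at the previous step and `id_to_row` maps node ids to rows of the current prediction arrays (all names illustrative):

```python
import numpy as np

def forgetting_measure(prev_correct_ids, preds_now, labels_now, id_to_row):
    """FM sketch: among nodes classified correctly before the update
    (and still present), the fraction now misclassified."""
    rows = [id_to_row[v] for v in prev_correct_ids if v in id_to_row]
    if not rows:
        return 0.0
    rows = np.asarray(rows)
    return float(np.mean(preds_now[rows] != labels_now[rows]))
```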
5. Empirical Evaluation
CCC is evaluated on four large-scale dynamic graph benchmarks:
- Arxiv: citation network (≈169K nodes, 1.2M edges, 40 classes, 128 features)
- Paper100M: citation stream (≈100M nodes, 1B edges, 10 classes, 128 features)
- DBLP: co-authorship over 20 years (≈500K nodes, 2M edges, 5 classes, 64 features)
- Elliptic: Bitcoin transactions (≈200K nodes, 234K edges, 2 classes, 94 features)
Each dataset is split into temporal tasks, with models compared on Performance Mean (PM) and Forgetting Measure (FM).
| Method | Arxiv PM | Arxiv FM | Paper100M PM | Paper100M FM | DBLP PM | DBLP FM | Elliptic PM | Elliptic FM |
|---|---|---|---|---|---|---|---|---|
| TWP | 77.59±0.51% | 4.07±0.10% | 73.91±0.18% | 6.84±0.04% | 49.44±2.81% | 7.06±1.57% | 98.14±0.07% | 0.19±0.01% |
| ContinualGNN | 43.48±0.74% | 31.48±0.10% | 36.50±0.44% | 28.92±2.54% | 60.69±0.13% | 7.90±3.26% | 93.12±0.30% | 1.35±0.34% |
| CCC | 77.67±0.06% | 2.90±0.16% | 74.10±0.50% | 5.65±0.09% | 54.26±0.72% | 4.90±0.27% | 97.74±0.07% | 0.12±0.00% |
CCC matches or surpasses established baselines in PM and consistently yields the lowest FM, empirically validating its design at scale and across heterogeneous graph domains.
6. Ablation Studies
Ablation experiments on Arxiv isolate the contributions of CCC's submodules:
| Variant | PM | FM |
|---|---|---|
| CCC w/o Condensation | 75.12% | 4.55% |
| CCC w/o Concatenation | 76.05% | 3.90% |
| CCC w/o Selective Region (all concat) | 77.20% | 3.50% |
| Full CCC | 77.67% | 2.90% |
Ablation results underscore that condensation is critical to limiting forgetting, while concatenation of historical embeddings further reduces accuracy degradation. Applying historical concatenation indiscriminately (all nodes) introduces noise, slightly degrading both PM and FM. This suggests the selective replay mechanism is essential for maximizing benefit from historical summaries.
7. Strengths, Limitations, and Future Extensions
CCC's primary strengths lie in its compact memory footprint, avoidance of full historical graph storage, and targeted replay which limits unnecessary noise. Its operation is agnostic to task boundaries, adapting naturally to streaming updates.
Limitations include sensitivity to the condensation hyperparameters ($\alpha$, $\beta$), the similarity threshold $\tau$, and the influence-region hop size $k$, all of which require careful tuning. Additional overhead is incurred in precomputing dynamic GNN embeddings over the condensed graphs.
Potential future directions include adaptive gradient-based condensation, attention-based instead of binary fusion gates, and extension to heterogeneous graphs with type-specific condensation processes.
In summary, Condensation-Concatenation-based Continual Learning (CCC) presents an efficient approach for mitigating catastrophic forgetting in dynamic-graph GNNs by marrying effective condensation of temporal data with selective, region-aware replay, achieving state-of-the-art results in both sustained accuracy and minimization of forgetting across diverse streaming graph benchmarks (Yan et al., 12 Dec 2025).