
Graph Condensation Approaches

Updated 15 December 2025
  • Graph condensation is the process of synthesizing a small, information-rich proxy graph that preserves the key properties of the original data, enabling efficient GNN training.
  • It employs diverse algorithmic paradigms such as gradient, distribution, trajectory, and spectral matching to achieve robust, scalable, and domain-adaptive graph representations.
  • Applications include hyperparameter tuning, neural architecture search, and continual and federated learning, with experiments showing that condensed graphs containing less than 1% of the original nodes can retain over 98% of the original accuracy.

Graph condensation is the process of synthesizing a small, information-rich proxy graph from a large original graph, enabling efficient and repeated training of Graph Neural Networks (GNNs) at a fraction of the computational and memory footprint. The central objective is to produce a synthetic graph on which a GNN achieves nearly the same predictive performance as on the original, with applications ranging from hyperparameter tuning and neural architecture search to continual and federated learning. This article presents a rigorous survey of the theoretical foundations, algorithmic paradigms, methodological innovations, and recent extensions of graph condensation, synthesizing the state of the art from single-label, multi-label, homogeneous, and heterogeneous settings.

1. Formal Definitions and Theoretical Frameworks

Graph condensation is formally defined as the following optimization problem: for an original graph $\mathcal{G}=(A,X,Y)$ with adjacency matrix $A$, features $X\in\mathbb{R}^{N\times d}$, and labels $Y$, synthesize a compact surrogate $\mathcal{S}=(A',X',Y')$ with $N'\ll N$ nodes, such that for any GNN architecture $\Psi_\theta$, the test performance on $\mathcal{G}$ of a model trained on $\mathcal{S}$ is close to that of a model trained on $\mathcal{G}$ itself. Mathematically, this is a bi-level minimization:

$$\min_{A',X',Y'} \; L\big(\Psi^{\mathcal{S}}(A,X),\,Y\big) \quad \text{with} \quad \Psi^{\mathcal{S}} = \arg\min_\theta L\big(\Psi_\theta(A',X'),\,Y'\big),$$

where $L$ is typically a supervised task loss (e.g., cross-entropy for node classification) (Liu et al., 2022).
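
In practice the inner problem is seldom solved to convergence; most algorithms (Section 2) approximate the bi-level objective, most commonly by matching gradients of the task loss. The following is a minimal illustrative sketch of such a loop, assuming a linear (SGC-style) relay model on pre-propagated features; all function names, hyperparameters, and design choices here are simplifying assumptions rather than the implementation of any particular method.

```python
# Minimal sketch of the bi-level condensation loop, approximated by one-step
# gradient matching. Assumes node features have already been propagated
# through the graph (SGC-style), so the relay model reduces to a linear map;
# names such as condense, X_real, X_syn are illustrative, not from any codebase.
import torch
import torch.nn.functional as F

def grad_match_distance(g_real, g_syn):
    """Sum of (1 - cosine similarity) between corresponding gradient tensors."""
    dist = 0.0
    for gr, gs in zip(g_real, g_syn):
        dist = dist + (1 - F.cosine_similarity(gr.flatten(), gs.flatten(), dim=0))
    return dist

def condense(X_real, y_real, n_syn, n_classes, outer_steps=200, lr_syn=0.01):
    d = X_real.shape[1]
    # Synthetic features are free parameters; synthetic labels are fixed
    # and balanced across classes.
    X_syn = torch.randn(n_syn, d, requires_grad=True)
    y_syn = torch.arange(n_syn) % n_classes
    opt_syn = torch.optim.Adam([X_syn], lr=lr_syn)

    for _ in range(outer_steps):
        # Re-sample a relay model each outer step so the condensed data does
        # not overfit a single initialization.
        W = torch.randn(d, n_classes, requires_grad=True)

        loss_real = F.cross_entropy(X_real @ W, y_real)
        g_real = torch.autograd.grad(loss_real, W)          # treated as constants

        loss_syn = F.cross_entropy(X_syn @ W, y_syn)
        g_syn = torch.autograd.grad(loss_syn, W, create_graph=True)

        opt_syn.zero_grad()
        grad_match_distance(g_real, g_syn).backward()        # update synthetic features
        opt_syn.step()

    return X_syn.detach(), y_syn
```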

Achieving faithful condensation is formally challenging due to model dependence and optimization complexity. Recent work (Yan et al., 18 Sep 2025) generalizes the condensation objective to

$$\min_{\tilde{\mathcal{G}}} \; \Delta_Z(Z, \tilde Z) + \xi\, \Delta_Y(\mathcal{G}, \tilde{\mathcal{G}}),$$

where $Z$ and $\tilde Z$ are generic node or graph representations, and $\Delta_Z$ and $\Delta_Y$ denote discrepancies in the representation and semantic (label) spaces, respectively. Virtually all prior algorithms (gradient matching, receptive-field distribution matching, eigenbasis matching, trajectory matching) are specific instances of this general framework.

A key theoretical insight is that matching higher-order statistics (e.g., receptive field distributions, spectral subspaces) or training trajectories ensures fidelity across downstream models and tasks, not merely for the relay GNN used during condensation (Liu et al., 2022, Fu et al., 23 Dec 2024).

2. Algorithmic Paradigms and Metric Alignment Approaches

State-of-the-art graph condensation algorithms can be categorized by the principal alignment metric and optimization process.

  • Gradient Matching: Methods such as GCond (Liu et al., 2022), benchmarked in GCondenser (2405.14246), optimize $A'$ and $X'$ so that the gradient of the GNN loss on $\mathcal{S}$ closely matches that on $\mathcal{G}$ over several training steps:

$$\mathcal{L}_{\rm match} = D\big(\nabla_\theta L(\mathcal{G};\theta),\; \nabla_\theta L(\mathcal{S};\theta)\big)$$

This approach offers high fidelity but is computationally expensive due to repeated GNN retraining (a simplified gradient-matching loop is sketched in Section 1).

  • Distribution Matching: GCDM (Liu et al., 2022) proposes matching receptive-field output distributions by minimizing the maximum mean discrepancy (MMD) between class-conditional embeddings of local neighborhoods (a minimal MMD sketch appears after this list):

$$\operatorname{MMD}^2(P^L, P'^L) = \mathbb{E}_{x,x'\sim P^L}[k(x,x')] + \mathbb{E}_{y,y'\sim P'^L}[k(y,y')] - 2\,\mathbb{E}_{x\sim P^L,\, y\sim P'^L}[k(x,y)],$$

where $P^L$ and $P'^L$ are the empirical distributions of $L$-hop receptive fields.

  • Trajectory Matching: SFGC (Zheng et al., 2023) and GCSR (Liu et al., 12 Mar 2024) match the parameter trajectories of a GNN trained on $\mathcal{G}$ and on $\mathcal{S}$, enforcing that models trained for multiple steps from the same initialization arrive at similar final weights.
  • Spectral/Eigenbasis Matching: Condensation can also be achieved by aligning the leading eigenvectors of the graph Laplacian or feature covariance matrices, ensuring spectral information is preserved across scales (Fu et al., 23 Dec 2024).
  • Self-expressive Structure Learning: GCSR (Liu et al., 12 Mar 2024) reconstructs the adjacency of the synthetic graph via a self-expression property:

$$\min_{C} \;\|X'^{\top} - X'^{\top} C\|_F^2 + \alpha \|C - P\|_F^2 + \beta\|C - C_h\|_F^2,$$

where $C$ is a learnable coefficient matrix and $P$ carries class-wise structural statistics.

  • Clustering and Partition-Based Condensation: CGC (Gao et al., 22 May 2024) and GECC (Gong et al., 24 Feb 2025) reduce condensation to class-wise clustering in feature or propagated embedding space, followed by centroid selection and optional structure generation.
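
As a concrete illustration of the distribution-matching objective, the sketch below computes a class-conditional squared MMD between real and synthetic embeddings using an RBF kernel; the kernel choice, bandwidth, and function names are assumptions made for this example rather than GCDM's exact implementation.

```python
# Illustrative class-conditional MMD objective in the spirit of
# distribution-matching methods such as GCDM. Real implementations match
# L-hop receptive-field embeddings; here we only show the loss computation.
import torch

def rbf_kernel(a, b, sigma=1.0):
    # a: (n, d), b: (m, d) -> (n, m) Gaussian kernel matrix
    dist2 = torch.cdist(a, b) ** 2
    return torch.exp(-dist2 / (2 * sigma ** 2))

def mmd2(p, q, sigma=1.0):
    """Biased estimate of squared MMD between two sample sets."""
    return (rbf_kernel(p, p, sigma).mean()
            + rbf_kernel(q, q, sigma).mean()
            - 2 * rbf_kernel(p, q, sigma).mean())

def class_conditional_mmd(emb_real, y_real, emb_syn, y_syn, n_classes):
    # Sum the MMD^2 terms class by class, skipping empty classes.
    loss = 0.0
    for c in range(n_classes):
        real_c = emb_real[y_real == c]
        syn_c = emb_syn[y_syn == c]
        if len(real_c) and len(syn_c):
            loss = loss + mmd2(real_c, syn_c)
    return loss
```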

3. Efficient, Training-Free, and Scalable Strategies

Addressing the computational bottlenecks of bi-level optimization and large-scale graphs, recent approaches have introduced training-free and modular methodologies.

  • Disentangled Condensation (DisCo): DisCo (Xiao et al., 18 Jan 2024) separates node condensation (feature/label selection with MLPs and anchor constraints) from edge generation (link prediction using learned neighborhood aggregation), replacing bi-level joint optimization with two independent, memory-efficient modules. This enables condensation on graphs with over 100 million nodes.
  • Class Partition and Clustering Approaches: CGC (Gao et al., 22 May 2024) transforms the matching problem into class-to-node clustering, providing a closed-form update for condensed features and eliminating gradient-based iterations (a schematic class-wise clustering step is sketched after this list). GECC (Gong et al., 24 Feb 2025) further extends this idea with evolving clustering and incremental $k$-means++ for dynamic or growing graphs.
  • Simple Graph Condensation (SimGC): SimGC (Xiao et al., 22 Mar 2024) forgoes iterated GNN training altogether, matching the mean and variance of layer-wise SGC features between the original and condensed graphs, so the only optimization variables are the parameters of the condensed graph itself.
  • Gaussian Process Condensation (GCGP): GCGP (Wang et al., 5 Jan 2025) replaces surrogate GNN training with the closed-form posterior of a GP defined on the condensed graph, employing a structure-aware covariance and a Concrete relaxation for end-to-end differentiable edge learning.
  • Tensor Decomposition Approaches: GCTD (Santos et al., 20 Aug 2025) uses a multi-view (randomly perturbed) adjacency tensor and non-negative RESCAL decomposition to discover latent clusters and synthesize synthetic graphs, supporting interpretability and efficient mapping between original and condensed nodes without GNN training in the loop.
  • Self-Expressive and Data-Selection Methods for Heterogeneous Graphs: Methods such as FreeHGC (Liang et al., 20 Dec 2024) and HGC-Herd (Ou et al., 8 Dec 2025) use meta-path-based feature propagation, submodular coverage/diversity maximization, and class-wise herding to provide efficient, training-free condensation for richly typed graphs.
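
To make the training-free, clustering-based recipe concrete, the following schematic (not the exact CGC or GECC procedure) propagates features SGC-style, clusters each class, and keeps the centroids as condensed node features; the use of scikit-learn's KMeans and all names are illustrative assumptions.

```python
# Schematic, training-free condensation step in the spirit of the
# clustering-based methods above: propagate features with a normalized
# adjacency, cluster each class, and keep the centroids as condensed features.
import numpy as np
from sklearn.cluster import KMeans

def propagate(adj_norm, X, hops=2):
    # SGC-style propagation: X <- A_hat^k X
    for _ in range(hops):
        X = adj_norm @ X
    return X

def cluster_condense(adj_norm, X, y, nodes_per_class, n_classes, hops=2):
    H = propagate(adj_norm, X, hops)
    X_syn, y_syn = [], []
    for c in range(n_classes):
        Hc = H[y == c]
        if len(Hc) == 0:
            continue  # skip classes with no labeled nodes
        k = min(nodes_per_class, len(Hc))
        km = KMeans(n_clusters=k, n_init=10).fit(Hc)
        X_syn.append(km.cluster_centers_)        # centroids become synthetic nodes
        y_syn.append(np.full(k, c))
    return np.vstack(X_syn), np.concatenate(y_syn)
```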

4. Domain Extensions: Inductive Inference, Multi-Label and Heterogeneous Graphs

Graph condensation frameworks have been generalized for more demanding scenarios:

  • Inductive Condensation with Node Mapping: MCond (Gao et al., 2023) introduces a learnable, sparse node mapping from original to condensed graphs, enabling fully inductive inference on the condensed graph alone, even in the absence of the original large graph at test time. The mapping is trained via reconstruction and embedding losses that ensure new nodes are faithfully embedded via the synthetic graph, yielding orders-of-magnitude speedups in inference and large reductions in memory requirements.
  • Multi-Label Condensation: Multi-label settings, where a single node carries multiple labels, necessitate improved synthetic dataset initialization and loss functions. The benchmark of (Zhang et al., 23 Dec 2024) shows that K-Center initialization (a greedy K-Center selection sketch follows this list), together with probabilistic label assignment and a binary cross-entropy loss, preserves distributional diversity and matches multi-label statistics. Empirically, GCond + K-Center + BCE achieves state-of-the-art F1-micro scores across diverse large-scale multi-label graph datasets.
  • Heterogeneous Graphs: FreeHGC (Liang et al., 20 Dec 2024) reformulates condensation as data selection via meta-path coverage and diversity maximization, with greedy submodular optimization offering a $(1-1/e)$ approximation guarantee. HGC-Herd (Ou et al., 8 Dec 2025) applies lightweight feature propagation along meta-paths and class-wise herding for prototype selection, ensuring balanced, semantically faithful coresets suitable for training heterogeneous GNNs.
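
For reference, the K-Center initialization used in the multi-label benchmark above is greedy farthest-point sampling; a generic sketch, with hypothetical names and no claim to match the benchmark's code, is given below.

```python
# Greedy K-Center (farthest-point) selection over a feature matrix X.
# Returns the indices of k selected nodes.
import numpy as np

def k_center_greedy(X, k, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    selected = [int(rng.integers(n))]
    # Distance from every point to its nearest selected center
    d = np.linalg.norm(X - X[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(d))          # farthest point from current centers
        selected.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(selected)
```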

5. Robustness, Generalization, and Condensation under Distribution Shift

Robustness to noise and structural corruption, as well as adaptation to dynamic or evolving graphs, are current frontiers in graph condensation:

  • Noise-Robust Condensation: RobGC (Gao et al., 19 Jun 2024) injects a label-propagation-based denoising loop into any existing condensation pipeline, optimizing an edge reliability matrix and refining both the original and condensed graphs (a generic label-propagation sketch follows this list). This joint condensation-denoising strategy ensures that the synthetic graph encodes only the “clean” signal, substantially improving resilience to random or adversarial structural noise.
  • Open-World and Evolving Graph Condensation: OpenGC (Gao et al., 27 May 2024) models time-dependent embeddings and residual shifts to synthesize “future” environments representative of open-world evolution, with invariant risk minimization (IRM) ensuring robustness to distributional drift. GECC (Gong et al., 24 Feb 2025) and BiMSGC (Fu et al., 23 Dec 2024) further address dynamic scalability and multi-scale requirements: GECC provides efficient incremental condensation by retaining and updating cluster centroids, and BiMSGC introduces an information-bottleneck-based meso-scale condensation strategy with bi-directional learning, simultaneously supporting small-scale and large-scale synthetic graphs without degradation or collapse.
  • Generalization Across Architectures and Tasks: Distribution-matching objectives such as the MMD-based GCDM (Liu et al., 2022) and task-agnostic pre-training via optimal transport and hybrid-interval graph diffusion (Yan et al., 18 Sep 2025) yield condensed graphs that can be used for arbitrary downstream tasks (including node regression and unsupervised clustering) and across a wide range of GNN architectures, eliminating the brittle “overfitting” to a single relay model observed in older bi-level pipelines.
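
For context, the label-propagation primitive that noise-aware pipelines such as RobGC build on can be sketched as the standard iterative scheme below; this is a textbook version and deliberately omits RobGC's edge-reliability optimization.

```python
# Standard iterative label propagation: F <- alpha * S F + (1 - alpha) * Y,
# where S is the symmetric normalized adjacency and Y holds the known labels.
import numpy as np

def label_propagation(adj_norm, Y_onehot, mask_labeled, alpha=0.9, iters=50):
    """adj_norm: (n, n) normalized adjacency; Y_onehot: (n, C) one-hot labels;
    mask_labeled: (n,) boolean mask of nodes whose labels are known."""
    F = Y_onehot.copy()
    Y_fixed = Y_onehot * mask_labeled[:, None]   # clamp known labels
    for _ in range(iters):
        F = alpha * (adj_norm @ F) + (1 - alpha) * Y_fixed
    return F.argmax(axis=1)                      # propagated (pseudo-)labels
```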

6. Implementation, Empirical Evaluation, and Benchmarking

Recent benchmarks such as GCondenser (2405.14246) provide unified frameworks for systematic algorithm comparison, ablation, and validation. The principal paradigms and their empirically observed trade-offs are summarized below:

| Paradigm | Example Methods | Advantages | Limitations |
|---|---|---|---|
| Gradient matching | GCond, DosCond, SGDD | High fidelity | High cost, slow |
| Distribution matching | GCDM, SimGC, GCSR | Cross-architecture generalization | May lose some gradient information |
| Trajectory matching | SFGC, GEOM | Alignment to expert trajectories | Storage and preprocessing overhead |
| Training-free | SimGC, DisCo, FreeHGC, GECC | Scalable, fast | May ignore fine topology |
| Tensor decomposition | GCTD | Interpretability | Label-agnostic at its core |
| Inductive mapping | MCond, PreGC | Inductive inference | Mapping cost |

7. Open Problems, Limitations, and Research Directions

Despite significant advances, several open issues and avenues remain:

  • Large-Scale Label Propagation and User-Defined Tasks: Efficient condensation under partial supervision, and in federated and privacy-preserving settings, remains underexplored.
  • Extremely Low Condensation Ratios: Ensuring class balance and representativeness below 0.1%, especially for rare labels or classes, is nontrivial (Liang et al., 20 Dec 2024).
  • Heterophilic and Non-Euclidean Graphs: Current self-expressiveness and feature-propagation strategies may underperform in graphs where structure and node features are weakly correlated (Liu et al., 12 Mar 2024).
  • Condensation under Concept Drift and Open-set Recognition: While OpenGC and PreGC lay the groundwork for dynamism, fully online condensation and adaptation to rapid distributional shifts require new optimization and evaluation strategies (Gao et al., 27 May 2024, Yan et al., 18 Sep 2025).
  • Hybrid Condensation Methods: Combining clustering-based, spectral, and distributional inversion with explicit structure generation and label harmonization promises further scalability and generalizability.

In conclusion, graph condensation has evolved into a mature and versatile field encompassing model-based and training-free approaches; supporting static, dynamic, homogeneous, and heterogeneous settings; and offering theoretical guarantees, empirical scalability, and robust performance across tasks and architectures (Liu et al., 2022, Liu et al., 12 Mar 2024, Xiao et al., 18 Jan 2024, Gong et al., 24 Feb 2025, Ou et al., 8 Dec 2025).
