Graph Winning Ticket (GWT)
- Graph Winning Ticket (GWT) is a concept defining a pair consisting of a sparse subgraph and a sparse subnetwork, both extracted from a dense GNN, that retains or improves accuracy while reducing computational cost.
- GWT methodologies employ iterative, one-shot, and structure-driven pruning strategies that jointly optimize graph connectivity and network weights for enhanced efficiency.
- Empirical studies demonstrate that well-designed GWTs achieve significant edge and weight sparsity, leading to faster computations and competitive or superior performance across various graph tasks.
A Graph Winning Ticket (GWT) is a structured notion in graph neural network (GNN) research designating a pair comprising a sparse subgraph and a sparse subnetwork—both derived from a dense GNN and its underlying graph—which, when retrained from the same initialization, recovers or even exceeds the accuracy of the original dense model at significantly lower computational cost. Originating from the extension of the Lottery Ticket Hypothesis (LTH) to graph domains, GWTs have motivated a spectrum of frameworks for jointly pruning graphs and GNN weights, theoretical studies of expressivity, and practical methods—ranging from iterative magnitude pruning to one-shot and structural schemes—for efficient subgraph selection.
1. Formal Definitions and Conceptual Basis
The formalization of a GWT tightly couples graph and model sparsification. Let $G = (V, E)$ be a graph with node features $X$ and adjacency matrix $A$. For a GNN $f(\{A, X\}; \Theta)$ with weights $\Theta$ and corresponding initial values $\Theta_0$, introduce binary masks:
- $m_g \in \{0, 1\}^{|E|}$ (graph/edge mask)
- $m_\theta \in \{0, 1\}^{|\Theta|}$ (weight mask)
A GWT is the pair $(m_g \odot A,\ m_\theta \odot \Theta_0)$, where $\odot$ denotes elementwise masking, such that retraining on this structure matches or exceeds the parent model's test accuracy (Chen et al., 2021, Hui et al., 2023, Yue et al., 2024). For some settings, GWT also encompasses tickets found without any weight training—so-called strong lottery tickets—where only the mask is learned over fixed random weights (Yan et al., 2023).
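The masked pair in this definition can be sketched in a few lines of plain Python; `apply_masks` and `sparsity` are illustrative names (not from any cited codebase), with the adjacency as a nested list and the weights flattened into a single list.

```python
# Minimal sketch of the GWT definition: a graph mask m_g over adjacency
# entries and a weight mask m_theta over parameters, applied elementwise
# to the dense adjacency A and the initial weights Theta_0.

def apply_masks(A, Theta0, m_g, m_theta):
    """Return the sparse subgraph m_g ⊙ A and subnetwork m_theta ⊙ Theta_0."""
    A_sparse = [[a * m for a, m in zip(row_a, row_m)]
                for row_a, row_m in zip(A, m_g)]
    Theta_sparse = [w * m for w, m in zip(Theta0, m_theta)]
    return A_sparse, Theta_sparse

def sparsity(mask_flat):
    """Fraction of pruned (zero) entries in a flat binary mask."""
    return 1.0 - sum(mask_flat) / len(mask_flat)
```

A ticket is then the masked pair together with the rewind point $\Theta_0$; retraining starts from exactly those initial values rather than a fresh initialization.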
2. Extraction Methodologies: Iterative, One-Shot, and Structural Approaches
Iterative & Joint Sparsification
The Unified GNN Sparsification (UGS) framework realizes joint pruning of $A$ and $\Theta$ via differentiable soft masks and iterative magnitude pruning (IMP), followed by rewinding to $\Theta_0$ and retraining. Edge and weight masks are regularly thresholded to meet target sparsity levels, and the process continues until the desired reductions are achieved, minimizing an objective of the form

$$\mathcal{L} = \mathcal{L}_{\mathrm{train}}\big(f(\{m_g \odot A, X\};\ m_\theta \odot \Theta)\big) + \lambda_1 \|m_g\|_1 + \lambda_2 \|m_\theta\|_1$$

(Chen et al., 2021). This iterative paradigm, widely adopted, forms the base for most state-of-the-art GWT search procedures (Hui et al., 2023, Wang et al., 2023, Yue et al., 2024).
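The IMP loop just described can be sketched as follows; `prune_lowest` and `imp_search` are illustrative stand-ins, the per-round mask training (a backward pass through the GNN loss) is elided to a comment, and the default per-round rates are only placeholders.

```python
# Hedged sketch of an UGS-style IMP loop: score mask entries, remove the
# lowest-magnitude fraction of what is still kept, rewind weights to
# Theta_0, and repeat until the target sparsity is reached.

def prune_lowest(scores, mask, frac):
    """Zero out the lowest-|score| fraction of currently kept entries."""
    kept = [i for i, m in enumerate(mask) if m == 1]
    kept.sort(key=lambda i: abs(scores[i]))
    for i in kept[: int(len(kept) * frac)]:
        mask[i] = 0
    return mask

def imp_search(scores_g, scores_t, p_g=0.05, p_t=0.20, rounds=3):
    """Jointly sparsify graph and weight masks over several IMP rounds."""
    m_g = [1] * len(scores_g)
    m_t = [1] * len(scores_t)
    for _ in range(rounds):
        # In a real run: retrain soft masks here, then rewind weights to Theta_0.
        m_g = prune_lowest(scores_g, m_g, p_g)
        m_t = prune_lowest(scores_t, m_t, p_t)
    return m_g, m_t
```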
Enhanced Pruning: Adversarial and Auxiliary-Guided
Rethinking the sensitivity of GNNs to graph sparsification, (Hui et al., 2023) augments IMP with a min–max optimization over the graph mask, treating edge pruning as an adversarial perturbation, alongside an auxiliary Wasserstein-based loss for improved edge-selection robustness. The ACE (Adversarial Complementary Erasing) framework (Wang et al., 2023) further exploits information in pruned components by dynamically exchanging potentially valuable pruned elements back into the ticket via Gumbel-Max sampling, empirically achieving higher attainable sparsity.
Fast-Track One-Shot Pruning
Recent work (Yue et al., 2024) demonstrates the viability of a one-shot soft mask approach (i.e., direct thresholding after a single round of sparsification), followed by a gradual denoising phase that incrementally swaps low-utility kept elements for high-potential pruned ones, guided by structural and gradient signals. This hybrid achieves higher sparsity and an order-of-magnitude speedup over classical IMP, with similar accuracy retention.
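The one-shot-then-denoise idea can be illustrated with a small sketch; `one_shot_mask` and `denoise` are hypothetical names, and the single score array stands in for the structural and gradient signals the paper actually uses (which change between rounds).

```python
# Illustrative sketch: threshold soft-mask scores once (one-shot), then
# iteratively swap the weakest kept element for the strongest pruned one
# whenever the pruned candidate scores higher.

def one_shot_mask(scores, keep_frac):
    """Keep the top keep_frac fraction of entries by score in one pass."""
    k = int(len(scores) * keep_frac)
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    mask = [0] * len(scores)
    for i in order[:k]:
        mask[i] = 1
    return mask

def denoise(mask, scores, swaps=2):
    """Gradually exchange low-utility kept entries for high-potential pruned ones."""
    for _ in range(swaps):
        kept = [i for i, m in enumerate(mask) if m]
        pruned = [i for i, m in enumerate(mask) if not m]
        if not kept or not pruned:
            break
        worst_kept = min(kept, key=lambda i: scores[i])
        best_pruned = max(pruned, key=lambda i: scores[i])
        if scores[best_pruned] > scores[worst_kept]:
            mask[worst_kept], mask[best_pruned] = 0, 1
        else:
            break  # no beneficial swap remains
    return mask
```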
Structure-Driven Algorithms
Model-agnostic, structure-driven methods—including kTree/1Tree union-of-random-spanning-trees (Tsitsulin et al., 2023) and degree-discriminative edge pruning (TEDDY) (Seo et al., 2024)—rapidly construct sparse, connected subgraphs that empirically serve as GWTs for a variety of downstream tasks. These techniques leverage global connectivity, spectral properties, or edge-degree statistics to produce sparse backbones resilient to the choice of graph learner.
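A union-of-random-spanning-trees backbone of the kind these methods build can be sketched with Kruskal's algorithm over shuffled edges plus union-find; `ktree_backbone` and `random_spanning_tree` are illustrative names, not the cited implementations.

```python
import random

# Sketch of a kTree-style backbone: union k random spanning trees of a
# connected graph. Each tree contributes n-1 edges, so the union stays
# sparse (at most k*(n-1) edges) while remaining connected.

def random_spanning_tree(n, edges, rng):
    """Kruskal on a shuffled edge list with path-halving union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = set()
    shuffled = edges[:]
    rng.shuffle(shuffled)
    for u, v in shuffled:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.add((min(u, v), max(u, v)))
    return tree

def ktree_backbone(n, edges, k=2, seed=0):
    """Union of k random spanning trees, as a sparse connected subgraph."""
    rng = random.Random(seed)
    backbone = set()
    for _ in range(k):
        backbone |= random_spanning_tree(n, edges, rng)
    return backbone
```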
Winning Tickets via Pre-specified Topologies
For adaptive spatial-temporal GNNs (ASTGNNs), (Duan et al., 2024) demonstrates star-topology graphs as analytic GWTs. Training directly on star spanning trees yields prediction accuracy on par with dense models but with linear $O(N)$ computational complexity—contrasting the quadratic $O(N^2)$ scaling of adaptive complete-graph frameworks.
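The star topology itself is trivial to construct, which is part of its appeal as a pre-specified ticket; the hub choice below (node 0) is illustrative.

```python
# A star spanning tree as a pre-specified GWT backbone: one hub connected
# to all other nodes gives exactly N-1 edges, so message passing over it
# costs O(N) rather than the O(N^2) of an adaptive complete graph.

def star_edges(n, hub=0):
    """Edge list of a star graph on n nodes centered at `hub`."""
    return [(hub, i) for i in range(n) if i != hub]
```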
3. Theoretical Underpinnings: Expressivity and Existence
The expressivity of pruned subnetworks is shown to be a decisive factor for the existence and effectiveness of GWTs. (Kummer et al., 4 Jun 2025) establishes that, for sufficiently wide moment-based GNNs, it is possible to prune a large fraction of weights in each layer and yet preserve 1-WL expressivity (i.e., the ability to distinguish non-isomorphic graphs via the Weisfeiler–Leman test). On this basis, the Strong Expressive Lottery Ticket Hypothesis (SELTH) is formulated:
- There exist sparse initializations of GNNs, with expressivity matching full networks, yielding faster convergence (via increased gradient diversity) and superior generalization.
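The 1-WL test referenced above is color refinement: nodes start with identical colors and are repeatedly re-colored by the multiset of their neighbors' colors. The sketch below compares graphs only by their color-class size histograms, a simplification of the full joint-refinement test; differing histograms certify non-isomorphism.

```python
# Minimal 1-WL color-refinement sketch over an adjacency-list graph.

def wl_histogram(adj, rounds=3):
    """Sorted color-class sizes after `rounds` of 1-WL refinement."""
    n = len(adj)
    colors = [0] * n
    for _ in range(rounds):
        # Signature = own color plus sorted multiset of neighbor colors.
        signatures = [
            (colors[i], tuple(sorted(colors[j] for j in adj[i])))
            for i in range(n)
        ]
        palette = {s: c for c, s in enumerate(sorted(set(signatures)))}
        colors = [palette[s] for s in signatures]
    hist = {}
    for c in colors:
        hist[c] = hist.get(c, 0) + 1
    return sorted(hist.values())
```

For example, a triangle refines to a single color class, while a 3-node path splits its middle node off, so the two histograms differ.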
For structural tickets, theoretical results connect properties of sparse backbone graphs (edge expansion, spectral sparsity, algebraic connectivity) to robust downstream task performance (Tsitsulin et al., 2023, Duan et al., 2024). In ASTGNNs, star-topology GWTs are motivated by a spectral approximation argument guaranteeing preservation of global smoothing and mixing characteristics.
4. Empirical Results and Practical Efficacy
Comprehensive evaluations (Chen et al., 2021, Hui et al., 2023, Yue et al., 2024, Seo et al., 2024, Wang et al., 2023) reveal that well-constructed GWTs can yield:
| Model/Task | Edge Sparsity (%) | Weight Sparsity (%) | MAC Savings (%) | Accuracy Δ (pp) |
|---|---|---|---|---|
| GCN/Cora | 58 | 97 | 98 | +0.2 |
| ResGCN/OGB-Arxiv | 50 | 70 | 85 | –0.1 |
| TEDDY/GAT/Pubmed | 87 | — | 80 | +3.1 |
- FastGLT (Yue et al., 2024): 1.7–44× pruning speedup, 45.6% more model sparsity, and 22.7% more graph sparsity than IMP baselines.
- ACE-GLT (Wang et al., 2023): up to +10pp graph and +47pp model sparsity without accuracy loss, outperforming prior state-of-the-art iterative methods.
- TEDDY (Seo et al., 2024): consistently matches or outperforms IMP for both small and large graphs, using purely untrained, degree-driven edge scoring.
Empirical findings also stress that GNN accuracy is typically more sensitive to edge deletion than to weight pruning, especially beyond 40–50% edge sparsity; carefully designed transfer-learning protocols reveal that weight masks from GWTs can transfer across graphs and tasks with competitive performance (Hui et al., 2023).
5. Model-Agnostic and Task-Specific Winning Tickets
Random spanning tree-based methods (Tsitsulin et al., 2023) demonstrate that for a wide class of graph learning algorithms—including community detection (Louvain), embedding (DeepWalk), and 2-layer GCNs—sparse subgraphs with average degrees as low as 2–5 per node suffice for performance parity with the full graph. For ASTGNNs, star-topology GWTs (Duan et al., 2024) achieve state-of-the-art performance on traffic forecasting at maximal sparsity.
Supra-architectural methods such as Multi-Stage Folding and Multicoated Supermasks enable the construction of GWTs over randomly initialized weights, achieving memory reductions of up to 98.7% on deep GNNs without accuracy degradation (Yan et al., 2023).
6. Limitations, Open Directions, and Practical Usage
Limitations are principally tied to the potential for performance collapse at extreme sparsities—especially in edge pruning—and to the overheads of multi-round mask retraining unless one-shot strategies are adopted (Hui et al., 2023, Yue et al., 2024). Further challenges persist in scaling to heterogeneous, multi-task, or dynamic graphs and in jointly optimizing for additional computational primitives (e.g., quantization, low-rank factorization).
Best practices for GWT discovery include:
- Conservative edge pruning per round, aggressive weight pruning, and the use of adversarial or auxiliary loss formulations for edge selection (Hui et al., 2023).
- Pre-processing arbitrarily large graphs with kTree and related algorithms to make large-scale graph learning substantially more efficient (Tsitsulin et al., 2023).
- For federated or transfer learning applications, one-shot tickets and mask denoising offer robust transferability and local-to-global aggregation (Yue et al., 2024).
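One useful back-of-the-envelope check when planning such schedules: pruning a fraction $p$ of the surviving entries each round compounds, so cumulative sparsity after $r$ rounds is $1 - (1-p)^r$. A one-line helper makes this concrete.

```python
# Cumulative sparsity after r rounds of pruning a fraction p of the
# remaining entries per round: conservative rates compound slowly, while
# aggressive rates reach high sparsity in only a few rounds.

def cumulative_sparsity(p, rounds):
    return 1.0 - (1.0 - p) ** rounds
```

For instance, pruning 20% per round reaches 48.8% sparsity after three rounds, while 5% per round reaches only about 14.3% in the same budget.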
Future avenues involve theoretical characterization of the relationship between initial expressivity and ticket retrainability, extension to online/dynamic graph domains, automatic intermediary sparsity scheduling, and deeper integration of topology priors in the model design. The spectrum of GWT research collectively supports the assertion that every graph and associated GNN hosts highly sparse, structure- and initialization-dependent subnetworks which can be efficiently isolated and exploited for scalable, robust, and generalizable learning.