High-Degree Preserving Graph Pruning
- High-degree preserving graph pruning is a method that retains critical hubs by explicitly preserving node degree distributions, unlike naïve thresholding.
- Techniques such as the Marginal Likelihood Filter (MLF) and Global Likelihood Filter (GLF) assess edge significance to maintain overall network structure.
- Adaptive pruning in privacy-preserving FHE GNNs reduces computational overhead while keeping vital network features and most of the model's accuracy intact.
High-degree preserving graph pruning consists of a set of principled methodologies for reducing complex or noisy graphs to more informative subgraphs while explicitly preserving the degree (weighted or unweighted) distributions of the underlying nodes. This approach contrasts with naïve thresholding, which tends to fragment the graph and disproportionately penalize nodes with low degree. It applies both to empirical integer-weighted networks and to privacy-preserving GNNs operating in encrypted domains. The central principle is to retain and prioritize nodes and edges according to significance scores or degree statistics, so that structural hubs maintain their connectivity and large-scale network features remain intact. Prominent realizations include the Marginal Likelihood Filter (MLF) and Global Likelihood Filter (GLF) for empirical networks, as well as encrypted degree-based pruning in privacy-preserving graph inference (Dianati, 2015; Zhao et al., 8 Jul 2025).
1. Null Models and Degree Statistics
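High-degree preserving pruning begins with rigorous modeling of node degrees (or strengths). For an undirected graph with integer edge-weights $w_{ij}$, the node strength is $k_i = \sum_j w_{ij}$ and the total event weight is $T = \frac{1}{2}\sum_i k_i$ (Dianati, 2015). In encrypted graph settings, the encrypted adjacency $\mathrm{Enc}(A)$ yields encrypted row-sums $\mathrm{Enc}(d_i) = \sum_j \mathrm{Enc}(A_{ij})$, or in vector form $\mathrm{Enc}(\mathbf{d}) = \mathrm{Enc}(A)\,\mathbf{1}$, which are used for node importance scoring (Zhao et al., 8 Jul 2025).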
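CKKS packs a vector into the slots of a single ciphertext. A standard way to sum those slots using only additions and rotations is rotate-and-sum, which needs $\log_2 n$ rotations for $n$ slots. The plaintext simulation below is a sketch of how an encrypted row-sum could be formed this way. Python lists stand in for ciphertexts, and `rotate`, `add`, `rotate_and_sum` and `encrypted_degrees` are hypothetical stand-ins for homomorphic operations. Packing one adjacency row per ciphertext is an assumption and is not necessarily the layout DESIGN uses.

```python
from typing import List


def rotate(slots: List[float], k: int) -> List[float]:
    """Plaintext stand-in for a homomorphic slot rotation (cyclic left shift by k)."""
    k %= len(slots)
    return slots[k:] + slots[:k]


def add(a: List[float], b: List[float]) -> List[float]:
    """Plaintext stand-in for slot-wise homomorphic addition."""
    return [x + y for x, y in zip(a, b)]


def rotate_and_sum(slots: List[float]) -> List[float]:
    """Sum all slots with log2(n) rotations; afterwards every slot holds the total.

    Assumes len(slots) is a power of two (pad with zeros otherwise).
    """
    n = len(slots)
    assert n & (n - 1) == 0, "slot count must be a power of two"
    acc = list(slots)
    step = 1
    while step < n:
        acc = add(acc, rotate(acc, step))
        step *= 2
    return acc


def encrypted_degrees(adjacency: List[List[float]]) -> List[float]:
    """Row sums (degrees), treating each adjacency row as one 'ciphertext'."""
    return [rotate_and_sum(row)[0] for row in adjacency]


A = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
print(encrypted_degrees(A))  # [3, 2, 2, 1]
```

With $n = 4$ slots, each degree costs two rotations and two additions. In a real CKKS library, each rotation amount would also require the matching rotation (Galois) key.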
Both MLF and GLF employ null models in which the $T$ unit-edges are assigned at random, so that average node strengths are maintained and degree distributions are preserved in expectation. In the configuration-style null model, the probability that exactly $w$ unit-edges fall between nodes $i$ and $j$ is $\binom{T}{w}\, p_{ij}^{\,w}\,(1-p_{ij})^{T-w}$, with $p_{ij} = \frac{k_i k_j}{2T^2}$.
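A minimal sketch of the configuration-style null model follows. It assumes an undirected edge list without self-loops, stored as a dict from $(i, j)$ to the integer weight $w_{ij}$. It also assumes that a single unit-edge joins $i$ and $j$ with probability $p_{ij} = k_i k_j / 2T^2$. The function names `strengths`, `unit_edge_prob` and `null_pmf` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple

Edge = Tuple[int, int]  # undirected edge (i, j), stored once, assumed i != j


def strengths(weights: Dict[Edge, int]) -> Tuple[Dict[int, int], int]:
    """Node strengths k_i = sum_j w_ij and total weight T = (1/2) * sum_i k_i."""
    k: Dict[int, int] = defaultdict(int)
    for (i, j), w in weights.items():
        k[i] += w
        k[j] += w
    return dict(k), sum(k.values()) // 2


def unit_edge_prob(k: Dict[int, int], total: int, i: int, j: int) -> float:
    """Assumed probability that one random unit-edge joins i and j (i != j)."""
    return k[i] * k[j] / (2 * total ** 2)


def null_pmf(w: int, total: int, p: float) -> float:
    """Binomial probability that exactly w of the T unit-edges land on the pair."""
    return math.comb(total, w) * p ** w * (1 - p) ** (total - w)


k, T = strengths({(0, 1): 3, (1, 2): 1, (0, 2): 2})
p01 = unit_edge_prob(k, T, 0, 1)
print(k, T, T * p01)  # strengths, total weight, expected null weight of (0, 1)
```

Here the edge $(0, 1)$ carries weight 3, while the null model expects only $T p_{01} \approx 1.67$. That surplus is what the significance tests in the next section quantify.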
2. Marginal Likelihood Filter (MLF): Edge Significance and Pruning
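MLF evaluates the statistical significance of individual edges by their deviation from the null-model distribution (Dianati, 2015). For an edge $(i, j)$ with observed weight $w_{ij}$, the $p$-value is the upper binomial tail
$$s_{ij} = \Pr\left(X_{ij} \ge w_{ij}\right) = \sum_{m=w_{ij}}^{T} \binom{T}{m}\, p_{ij}^{\,m}\,(1-p_{ij})^{T-m}, \qquad X_{ij} \sim \mathrm{Binomial}(T, p_{ij}),$$
where $p_{ij} = \frac{k_i k_j}{2T^2}$. Edges are sorted by ascending $p$-value $s_{ij}$, so the most significant come first. They are then retained either if $s_{ij} \le \alpha$ for a chosen significance level $\alpha$, or by keeping the top fraction $f$ of edges.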
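A minimal sketch of the marginal filter follows. It assumes the null probability $p_{ij} = k_i k_j / 2T^2$ and scores each edge by the upper binomial tail $\Pr(X \ge w_{ij})$ with $X \sim \mathrm{Binomial}(T, p_{ij})$. Edges are then kept either by a significance level $\alpha$ or by a top fraction. The function names and the toy weights are hypothetical, and self-loops are not handled.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]  # undirected edge (i, j), stored once, assumed i != j


def binomial_tail(w: int, n: int, p: float) -> float:
    """P(X >= w) for X ~ Binomial(n, p), summed exactly (fine for small n)."""
    return sum(math.comb(n, m) * p ** m * (1 - p) ** (n - m) for m in range(w, n + 1))


def mlf_pvalues(weights: Dict[Edge, int]) -> Dict[Edge, float]:
    """p-value of each observed weight under the null p_ij = k_i k_j / (2 T^2)."""
    k: Dict[int, int] = defaultdict(int)
    for (i, j), w in weights.items():
        k[i] += w
        k[j] += w
    total = sum(k.values()) // 2
    return {(i, j): binomial_tail(w, total, k[i] * k[j] / (2 * total ** 2))
            for (i, j), w in weights.items()}


def mlf_filter(weights: Dict[Edge, int], alpha: Optional[float] = None,
               keep_fraction: Optional[float] = None) -> List[Edge]:
    """Keep edges with p-value <= alpha, or the top keep_fraction by significance."""
    pvals = mlf_pvalues(weights)
    ranked = sorted(pvals, key=pvals.get)  # ascending p-value: most significant first
    if alpha is not None:
        return [e for e in ranked if pvals[e] <= alpha]
    if keep_fraction is not None:
        return ranked[: max(1, round(keep_fraction * len(ranked)))]
    raise ValueError("pass either alpha or keep_fraction")


W = {(0, 1): 5, (0, 2): 1, (1, 2): 1, (2, 3): 1}
print(mlf_filter(W, alpha=0.05))         # [(0, 1)]
print(mlf_filter(W, keep_fraction=0.5))  # [(0, 1), (2, 3)]
```

In this toy graph, $(2, 3)$ outranks $(0, 2)$ and $(1, 2)$ even though all three have weight 1. The reason is that node 3 has strength 1, so the null model expects almost no weight on $(2, 3)$. Each edge is judged against its endpoints' strengths rather than against a global cutoff. For large $T$, `scipy.stats.binom.sf(w - 1, T, p)` computes the same upper tail without the explicit sum.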
This filter preserves the degree sequence in expectation and avoids isolation of low-degree nodes. Empirical network analyses demonstrate that MLF yields sparser yet globally connected subgraphs, recovers regionally faithful layouts (as in air traffic networks), and disentangles overlapping clusters better than strict weight thresholding (Dianati, 2015).
3. Global Likelihood Filter (GLF) and Correlated Pruning
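GLF generalizes significance filtering from single edges to the entire subgraph. Instead of testing edges one at a time, it evaluates the likelihood $\mathcal{L}(G')$ of observing the full weighted subgraph $G'$ under the null ensemble. Lagrange multipliers are set so that the ensemble matches the empirical strengths $k_i$. The global pruning criterion then seeks the subgraph with a prescribed number of edges that minimizes $\mathcal{L}(G')$, i.e., the subgraph least likely to arise by chance. Monte Carlo search (e.g., Metropolis–Hastings) is used to traverse this combinatorial space efficiently while accounting for correlations between edges (Dianati, 2015).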
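A Metropolis-style swap search over subgraphs with a fixed number of edges can be sketched as follows. The objective is an explicit stand-in and not Dianati's exact GLF likelihood. It scores a candidate subgraph by the multinomial log-probability of its weights under the unit-edge null model with $p_{ij} = k_i k_j / 2T^2$. Because $k_i$ and $T$ are recomputed on the subgraph itself, the score of one edge depends on which other edges are kept. The function names, temperature and step count are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]  # undirected edge (i, j), assumed i != j and weight >= 1


def subgraph_log_likelihood(weights: Dict[Edge, int], kept: Set[Edge]) -> float:
    """Stand-in global objective (an assumption, not Dianati's exact formula):
    multinomial log-probability of the kept weights, with strengths k_i and
    total T recomputed on the kept subgraph."""
    k: Dict[int, int] = defaultdict(int)
    for i, j in kept:
        k[i] += weights[(i, j)]
        k[j] += weights[(i, j)]
    total = sum(k.values()) // 2
    log_l = math.lgamma(total + 1)
    for i, j in kept:
        w = weights[(i, j)]
        p = k[i] * k[j] / (2 * total ** 2)
        log_l += w * math.log(p) - math.lgamma(w + 1)
    return log_l


def glf_search(weights: Dict[Edge, int], n_keep: int, steps: int = 2000,
               temperature: float = 1.0, seed: int = 0) -> Set[Edge]:
    """Metropolis search over n_keep-edge subsets, minimizing the objective above."""
    edges = sorted(weights)
    if n_keep <= 0:
        return set()
    if n_keep >= len(edges):
        return set(edges)
    rng = random.Random(seed)
    kept = set(rng.sample(edges, n_keep))
    current = subgraph_log_likelihood(weights, kept)
    best, best_score = set(kept), current
    for _ in range(steps):
        # Propose swapping one kept edge for one currently discarded edge.
        out_edge = rng.choice(sorted(kept))
        in_edge = rng.choice([e for e in edges if e not in kept])
        proposal = (kept - {out_edge}) | {in_edge}
        score = subgraph_log_likelihood(weights, proposal)
        delta = score - current
        # Always accept improvements; accept worse moves with prob exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            kept, current = proposal, score
            if current < best_score:
                best, best_score = set(kept), current
    return best


W = {(0, 1): 5, (0, 2): 1, (1, 2): 1, (2, 3): 1, (1, 3): 2}
print(sorted(glf_search(W, n_keep=3, steps=500, seed=1)))
```

This sketch recomputes the full score at every step. A practical implementation would instead update the score incrementally from the few strengths that a swap changes.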
In practical evaluations, the Jaccard similarity between MLF and GLF retained-edge sets exceeds 80% even under severe pruning, indicating that MLF approximates the globally optimal selection for many networks.
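The overlap figure is the Jaccard similarity $|A \cap B| / |A \cup B|$ of the two retained-edge sets. Below is a small sketch in which the edge sets are made up for illustration. Edges must use one canonical orientation (e.g., $i < j$) so that the same undirected edge compares equal in both sets.

```python
from typing import Hashable, Set


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical retained-edge sets, each edge stored as (i, j) with i < j.
mlf_kept = {(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)}
glf_kept = {(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)}
print(jaccard(mlf_kept, glf_kept))  # 4 shared edges out of 6 distinct ones
```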
4. Encrypted High-Degree Preserving Pruning in FHE GNNs
Within privacy-preserving GNN inference under fully homomorphic encryption (FHE), high-degree preserving pruning is engineered to minimize redundancy and computational overhead while maintaining accuracy (Zhao et al., 8 Jul 2025). The pipeline entails:
- Calculation of encrypted degree statistics using CKKS primitives.
- Partitioning nodes into importance groups by comparing encrypted degrees against a descending sequence of thresholds with an approximate homomorphic comparison, which generates multi-level importance masks.
- Logical pruning: a keep mask zeroes out low-importance nodes in both the feature matrix and the adjacency matrix.
- Adaptive activation: nonlinearities are replaced by polynomial approximations whose degree depends on node importance. Higher-degree (more accurate, more expensive) polynomials are reserved for high-degree nodes.
Homomorphic implementations use only additions and a small number of multiplications and rotations per node, which keeps the pipeline efficient within FHE constraints.
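The masking steps above can be simulated in plaintext to see how they compose. This sketch assumes one common approach to approximate comparison under CKKS: iterating the odd polynomial $f(x) = (3x - x^3)/2$, which pushes values in $[-1, 1]$ toward their sign using only additions and multiplications. DESIGN's actual comparison polynomial, thresholds, scaling bound and per-level polynomial degrees may differ. Every function name and value here is hypothetical. Thresholds sit at half-integers so that integer degrees never tie with a threshold.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def approx_sign(x: float, iterations: int = 12) -> float:
    """Approximate sign(x) for x in [-1, 1] by iterating f(x) = (3x - x^3) / 2.

    Only additions and multiplications appear, so the same polynomial could be
    evaluated on CKKS ciphertexts; here it runs on plain floats.
    """
    for _ in range(iterations):
        x = (3 * x - x ** 3) / 2
    return x


def approx_geq(value: float, threshold: float, scale: float) -> float:
    """Soft indicator of value >= threshold; `scale` must bound |value - threshold|."""
    return (1 + approx_sign((value - threshold) / scale)) / 2


def importance_masks(degrees: Sequence[float], thresholds: Sequence[float],
                     scale: float) -> List[List[float]]:
    """Level-l mask is ~1 where thresholds[l] <= degree < thresholds[l-1] (descending)."""
    ge = [[approx_geq(d, t, scale) for d in degrees] for t in thresholds]
    masks = [ge[0]]
    for lvl in range(1, len(thresholds)):
        masks.append([cur - prev for cur, prev in zip(ge[lvl], ge[lvl - 1])])
    return masks


def prune(features: Matrix, adjacency: Matrix, keep: Sequence[float]) -> Tuple[Matrix, Matrix]:
    """Logical pruning: scale feature rows and adjacency entries by the keep mask."""
    n = len(keep)
    x = [[keep[v] * f for f in features[v]] for v in range(n)]
    a = [[keep[i] * keep[j] * adjacency[i][j] for j in range(n)] for i in range(n)]
    return x, a


def activation_degrees(masks: List[List[float]], level_degrees: Sequence[int]) -> List[int]:
    """Per-node polynomial degree; 0 means the node was pruned and needs no activation."""
    return [round(sum(m[v] * d for m, d in zip(masks, level_degrees)))
            for v in range(len(masks[0]))]


degrees = [3, 2, 2, 1, 5, 0]   # toy plaintext degrees
thresholds = [3.5, 1.5]        # hypothetical, descending; half-integers avoid ties
masks = importance_masks(degrees, thresholds, scale=6.0)
keep = [sum(col) for col in zip(*masks)]  # masks telescope to the lowest-threshold indicator
features = [[float(v), float(v)] for v in range(6)]
adjacency = [[1.0] * 6 for _ in range(6)]
x_pruned, a_pruned = prune(features, adjacency, keep)
print([round(v) for v in keep])           # [1, 1, 1, 0, 1, 0]
print(activation_degrees(masks, [7, 3]))  # [3, 3, 3, 0, 7, 0]
```

The level masks telescope: their sum equals the indicator for the lowest threshold, so it can serve directly as the keep mask. The masks stay approximately 0 or 1 rather than exact, which is the usual trade-off of polynomial comparison under CKKS.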
5. Empirical Validation and Trade-offs
Extensive benchmarking of MLF and GLF on the US airport network and occupation co-occurrence datasets highlights key structural trade-offs (Dianati, 2015):
- With about half of the edges kept (fraction near $0.5$), MLF keeps a large share of the nodes in the largest connected component, whereas weight thresholding fragments the graph.
- MLF subgraphs unfold coherent geographic or semantic structures, while thresholded graphs remain dense and tangled.
- Clique numbers and clustering coefficients decrease with pruning, but connectivity is preserved for hubs.
In DESIGN (Zhao et al., 8 Jul 2025), encrypted GNNs benefit from similar trade-offs:
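- Pruning up to about $40\%$ of the nodes substantially reduces inference latency at a cost of only a few percentage points of accuracy (Cora dataset). In the table below, going from $10\%$ to $40\%$ pruning lowers latency from $93.5$ s to $70.0$ s while accuracy falls from $76.7\%$ to $74.2\%$.
- Adaptive polynomial selection yields additional savings of roughly $20\%$ or more in homomorphic multiplications.
- The full pipeline (pruning plus adaptive activation) yields a speedup of $2.0\times$ or more over SEAL, with accuracy competitive with optimized FHE GNNs.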
Representative empirical results for DESIGN on the Cora dataset:
| Pruned nodes (%) | Accuracy (%) | Latency (s) |
|---|---|---|
| 10 | 76.7 | 93.5 |
| 40 | 74.2 | 70.0 |
| 70 | 65.7 | 48.9 |
| 90 | 47.1 | 37.4 |
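The table contains no unpruned baseline, so the sketch below expresses each row relative to the $10\%$ row. The numbers are derived only from the table. Savings measured against an unpruned model would likely be larger. The names `ROWS` and `relative_to_first` are hypothetical.

```python
from typing import List, Tuple

Row = Tuple[int, float, float]  # (pruned nodes %, accuracy %, latency s)

# Rows copied from the DESIGN / Cora table above.
ROWS: List[Row] = [(10, 76.7, 93.5), (40, 74.2, 70.0), (70, 65.7, 48.9), (90, 47.1, 37.4)]


def relative_to_first(rows: List[Row]) -> List[Tuple[int, float, float, float]]:
    """(prune %, speedup, latency reduction %, accuracy drop in points) vs. the first row."""
    _, base_acc, base_lat = rows[0]
    return [(p, base_lat / lat, 100 * (1 - lat / base_lat), base_acc - acc)
            for p, acc, lat in rows[1:]]


for prune, speedup, cut, drop in relative_to_first(ROWS):
    print(f"{prune}% pruned: {speedup:.2f}x, latency -{cut:.1f}%, accuracy -{drop:.1f} pts")
```

Relative to the $10\%$ row, pruning $40\%$ of the nodes gives about a $1.34\times$ speedup ($25.1\%$ lower latency) for a $2.5$-point accuracy drop. Pruning $90\%$ reaches $2.5\times$ at a cost of $29.6$ points.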
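A plausible implication is that high-degree preserving approaches maintain the integrity of global network features while enabling significant computational reductions in both plaintext and encrypted settings, at least at moderate pruning levels. In the table above, accuracy falls to $65.7\%$ at $70\%$ pruning and to $47.1\%$ at $90\%$.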
6. Algorithmic Complexities and Practical Guidelines
MLF has time complexity $O(|E|\log|E|)$, dominated by edge sorting; each binomial tail can be evaluated directly or approximated for large $T$. GLF's cost per Monte Carlo swap is small, especially when factorials are pre-tabulated.
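Pre-tabulating factorials makes each binomial term an $O(1)$ lookup. Working in log space also avoids overflowing $T!$ when $T$ is large. A minimal sketch follows, with hypothetical function names. `math.lgamma(n + 1)` gives the same values without a table; the table mainly saves repeated calls inside tight loops.

```python
import math
from typing import List


def log_factorial_table(n_max: int) -> List[float]:
    """table[n] = log(n!) for n = 0..n_max, built once in O(n_max) time."""
    table = [0.0] * (n_max + 1)
    for n in range(1, n_max + 1):
        table[n] = table[n - 1] + math.log(n)
    return table


def log_binom_pmf(w: int, total: int, p: float, lf: List[float]) -> float:
    """O(1) log P(X = w) for X ~ Binomial(total, p) using the table; needs 0 < p < 1."""
    log_comb = lf[total] - lf[w] - lf[total - w]
    return log_comb + w * math.log(p) + (total - w) * math.log1p(-p)


lf = log_factorial_table(10_000)  # e.g., graphs with up to 10,000 unit-edges
print(math.exp(log_binom_pmf(3, 10, 0.2, lf)))
```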
DESIGN’s FHE pipeline relies only on CKKS-compatible primitives (HE.Add, HE.Mult, HE.Rotate). This keeps it implementable with standard CKKS libraries and practical for privacy-sensitive inference (Zhao et al., 8 Jul 2025).
Practical guidelines include tuning thresholds so that the giant component retains roughly $80\%$ or more of the nodes, and using clustering measures to select appropriate pruning points. For encrypted graphs, the choice of degree thresholds and of the adaptive-activation configuration directly trades inference efficiency against accuracy.
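A minimal sketch of the connectivity guideline follows. It sweeps the kept-edge fraction over a grid and returns the sparsest setting whose giant component still covers at least $80\%$ of the nodes. It assumes edges arrive already ranked by significance, for instance by ascending MLF $p$-value. The grid, the toy edges and the function names are hypothetical. Nodes left without edges count as singleton components, so isolation lowers the score directly.

```python
from typing import Dict, Hashable, Iterable, Optional, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def largest_component_fraction(nodes: Iterable[Hashable], edges: Iterable[Edge]) -> float:
    """Fraction of all nodes lying in the largest connected component (union-find)."""
    parent: Dict[Hashable, Hashable] = {v: v for v in nodes}

    def find(v: Hashable) -> Hashable:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    sizes: Dict[Hashable, int] = {}
    for v in parent:
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / len(parent) if parent else 0.0


def sparsest_fraction(nodes: Sequence[Hashable], ranked_edges: Sequence[Edge],
                      min_giant: float = 0.8,
                      grid: Sequence[float] = tuple(i / 10 for i in range(1, 11))) -> Optional[float]:
    """Smallest kept-edge fraction on `grid` whose top-ranked edges keep the giant
    component at or above `min_giant`; None if no grid point qualifies."""
    for f in sorted(grid):
        kept = ranked_edges[: max(1, round(f * len(ranked_edges)))]
        if largest_component_fraction(nodes, kept) >= min_giant:
            return f
    return None


ranked = [(0, 1), (1, 2), (3, 4), (2, 3), (0, 4)]  # hypothetical significance order
print(sparsest_fraction(range(5), ranked, grid=(0.2, 0.4, 0.6, 0.8, 1.0)))  # 0.8
```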
7. Comparison to Thresholding and Degree-Sequence Effects
A fundamental distinction is that both MLF and GLF explicitly preserve degree sequences in expectation ($\langle k_i \rangle = k_i$). This keeps hubs central and avoids systematic bias against low-degree nodes (Dianati, 2015). Naïve weight thresholding, by contrast, discards every edge below a global cutoff regardless of its endpoints' strengths. It therefore disconnects low-strength nodes and often produces fragmented or misleading topologies.
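The expectation claim can be checked directly under one explicit assumption: each of the $T$ unit-edges picks its two endpoints independently, choosing node $i$ with probability $k_i/2T$. A unit-edge then joins $i \neq j$ with probability $p_{ij} = k_i k_j / 2T^2$ and forms a self-loop at $i$ with probability $p_{ii} = k_i^2 / 4T^2$. Counting a self-loop twice toward the strength of $i$ and using $\sum_j k_j = 2T$:

$$
\langle k_i \rangle = \sum_{j \neq i} T\,p_{ij} + 2T\,p_{ii}
= \frac{k_i}{2T}\sum_{j \neq i} k_j + \frac{k_i^2}{2T}
= \frac{k_i\,(2T - k_i) + k_i^2}{2T} = k_i .
$$

If self-loops are excluded, the expected strength falls short of $k_i$ by $k_i^2/2T$. This shortfall is negligible unless a single node carries a large share of the total weight.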
In encrypted GNNs, partitioning by encrypted degree ensures that important nodes are not pruned by a uniform, degree-agnostic rule, which preserves the semantics of the underlying graph (Zhao et al., 8 Jul 2025). This suggests that degree-preserving pruning is preferable for analytic and predictive tasks in which global connectivity and hub dynamics are essential.