Spectral Graph Pruning Against Over-Squashing and Over-Smoothing (2404.04612v2)

Published 6 Apr 2024 in cs.LG, eess.SP, and stat.ML

Abstract: Message Passing Graph Neural Networks are known to suffer from two problems that are sometimes believed to be diametrically opposed: over-squashing and over-smoothing. The former results from topological bottlenecks that hamper the information flow from distant nodes and are mitigated by spectral gap maximization, primarily, by means of edge additions. However, such additions often promote over-smoothing that renders nodes of different classes less distinguishable. Inspired by the Braess phenomenon, we argue that deleting edges can address over-squashing and over-smoothing simultaneously. This insight explains how edge deletions can improve generalization, thus connecting spectral gap optimization to a seemingly disconnected objective of reducing computational resources by pruning graphs for lottery tickets. To this end, we propose a more effective spectral gap optimization framework to add or delete edges and demonstrate its effectiveness on large heterophilic datasets.

Summary

  • The paper presents a novel spectral graph pruning method that balances edge addition and deletion to mitigate both over-squashing and over-smoothing.
  • It proposes two efficient graph rewiring algorithms—ProxyDelete and ProxyAdd—based on matrix perturbation theory to optimize the spectral gap.
  • Empirical results show improved semi-supervised node and graph classification, highlighting the approach’s potential to reduce computational costs and enhance GNN performance.

Spectral Graph Pruning Enhances GNN Performance by Tackling Over-Squashing and Over-Smoothing

Introduction to Spectral Graph Pruning

Graph Neural Networks (GNNs), particularly Message Passing GNNs, are central tools for learning from graph-structured data across many domains. Despite their effectiveness, GNNs face two critical challenges: over-squashing and over-smoothing. Over-squashing curtails the flow of information from distant nodes due to topological bottlenecks, while over-smoothing causes node features from different classes to become indistinguishable after several rounds of aggregation. Inspired by the Braess phenomenon, this work uses spectral graph pruning to mitigate both problems simultaneously.
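
To make the over-smoothing effect concrete, the following minimal sketch (illustrative only, not taken from the paper) repeatedly propagates random node features with the symmetric-normalized adjacency and tracks the Dirichlet energy, a standard smoothness measure that decays toward zero as features become indistinguishable.

```python
# A small self-contained illustration (not from the paper): repeatedly applying
# the symmetric-normalized adjacency to random node features drives them toward
# a graph-dependent fixed point, so class-specific differences wash out.
# The normalized Dirichlet energy tr(X^T (I - A_hat) X) measures this collapse
# and shrinks toward zero as the features over-smooth.
import networkx as nx
import numpy as np

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)
d = A.sum(axis=1)
A_hat = A / np.sqrt(np.outer(d, d))            # D^{-1/2} A D^{-1/2}
L_sym = np.eye(A.shape[0]) - A_hat             # symmetric normalized Laplacian

rng = np.random.default_rng(0)
X = rng.normal(size=(A.shape[0], 4))           # random 4-dimensional node features

for step in range(1, 11):
    X = A_hat @ X                              # one parameter-free propagation step
    energy = float(np.trace(X.T @ L_sym @ X))  # Dirichlet energy of the features
    print(f"step {step}: Dirichlet energy = {energy:.4f}")
```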

Spectral Gap Optimization Framework

The crux of our approach lies in optimizing the spectral gap, which governs the graph's connectivity and how effectively information propagates across it. The spectral gap is the first non-zero eigenvalue of the normalized graph Laplacian and serves as a quantitative proxy for over-squashing. We demonstrate that, contrary to the common practice of adding edges to increase the spectral gap, deleting edges can achieve the same goal. This insight underpins a spectral gap optimization framework that can either add or delete edges, depending on the requirements of the specific GNN application.
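
The sketch below (an illustration under simple assumptions, not the paper's optimization procedure) computes the normalized-Laplacian spectral gap with networkx and scores each removable edge by the exact gap change its deletion would cause; a positive score is a Braess-style deletion that enlarges the gap.

```python
# Illustrative sketch: exact spectral-gap change under single-edge deletions.
# The gap is lambda_2, the first non-zero eigenvalue of the normalized Laplacian.
import networkx as nx
import numpy as np

def spectral_gap(G: nx.Graph) -> float:
    """lambda_2 of the normalized Laplacian of a connected graph."""
    L = nx.normalized_laplacian_matrix(G).toarray()
    return np.sort(np.linalg.eigvalsh(L))[1]   # index 0 is ~0 when G is connected

def deletion_gap_change(G: nx.Graph, u, v) -> float:
    """Exact change in the spectral gap if edge (u, v) is removed."""
    H = G.copy()
    H.remove_edge(u, v)
    return spectral_gap(H) - spectral_gap(G)

G = nx.karate_club_graph()
scores = {
    (u, v): deletion_gap_change(G, u, v)
    for u, v in G.edges()
    if nx.is_connected(nx.restricted_view(G, [], [(u, v)]))  # keep G connected
}
best_edge, best_gain = max(scores.items(), key=lambda kv: kv[1])
print(best_edge, best_gain)   # a positive gain means deleting the edge widens the gap
```

Recomputing the full eigendecomposition for every candidate edge quickly becomes impractical on large graphs, which is what motivates the cheaper first-order proxies described next.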

Graph Rewiring Strategies

The paper introduces two graph rewiring algorithms, ProxyDelete and ProxyAdd, that use matrix perturbation theory to estimate spectral gap changes cheaply, making the optimization computationally efficient. ProxyDelete removes edges whose deletion is predicted to enlarge the spectral gap, pruning connections that inhibit message passing or contribute to over-smoothing. ProxyAdd, conversely, inserts edges that improve connectivity without substantially increasing the risk of over-smoothing. Both strategies optimize the spectral gap effectively and thereby mitigate over-squashing and over-smoothing, especially in heterophilic graph settings.
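
As a rough sketch of how such a first-order proxy can be derived, the snippet below applies standard matrix perturbation theory to the generalized eigenproblem L f = lambda D f (equivalent to the normalized-Laplacian problem): deleting or adding edge (u, v) perturbs L and D by low-rank terms, giving the closed-form estimate Delta(lambda_2) ~ +/- [(f_u - f_v)^2 - lambda_2 (f_u^2 + f_v^2)] / (f^T D f). The function names and the greedy ranking are illustrative assumptions, not the paper's exact ProxyDelete/ProxyAdd implementation.

```python
# Sketch of a first-order (matrix-perturbation) proxy for the spectral-gap
# change caused by deleting or adding a single edge. Names are illustrative.
import networkx as nx
import numpy as np
from scipy.linalg import eigh

def fiedler_pair(G: nx.Graph):
    """Return (lambda_2, f, degrees, node->index map) for L f = lambda D f."""
    nodes = list(G.nodes())
    idx = {n: i for i, n in enumerate(nodes)}
    L = nx.laplacian_matrix(G, nodelist=nodes).toarray().astype(float)
    d = np.array([G.degree(n) for n in nodes], dtype=float)
    w, V = eigh(L, np.diag(d))          # generalized symmetric eigenproblem
    return w[1], V[:, 1], d, idx        # w[0] is ~0 for a connected graph

def proxy_gap_change(lam2, f, d, i, j, delete=True):
    """First-order estimate of Delta(lambda_2) for removing (delete=True) or
    adding (delete=False) the edge between node indices i and j:
        Delta ~ s * [(f_i - f_j)^2 - lam2 * (f_i^2 + f_j^2)] / (f^T D f),
    with s = -1 for a deletion and s = +1 for an addition."""
    s = -1.0 if delete else 1.0
    denom = float(np.sum(d * f ** 2))
    return s * ((f[i] - f[j]) ** 2 - lam2 * (f[i] ** 2 + f[j] ** 2)) / denom

# Greedy use: rank existing edges by the deletion proxy; a positive score
# suggests the deletion enlarges the gap (a candidate Braess-style pruning).
G = nx.karate_club_graph()
lam2, f, d, idx = fiedler_pair(G)
ranked = sorted(
    ((proxy_gap_change(lam2, f, d, idx[u], idx[v], delete=True), (u, v))
     for u, v in G.edges()),
    reverse=True,
)
print(ranked[:3])
```

Since the estimate is only valid locally, a natural design choice is to recompute (or incrementally update) the eigenpair after each accepted edit before scoring the next candidate.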

Empirical Evaluation and Results

Empirical evaluations on a range of heterophilic and homophilic datasets establish the effectiveness of our proposed framework, showing considerable improvements in semi-supervised node classification and graph classification. Notably, spectral gap-based edge deletions identify graph lottery tickets (sparse sub-networks) that match or even surpass the performance of their denser counterparts, underlining the potential of our approach to reduce computational costs.

Implications and Future Directions

This research pioneers the use of spectral graph pruning to combat over-squashing and over-smoothing simultaneously, marking a significant advancement in graph learning methodologies. It sheds light on the utility of the Braess phenomenon in graph optimization problems, opening new avenues for research in graph neural network design and optimization. Future work might explore the integration of feature and label information into the spectral graph pruning process for more tailored graph rewiring and further investigate the implications of this approach on large-scale graph processing and learning tasks.

In conclusion, spectral graph pruning emerges as a promising tool for enhancing GNN performance, offering a new perspective on addressing some of the most pressing challenges in graph neural network research.