Rank Collapse Causes Over-Smoothing and Over-Correlation in Graph Neural Networks (2308.16800v3)
Abstract: Our study provides new theoretical insights into over-smoothing and feature over-correlation in graph neural networks. Specifically, we show that with increasing depth, node representations become dominated by a low-dimensional subspace that depends on the aggregation function but not on the feature transformations. For all aggregation functions, the rank of the node representations collapses; for particular aggregation functions, this collapse manifests as over-smoothing. Our results therefore point future research toward rank collapse rather than over-smoothing alone. Guided by our theory, we propose a sum of Kronecker products as a beneficial property that provably prevents over-smoothing, over-correlation, and rank collapse. We empirically demonstrate the shortcomings of existing models in fitting target functions of node classification tasks.
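The rank-collapse phenomenon described in the abstract can be illustrated with a minimal numerical sketch (this is an illustration under simplifying assumptions, not the authors' code): repeatedly applying a GCN-style update `X <- A_hat @ X @ W` with fresh random weight matrices `W` drives the node features onto a low-dimensional subspace determined by the aggregation matrix alone. The sketch also checks the standard vec/Kronecker identity showing that such a layer is a *single* Kronecker product acting on `vec(X)`, which is the structure the proposed sum-of-Kronecker-products property generalizes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small connected graph: 6 nodes on a ring, plus self-loops.
n, d = 6, 4
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
A += np.eye(n)

# Symmetrically normalized adjacency, as in the GCN of Kipf & Welling.
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_hat = D_inv_sqrt @ A @ D_inv_sqrt

# Deep stack of linear GCN layers with fresh random weights per layer.
X = rng.standard_normal((n, d))
for _ in range(64):
    W = rng.standard_normal((d, d)) / np.sqrt(d)
    X = A_hat @ X @ W
    X /= np.linalg.norm(X)  # rescale only to avoid under/overflow

# Numerical rank via singular values: all but one direction vanish,
# independently of the random feature transformations W.
s = np.linalg.svd(X, compute_uv=False)
rank = int((s > 1e-4 * s[0]).sum())
print(rank)  # numerical rank collapses to 1

# A single layer A_hat @ X @ W is one Kronecker product on vec(X):
# vec(A X W) = (W^T kron A) vec(X), with column-stacking vec.
vec = lambda M: M.flatten(order="F")
W = rng.standard_normal((d, d))
assert np.allclose(np.kron(W.T, A_hat) @ vec(X), vec(A_hat @ X @ W))
```

Because the dominant subspace is set by the spectrum of `A_hat` rather than by the `W` matrices, depth alone produces the collapse; a layer built from a *sum* of Kronecker products can escape this single dominant subspace.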