Demystifying Oversmoothing in Attention-Based Graph Neural Networks (arXiv:2305.16102v4)
Abstract: Oversmoothing in Graph Neural Networks (GNNs) refers to the phenomenon where increasing network depth leads to homogeneous node representations. While previous work has established that Graph Convolutional Networks (GCNs) exponentially lose expressive power, it remains controversial whether the graph attention mechanism can mitigate oversmoothing. In this work, we provide a definitive answer to this question through a rigorous mathematical analysis: we view attention-based GNNs as nonlinear time-varying dynamical systems and incorporate tools and techniques from the theory of products of inhomogeneous matrices and the joint spectral radius. We establish that, contrary to popular belief, the graph attention mechanism cannot prevent oversmoothing: attention-based GNNs, too, lose expressive power exponentially. The proposed framework extends existing results on oversmoothing for symmetric GCNs to a significantly broader class of GNN models, including random-walk GCNs, Graph Attention Networks (GATs), and (graph) transformers. In particular, our analysis accounts for asymmetric, state-dependent, and time-varying aggregation operators and a wide range of common nonlinear activation functions, such as ReLU, LeakyReLU, GELU, and SiLU.
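To make the phenomenon concrete, below is a minimal sketch, not taken from the paper: the toy graph, the dot-product attention scoring, and the smoothness measure are all illustrative assumptions standing in for a GAT-style layer. It shows how stacking layers that apply a row-stochastic, state-dependent aggregation followed by ReLU drives node features toward a common representation.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small connected undirected graph, given as an adjacency matrix
# with self-loops (an illustrative choice, not from the paper).
A = np.array([[1, 1, 1, 0],
              [1, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 1, 1]], dtype=float)

X = rng.normal(size=(4, 8))  # initial node features (4 nodes, 8 dims)

def attention_layer(X, A):
    """One attention-style layer: softmax-normalized, state-dependent
    weights over each node's neighborhood (a GAT-like stand-in),
    followed by ReLU."""
    logits = X @ X.T                             # dot-product scores
    logits = np.where(A > 0, logits, -np.inf)    # mask non-neighbors
    logits -= logits.max(axis=1, keepdims=True)  # stabilize softmax
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)            # row-stochastic weights
    return np.maximum(P @ X, 0.0)                # aggregate, then ReLU

def smoothness(X):
    """Distance of the features from their across-node mean;
    0 means fully homogeneous (oversmoothed) representations."""
    return np.linalg.norm(X - X.mean(axis=0, keepdims=True))

for depth in range(1, 31):
    X = attention_layer(X, A)
    if depth % 5 == 0:
        print(f"depth {depth:2d}: smoothness = {smoothness(X):.3e}")
```

Because the softmax assigns strictly positive weights to every neighbor of a connected graph, each layer is an averaging step, and the printed smoothness measure decays geometrically with depth. This is the exponential loss of expressive power that the paper establishes rigorously for asymmetric, state-dependent, time-varying aggregation.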
Authors: Xinyi Wu, Amir Ajorlou, Zihui Wu, Ali Jadbabaie