Cross-Space Adaptive Filter: Integrating Graph Topology and Node Attributes for Alleviating the Over-smoothing Problem (2401.14876v2)
Abstract: The vanilla Graph Convolutional Network (GCN) uses a low-pass filter to extract low-frequency signals from the graph topology, which may lead to the over-smoothing problem as the network goes deep. To address this, various methods have been proposed to create an adaptive filter by incorporating an extra filter (e.g., a high-pass filter) extracted from the graph topology. However, these methods rely heavily on topological information and ignore the node attribute space, which severely limits the expressive power of deep GCNs, especially on disassortative graphs. In this paper, we propose a cross-space adaptive filter, called CSF, which produces adaptive-frequency information extracted from both the topology and attribute spaces. Specifically, we first derive a tailored attribute-based high-pass filter that can be interpreted theoretically as the minimizer of a semi-supervised kernel ridge regression. Then, we cast the topology-based low-pass filter as a Mercer kernel within the context of GCNs. This serves as a foundation for combining it with the attribute-based filter to capture adaptive-frequency information. Finally, we derive the cross-space filter via an effective multiple-kernel learning strategy, which unifies the attribute-based high-pass filter and the topology-based low-pass filter. This addresses the over-smoothing problem while maintaining effectiveness. Extensive experiments demonstrate that CSF not only alleviates the over-smoothing problem but also improves performance on the node classification task.
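To make the cross-space combination concrete, here is a minimal NumPy sketch of the general pattern the abstract describes: a convex mixture of a topology-based low-pass filter and an attribute-based high-pass filter. This is an illustrative sketch under simplifying assumptions (an RBF attribute affinity, a fixed mixing weight `alpha`), not the paper's CSF implementation: the paper derives its attribute filter as the minimizer of a semi-supervised kernel ridge regression and learns the combination via multiple-kernel learning.

```python
# Minimal illustrative sketch, NOT the authors' CSF implementation.
# It mixes a topology-based low-pass filter with an attribute-based
# high-pass filter via a convex weight, in the spirit of multiple-kernel
# learning. `alpha`, `gamma`, and the RBF affinity are illustrative
# assumptions; the paper instead derives its attribute-based filter
# from semi-supervised kernel ridge regression.
import numpy as np

def low_pass_topology(A):
    """Symmetrically normalized adjacency with self-loops,
    D^{-1/2} (A + I) D^{-1/2}: the vanilla GCN low-pass filter."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

def high_pass_attributes(X, gamma=1.0):
    """Laplacian-style high-pass operator I - P, where P is a
    row-normalized RBF attribute affinity (a valid Mercer kernel)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)
    P = K / K.sum(axis=1, keepdims=True)
    return np.eye(X.shape[0]) - P

def cross_space_filter(A, X, alpha=0.5, gamma=1.0):
    """Convex combination of the two filters; in a multiple-kernel
    learning setting, alpha would be learned rather than fixed."""
    return alpha * low_pass_topology(A) + (1.0 - alpha) * high_pass_attributes(X, gamma)

# Toy usage: a 4-node path graph with 2-d node attributes.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.default_rng(0).normal(size=(4, 2))
H = cross_space_filter(A, X) @ X   # one cross-space propagation step
print(H.shape)                     # (4, 2)
```

The design point the sketch captures is that the low-pass term smooths representations along graph edges while the high-pass term preserves attribute-space differences, so deep stacking of the combined filter does not collapse all node representations toward a constant signal.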