NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification (2306.08385v1)

Published 14 Jun 2023 in cs.LG and cs.AI

Abstract: Graph neural networks have been extensively studied for learning with inter-connected data. Despite this, recent evidence has revealed GNNs' deficiencies related to over-squashing, heterophily, handling long-range dependencies, edge incompleteness and particularly, the absence of graphs altogether. While a plausible solution is to learn new adaptive topology for message passing, issues concerning quadratic complexity hinder simultaneous guarantees for scalability and precision in large networks. In this paper, we introduce a novel all-pair message passing scheme for efficiently propagating node signals between arbitrary nodes, as an important building block for a pioneering Transformer-style network for node classification on large graphs, dubbed NodeFormer. Specifically, the efficient computation is enabled by a kernelized Gumbel-Softmax operator that reduces the algorithmic complexity to linearity w.r.t. node numbers for learning latent graph structures from large, potentially fully-connected graphs in a differentiable manner. We also provide accompanying theory as justification for our design. Extensive experiments demonstrate the promising efficacy of the method in various tasks including node classification on graphs (with up to 2M nodes) and graph-enhanced applications (e.g., image classification) where input graphs are missing.

Authors (5)
  1. Qitian Wu (29 papers)
  2. Wentao Zhao (20 papers)
  3. Zenan Li (22 papers)
  4. David Wipf (59 papers)
  5. Junchi Yan (241 papers)
Citations (169)

Summary

NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification

The paper "NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification" introduces an innovative architecture designed to address fundamental challenges in Graph Neural Networks (GNNs), particularly concerning scalability and efficiency in handling large graphs. The document provides a detailed exposition of NodeFormer, a framework that enhances node classification by learning latent graph structures beyond explicit input topology.

Key Contributions

NodeFormer is built on the observation that traditional GNNs face several limitations, including over-squashing, difficulties with heterophily, long-range dependencies, and edge incompleteness. These issues are exacerbated on large graphs, where the network structure may be incomplete or missing altogether. To mitigate them, the paper proposes a new all-pair message passing paradigm that incorporates a kernelized Gumbel-Softmax operator. This operator reduces the complexity of learning graph structures from quadratic to linear in the number of nodes, which allows NodeFormer to scale efficiently to datasets with millions of nodes.
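
To make the complexity claim concrete, the sketch below shows the random-feature factorization (in the spirit of Performer-style kernelized attention) that this kind of reduction relies on: approximating the softmax kernel with positive random features lets all-pair aggregation be computed as two small matrix products, never materializing the N x N attention matrix. The feature map and function names are illustrative assumptions, not the authors' released implementation.

```python
import torch

def positive_random_features(x, projection, eps=1e-6):
    # Positive random features approximating the exp/softmax kernel
    # (Performer-style); `projection` is a fixed random Gaussian matrix [d, m].
    x_proj = x @ projection                            # [N, m]
    sq_norm = (x ** 2).sum(dim=-1, keepdim=True) / 2   # [N, 1]
    return torch.exp(x_proj - sq_norm) + eps           # [N, m], strictly positive

def linear_attention(q, k, v, projection):
    # Approximate softmax attention without the N x N similarity matrix:
    # out ~= phi(Q) (phi(K)^T V) / (phi(Q) phi(K)^T 1), costing O(N * m * d).
    q_prime = positive_random_features(q, projection)  # [N, m]
    k_prime = positive_random_features(k, projection)  # [N, m]
    kv = k_prime.t() @ v                               # [m, d]
    numerator = q_prime @ kv                           # [N, d]
    denominator = q_prime @ k_prime.sum(dim=0, keepdim=True).t()  # [N, 1]
    return numerator / denominator
```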

Technical Insights

NodeFormer uses an efficient mechanism for latent structure learning which involves:

  • Kernelized Gumbel-Softmax Operator: This operator combines positive random features with an approximate sampling strategy, enabling differentiable optimization and efficient message passing by avoiding explicit computation of the all-pair similarity matrix (a simplified layer sketch follows this list).
  • Layer-wise Message Passing: Instead of relying on a fixed graph structure across all layers, NodeFormer learns latent graphs independently for each layer.
  • Relational Bias and Edge-Level Regularization: When an input graph is available, NodeFormer incorporates a relational bias that reinforces weights on observed edges and applies edge-level regularization to remain robust to input graph incompleteness.
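
To illustrate how these pieces fit together, the sketch below combines them in one simplified layer: kernelized all-pair attention (reusing the `linear_attention` helper from the earlier sketch), additive Gumbel noise as a crude stand-in for the paper's kernelized Gumbel-Softmax sampling, and a learnable relational bias over observed edges. All class, method, and parameter names are hypothetical, and the edge-level regularization term is omitted for brevity.

```python
import torch
import torch.nn as nn

class NodeFormerLayerSketch(nn.Module):
    """Simplified sketch of one NodeFormer-style layer (not the released code)."""

    def __init__(self, dim, num_features=64, tau=0.25):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # Fixed random projection used by the positive random features.
        self.register_buffer("rf_proj", torch.randn(dim, num_features) / dim ** 0.5)
        # Scalar relational bias reinforcing messages along observed edges.
        self.rel_bias = nn.Parameter(torch.tensor(1.0))
        self.tau = tau  # temperature of the relaxed structure sampling

    def forward(self, x, edge_index=None):
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        if self.training:
            # Gumbel noise on the keys: a crude stand-in for the paper's
            # kernelized Gumbel-Softmax, keeping structure sampling differentiable.
            u = torch.rand_like(k).clamp_min(1e-9)
            k = k - self.tau * torch.log(-torch.log(u))
        # Linear-complexity all-pair aggregation (helper defined in the earlier sketch).
        out = linear_attention(q / self.tau, k / self.tau, v, self.rf_proj)
        if edge_index is not None:
            # Relational bias: add an extra message along each observed edge.
            src, dst = edge_index
            agg = torch.zeros_like(out).index_add_(0, dst, v[src])
            out = out + torch.sigmoid(self.rel_bias) * agg
        return out
```

In the full model, one such latent structure would be learned independently at every layer, matching the layer-wise message passing described above, and an edge-level regularization loss would be added when an input graph is available.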

Empirical Results

The paper reports extensive evaluations across various datasets, including node classification benchmarks and image/text classification tasks. NodeFormer performs strongly under both homophily and heterophily, outperforming strong GNN baselines and state-of-the-art structure learning approaches. Notably, NodeFormer scales to graphs with up to 2 million nodes, with reported reductions of up to 93.1% in running time and 80.6% in memory consumption compared to previous structure learning methods.

Implications and Future Directions

The introduction of NodeFormer implies substantial improvements for practical AI systems that operate on graph-structured data. It can be particularly beneficial in applications requiring robust node representations over large-scale networks, such as social and biological domains. The paper also opens avenues for applying the NodeFormer architecture to other graph-related tasks such as link prediction and graph regression. The model's scalability and efficiency make it a promising candidate for integration into diverse scientific and industrial applications.

This paper pushes the boundaries of GNN scalability, presenting a model that could redefine approaches to handling vast, intricately connected data systems.