VQGraph: Rethinking Graph Representation Space for Bridging GNNs and MLPs (2308.02117v3)
Abstract: GNN-to-MLP distillation uses knowledge distillation (KD) to train a computationally efficient multi-layer perceptron (the student MLP) on graph data by having it mimic the output representations of a teacher GNN. Existing methods mainly make the MLP mimic the GNN's predictions over a few class labels. However, the class space may not be expressive enough to cover the numerous diverse local graph structures, which limits knowledge transfer from GNN to MLP. To address this issue, we propose to learn a new powerful graph representation space by directly labeling nodes' diverse local structures for GNN-to-MLP distillation. Specifically, we propose a variant of VQ-VAE to learn a structure-aware tokenizer on graph data that encodes each node's local substructure as a discrete code. The discrete codes constitute a codebook, a new graph representation space in which the different local graph structures of nodes are identified by their code indices. Then, based on the learned codebook, we propose a new distillation target, namely soft code assignments, to directly transfer the structural knowledge of each node from GNN to MLP. The resulting framework, VQGraph, achieves new state-of-the-art performance on GNN-to-MLP distillation in both transductive and inductive settings across seven graph datasets. We show that VQGraph infers 828× faster than GNNs while also improving accuracy over GNNs and stand-alone MLPs by 3.90% and 28.05% on average, respectively. Code: https://github.com/YangLing0818/VQGraph.
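The idea of distilling through a codebook can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how node embeddings are compared with codes, so the negative squared Euclidean distance, the temperature `tau`, the KL-divergence loss, and all function names and array shapes below are assumptions made purely for illustration. The embeddings are random stand-ins for what a teacher GNN and a student MLP would produce.

```python
import numpy as np

def squared_distances(h, codebook):
    # h: (n_nodes, dim), codebook: (n_codes, dim) -> (n_nodes, n_codes)
    diff = h[:, None, :] - codebook[None, :, :]
    return (diff ** 2).sum(axis=-1)

def hard_code_indices(h, codebook):
    # Vector quantization step: each node gets the index of its nearest code.
    return squared_distances(h, codebook).argmin(axis=1)

def soft_code_assignments(h, codebook, tau=1.0):
    # Assumed form: softmax over negative distances, scaled by temperature tau.
    logits = -squared_distances(h, codebook) / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def kl_divergence(p_teacher, p_student, eps=1e-12):
    # Mean over nodes of KL(teacher || student); assumed distillation loss.
    log_ratio = np.log(p_teacher + eps) - np.log(p_student + eps)
    return float(np.mean(np.sum(p_teacher * log_ratio, axis=1)))

# Toy data: random stand-ins for a learned codebook and node embeddings.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))                          # 8 codes, 4 dims
teacher_h = rng.normal(size=(5, 4))                         # "GNN" embeddings
student_h = teacher_h + 0.1 * rng.normal(size=(5, 4))       # "MLP" embeddings

p_t = soft_code_assignments(teacher_h, codebook)
p_s = soft_code_assignments(student_h, codebook)
loss = kl_divergence(p_t, p_s)
print("hard codes:", hard_code_indices(teacher_h, codebook))
print("distillation loss:", loss)
```

In this sketch the hard index plays the role of a discrete structural label for a node, while the soft assignment keeps the node's relation to every code, which gives the student a richer target than a distribution over a few class labels.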