Propagate & Distill: Towards Effective Graph Learners Using Propagation-Embracing MLPs (2311.17781v1)
Abstract: Recent studies have attempted to use multilayer perceptrons (MLPs) for semi-supervised node classification on graphs by training a student MLP via knowledge distillation from a teacher graph neural network (GNN). While previous studies have focused mostly on matching the output probability distributions of the teacher and student models during distillation, how to inject structural information into the student in an explicit and interpretable manner has not been systematically studied. Inspired by GNNs that separate feature transformation $T$ and propagation $\Pi$, we reframe the distillation process as making the student MLP learn both $T$ and $\Pi$. Although this can be achieved by applying the inverse propagation $\Pi^{-1}$ before distilling from the teacher, doing so incurs a high computational cost from large matrix multiplications during training. To avoid this cost, we propose Propagate & Distill (P&D), which instead propagates the teacher's output before distillation; this propagation can be interpreted as an approximation of the inverse propagation. We demonstrate that P&D readily improves the performance of the student MLP.
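The idea in the abstract can be illustrated with a short sketch. The code below is a hypothetical, minimal illustration (not the authors' released implementation) of P&D-style distillation, assuming a personalized-PageRank-style operator for the propagation $\Pi$, a precomputed matrix of teacher logits, and a normalized adjacency matrix; the function names, the number of propagation steps, the teleport coefficient `alpha`, and the temperature `tau` are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def propagate_teacher_outputs(teacher_logits, adj_norm, num_steps=10, alpha=0.1):
    """Smooth the teacher's outputs over the graph before distillation.

    Assumes a personalized-PageRank-style propagation: repeated neighbor
    averaging with a teleport back to the original teacher logits. The
    propagated targets thus carry explicit structural information.
    """
    z0 = teacher_logits
    z = z0
    for _ in range(num_steps):
        # adj_norm is a (dense or sparse) symmetrically normalized adjacency matrix.
        z = (1.0 - alpha) * (adj_norm @ z) + alpha * z0
    return z


def distill_step(student_mlp, features, propagated_targets, optimizer, tau=1.0):
    """One distillation step: the structure-free student MLP matches the
    propagated teacher distribution via a temperature-scaled KL divergence."""
    optimizer.zero_grad()
    student_logits = student_mlp(features)
    loss = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(propagated_targets / tau, dim=-1),
        reduction="batchmean",
    ) * (tau * tau)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, the propagation is applied once to the fixed teacher outputs before training, so the per-step distillation cost is that of a plain MLP; the expensive graph operation is moved out of the training loop, in line with the abstract's motivation.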