
Propagate & Distill: Towards Effective Graph Learners Using Propagation-Embracing MLPs (2311.17781v1)

Published 29 Nov 2023 in cs.LG, cs.AI, cs.IT, cs.NE, cs.SI, and math.IT

Abstract: Recent studies attempted to utilize multilayer perceptrons (MLPs) to solve semi-supervised node classification on graphs, by training a student MLP via knowledge distillation from a teacher graph neural network (GNN). While previous studies have focused mostly on training the student MLP by matching the output probability distributions between the teacher and student models during distillation, it has not been systematically studied how to inject the structural information in an explicit and interpretable manner. Inspired by GNNs that separate feature transformation $T$ and propagation $\Pi$, we re-frame the distillation process as making the student MLP learn both $T$ and $\Pi$. Although this can be achieved by applying the inverse propagation $\Pi^{-1}$ before distillation from the teacher, it still comes with a high computational cost from large matrix multiplications during training. To solve this problem, we propose Propagate & Distill (P&D), which propagates the output of the teacher before distillation; this can be interpreted as an approximate process of the inverse propagation $\Pi^{-1}$. We demonstrate that P&D can readily improve the performance of the student MLP.


Summary

  • The paper demonstrates that incorporating recursive propagation of the teacher GNN's output into MLP training significantly boosts the student's performance.
  • It separates feature transformation from propagation, allowing flexible and scalable knowledge distillation for graph data.
  • Key experiments show that stronger propagation and more iterations lead to consistent improvements in both transductive and inductive settings.

Introduction

Graph Neural Networks (GNNs) achieve strong performance on tasks involving graph-structured data because they exploit the connections between data points. They are not without drawbacks, however: GNN inference is typically slow, since each prediction requires fetching and aggregating features from a node's neighborhood, which is a significant limitation for latency-sensitive, real-time applications. As a solution, recent research has explored distilling the knowledge of a complex GNN (the teacher) into a simpler multilayer perceptron (MLP) model (the student), achieving faster inference while retaining much of the performance. This paper introduces a new approach, Propagate & Distill (P&D), which improves the student MLP by injecting graph structure information into the knowledge distillation process.

Methodology

Central to the proposed P&D methodology is the separation of feature transformation and propagation, akin to decoupled GNNs such as APPNP that first transform node features and then propagate predictions. The authors re-frame distillation as making the student MLP learn both components. In principle this can be achieved by applying the inverse propagation before distillation from the teacher, but doing so incurs a high computational cost from large matrix multiplications during training. Rather than executing these costly matrix operations, P&D approximates this step by recursively propagating the teacher GNN's output over the graph before distilling it into the student. The framework leaves the propagation rule adjustable and scales to different kinds of graph data.
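To make the propagate-then-distill step concrete, below is a minimal PyTorch-style sketch, not the authors' released implementation: it assumes a row-normalized adjacency matrix `A_hat`, a personalized-PageRank-style propagation rule with retention weight `alpha` and `K` iterations, and a standard soft-label distillation loss. The exact propagation rule, loss, and hyperparameters used in the paper may differ.

```python
import torch
import torch.nn.functional as F

def propagate_teacher_output(y_teacher, A_hat, alpha=0.1, K=10):
    """Recursively propagate the teacher's soft labels over the graph.

    y_teacher: [N, C] teacher output probabilities (soft labels)
    A_hat:     [N, N] row-normalized adjacency (sparse or dense tensor)
    alpha:     weight kept on the un-propagated teacher output
    K:         number of propagation iterations
    """
    z = y_teacher
    for _ in range(K):
        # Mix the neighborhood average with the original teacher output,
        # avoiding any explicit matrix inversion.
        az = torch.sparse.mm(A_hat, z) if A_hat.is_sparse else A_hat @ z
        z = (1 - alpha) * az + alpha * y_teacher
    return z

def distill_step(student_mlp, x, z_target, optimizer, tau=1.0):
    """One training step: match the student MLP to the propagated teacher output."""
    optimizer.zero_grad()
    logits = student_mlp(x)  # the student sees node features only, no graph
    loss = F.kl_div(F.log_softmax(logits / tau, dim=-1),
                    z_target, reduction="batchmean")
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the transductive setting, `z_target` can be computed once from the full graph and cached, so the student trains on node features alone and needs no adjacency at inference time.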

Main Results

P&D was tested across various benchmark datasets and demonstrated consistent improvements over existing GNN-to-MLP distillation methods. The experiments show that both the strength of the propagation and the number of recursive propagation iterations influence performance, with stronger propagation and more iterations generally yielding better outcomes. P&D also operates effectively in both transductive settings (where all nodes are seen during training) and inductive settings (where some nodes are unseen during training), indicating the method's robustness.
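As a usage illustration of this sensitivity (again a sketch, reusing the hypothetical `propagate_teacher_output` above and a hypothetical `train_and_evaluate` helper standing in for the full training loop), one could grid over the propagation strength and the number of iterations; note that with the parameterization above, a smaller `alpha` corresponds to stronger propagation:

```python
# Hypothetical sweep over propagation strength and iteration count.
results = {}
for alpha in (0.05, 0.1, 0.2, 0.5):
    for K in (2, 5, 10, 20):
        z_target = propagate_teacher_output(y_teacher, A_hat, alpha=alpha, K=K)
        results[(alpha, K)] = train_and_evaluate(z_target)  # returns validation accuracy
best = max(results, key=results.get)
print("best (alpha, K):", best, "val acc:", results[best])
```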

Conclusion

The P&D framework highlights a novel direction for graph learning using simpler MLP structures by integrating graph structure-aware distillation processes. Through the recursive propagation of the teacher GNN's output before distillation, P&D injects additional graph structural information into the MLP, leading to enhanced performance on benchmark graph datasets in varied testing environments. This work opens the door for further exploration into efficient graph learning and the continued evolution of knowledge distillation techniques for GNNs.