
Propagate & Distill: Towards Effective Graph Learners Using Propagation-Embracing MLPs (2311.17781v1)

Published 29 Nov 2023 in cs.LG, cs.AI, cs.IT, cs.NE, cs.SI, and math.IT

Abstract: Recent studies attempted to utilize multilayer perceptrons (MLPs) to solve semi-supervised node classification on graphs, by training a student MLP via knowledge distillation from a teacher graph neural network (GNN). While previous studies have focused mostly on training the student MLP by matching the output probability distributions between the teacher and student models during distillation, it has not been systematically studied how to inject the structural information in an explicit and interpretable manner. Inspired by GNNs that separate feature transformation $T$ and propagation $\Pi$, we re-frame the distillation process as making the student MLP learn both $T$ and $\Pi$. Although this can be achieved by applying the inverse propagation $\Pi^{-1}$ before distillation from the teacher, it still comes with a high computational cost from large matrix multiplications during training. To solve this problem, we propose Propagate & Distill (P&D), which propagates the output of the teacher before distillation; this can be interpreted as an approximate process of the inverse propagation $\Pi^{-1}$. We demonstrate that P&D can readily improve the performance of the student MLP.


Summary

  • The paper demonstrates that incorporating recursive propagation of the teacher GNN's output into MLP training significantly boosts the student's performance.
  • It separates feature transformation from propagation, allowing flexible and scalable knowledge distillation for graph data.
  • Key experiments show that stronger propagation and more iterations lead to consistent improvements in both transductive and inductive settings.

Introduction

Graph Neural Networks (GNNs) achieve strong performance on tasks involving graph-structured data because they exploit the connections between data points. They are not without drawbacks, however: GNN inference is typically slow, since each prediction requires fetching and aggregating features from a node's neighborhood, which is a significant limitation for latency-sensitive, real-time applications. As a solution, recent research has explored distilling the knowledge of a complex GNN (the teacher) into a simpler multilayer perceptron (MLP) model (the student), achieving faster inference while retaining much of the performance. This paper introduces a new approach, Propagate & Distill (P&D), which improves the student MLP by injecting graph structure information into the knowledge distillation process.

Methodology

Central to the proposed P&D methodology is the separation of feature transformation and propagation, akin to decoupled GNNs such as APPNP that first transform node features and then propagate predictions. The authors re-frame distillation as making the student MLP learn both components. In principle this can be achieved by applying the inverse propagation before distillation from the teacher, but doing so incurs a high computational cost from large matrix multiplications during training. Rather than executing these costly matrix operations, P&D approximates this step by recursively propagating the teacher GNN's output over the graph before distilling it into the student. The framework leaves the propagation rule adjustable and scales to different kinds of graph data.
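To make the propagate-then-distill step concrete, below is a minimal PyTorch-style sketch, not the authors' released implementation: it assumes a row-normalized adjacency matrix `A_hat`, a personalized-PageRank-style propagation rule with retention weight `alpha` and `K` iterations, and a standard soft-label distillation loss. The exact propagation rule, loss, and hyperparameters used in the paper may differ.

```python
import torch
import torch.nn.functional as F

def propagate_teacher_output(y_teacher, A_hat, alpha=0.1, K=10):
    """Recursively propagate the teacher's soft labels over the graph.

    y_teacher: [N, C] teacher output probabilities (soft labels)
    A_hat:     [N, N] row-normalized adjacency (sparse or dense tensor)
    alpha:     weight kept on the un-propagated teacher output
    K:         number of propagation iterations
    """
    z = y_teacher
    for _ in range(K):
        # Mix the neighborhood average with the original teacher output,
        # avoiding any explicit matrix inversion.
        az = torch.sparse.mm(A_hat, z) if A_hat.is_sparse else A_hat @ z
        z = (1 - alpha) * az + alpha * y_teacher
    return z

def distill_step(student_mlp, x, z_target, optimizer, tau=1.0):
    """One training step: match the student MLP to the propagated teacher output."""
    optimizer.zero_grad()
    logits = student_mlp(x)  # the student sees node features only, no graph
    loss = F.kl_div(F.log_softmax(logits / tau, dim=-1),
                    z_target, reduction="batchmean")
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the transductive setting, `z_target` can be computed once from the full graph and cached, so the student trains on node features alone and needs no adjacency at inference time.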

Main Results

P&D was tested across various benchmark datasets and demonstrated consistent improvements over existing GNN-to-MLP distillation methods. The experiments show that both the strength of the propagation and the number of recursive propagation iterations influence performance, with stronger propagation and more iterations generally yielding better outcomes. P&D also operates effectively in both transductive settings (where all nodes are seen during training) and inductive settings (where some nodes are unseen during training), indicating the method's robustness.
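As a usage illustration of this sensitivity (again a sketch, reusing the hypothetical `propagate_teacher_output` above and a hypothetical `train_and_evaluate` helper standing in for the full training loop), one could grid over the propagation strength and the number of iterations; note that with the parameterization above, a smaller `alpha` corresponds to stronger propagation:

```python
# Hypothetical sweep over propagation strength and iteration count.
results = {}
for alpha in (0.05, 0.1, 0.2, 0.5):
    for K in (2, 5, 10, 20):
        z_target = propagate_teacher_output(y_teacher, A_hat, alpha=alpha, K=K)
        results[(alpha, K)] = train_and_evaluate(z_target)  # returns validation accuracy
best = max(results, key=results.get)
print("best (alpha, K):", best, "val acc:", results[best])
```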

Conclusion

The P&D framework highlights a novel direction for graph learning using simpler MLP structures by integrating graph structure-aware distillation processes. Through the recursive propagation of the teacher GNN's output before distillation, P&D injects additional graph structural information into the MLP, leading to enhanced performance on benchmark graph datasets in varied testing environments. This work opens the door for further exploration into efficient graph learning and the continued evolution of knowledge distillation techniques for GNNs.