- The paper identifies the under-confidence problem in GNN-to-MLP distillation and quantifies knowledge reliability using a perturbation invariance metric.
- It exploits temporal (distillation speed) and spatial (distribution across the graph) differences among knowledge points to filter out unreliable information and enhance MLP training.
- Extensive experiments demonstrate that KRD improves MLP performance by 12.62% and outperforms the teacher GNNs by 2.16% across multiple datasets.
Quantifying the Knowledge in GNNs for Reliable Distillation into MLPs
Introduction
Graph Neural Networks (GNNs) have seen considerable success across a variety of applications due to their potent ability to handle graph-structured data. However, deploying GNNs in latency-sensitive scenarios is hindered by their inherent data dependency: inference requires fetching and aggregating neighborhood information, which inflates latency. Multi-Layer Perceptrons (MLPs), on the other hand, lack graph-structure awareness but offer much faster inference, making them more desirable for industrial applications. To bridge the performance gap between these two model types, this paper introduces a novel approach for distilling knowledge from GNNs into MLPs that emphasizes the reliability of the knowledge being transferred.
GNN-to-MLP Distillation
Knowledge distillation has been proposed as a way to transfer the graph-structure awareness of GNNs into MLPs and thereby improve MLP performance. Current methodologies, however, treat all knowledge points (nodes) equally, overlooking that different nodes carry varying amounts of informative and reliable knowledge. This oversight leads to an "under-confidence" problem in MLP predictions after distillation. To address this, we quantify the knowledge in GNNs by measuring how invariant their information entropy is to noise perturbations (see the sketch after this list). From this quantification, we observe that knowledge points:
- Display different distillation speeds (temporally).
- Are differentially distributed across the graph (spatially).
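To make the perturbation-invariance idea concrete, the following is a minimal PyTorch sketch of how a per-node reliability score could be computed: the entropy of the GNN's prediction is compared with and without small feature noise, and nodes whose entropy drifts less are treated as more reliable. The function name `knowledge_reliability`, the `gnn(x, edge_index)` call signature, and the specific scoring formula are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def knowledge_reliability(gnn, x, edge_index, n_samples=10, noise_scale=0.1):
    """Score each node by how invariant the GNN's prediction entropy is
    to small feature perturbations (illustrative sketch, not the paper's
    exact metric)."""
    def entropy(logits):
        p = F.softmax(logits, dim=-1)
        return -(p * p.clamp_min(1e-12).log()).sum(dim=-1)  # [num_nodes]

    base_h = entropy(gnn(x, edge_index))
    drift = torch.zeros_like(base_h)
    for _ in range(n_samples):
        noise = noise_scale * torch.randn_like(x)
        drift += (entropy(gnn(x + noise, edge_index)) - base_h).abs()
    drift /= n_samples

    # Smaller entropy drift -> more perturbation-invariant -> more reliable.
    return 1.0 / (1.0 + drift)
```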
Knowledge-inspired Reliable Distillation (KRD)
Building on this quantified knowledge reliability, the Knowledge-inspired Reliable Distillation (KRD) method is proposed. KRD filters out unreliable knowledge points and exploits the most informative ones for more effective MLP training; a sketch of the resulting training objective is given below. Extensive experiments demonstrate that KRD not only enhances the performance of vanilla MLPs by 12.62% but also improves over the teacher GNNs by 2.16% across various datasets and GNN architectures.
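As a rough illustration of reliability-aware distillation, the sketch below combines a standard cross-entropy loss with a soft-label KD term in which each knowledge point is sampled in proportion to its reliability score, so unreliable nodes contribute little or no supervision. The function `krd_style_loss`, the sampling rule, and the loss weighting are hypothetical simplifications of KRD, not its exact objective.

```python
import torch
import torch.nn.functional as F

def krd_style_loss(mlp_logits, gnn_logits, labels, train_mask,
                   reliability, tau=1.0, lam=0.5):
    """Reliability-weighted GNN-to-MLP distillation (illustrative sketch)."""
    # Supervised cross-entropy on the labeled training nodes.
    ce = F.cross_entropy(mlp_logits[train_mask], labels[train_mask])

    # Sample knowledge points: keep node v with probability reliability[v],
    # so unreliable teacher predictions are mostly filtered out.
    keep = torch.bernoulli(reliability.clamp(0, 1)).bool()

    # Soft-label distillation only from the sampled (reliable) nodes.
    kd = F.kl_div(
        F.log_softmax(mlp_logits[keep] / tau, dim=-1),
        F.softmax(gnn_logits[keep] / tau, dim=-1),
        reduction="batchmean",
    ) * tau * tau

    return lam * ce + (1 - lam) * kd
```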
Key Contributions
- Identification of the under-confidence problem in GNN-to-MLP distillation and a detailed exploration of its causes and resolutions.
- Introduction of a perturbation invariance-based metric for the quantification of knowledge reliability within GNNs and an analysis of knowledge point roles both temporally and spatially.
- Proposal of the KRD framework that leverages reliable knowledge points as additional supervision, substantially improving the performance of distilled MLPs.
Practical Implications and Future Directions
The KRD framework offers a robust solution for improving MLP performance by distilling knowledge from GNNs in a reliability-conscious manner. This development has significant implications for deploying MLPs in practical, latency-sensitive applications without sacrificing the informational benefits typically afforded by GNNs. Future work could explore the combination of KRD with other expressive teacher and student models to further bridge the performance gap in graph-structured data processing tasks.