- The paper identifies the under-confidence problem in GNN-to-MLP distillation and quantifies knowledge reliability using a perturbation invariance metric.
- It exploits temporal (distillation speed) and spatial (distribution across the graph) differences among knowledge points to filter out unreliable information and enhance MLP training.
- Extensive experiments demonstrate that KRD improves MLP performance by 12.62% and outperforms the teacher GNNs by 2.16% across multiple datasets.
Quantifying the Knowledge in GNNs for Reliable Distillation into MLPs
Introduction
Graph Neural Networks (GNNs) have seen considerable success across a variety of applications due to their potent ability to handle graph-structured data. However, deploying GNNs in latency-sensitive scenarios is hindered by their inherent data dependency: inference requires fetching and aggregating neighborhood information, which inflates latency. Multi-Layer Perceptrons (MLPs), on the other hand, lack graph-structure awareness but offer much faster inference, making them more desirable for industrial applications. To bridge the performance gap between these two model types, this paper introduces a novel approach for distilling knowledge from GNNs into MLPs that emphasizes the reliability of the knowledge being transferred.
GNN-to-MLP Distillation
Knowledge distillation has been proposed as a way to transfer the graph-structure awareness of GNNs into MLPs and thereby improve MLP performance. Current methodologies, however, treat all knowledge points (nodes) equally, overlooking that different nodes carry varying amounts of informative and reliable knowledge. This oversight leads to an "under-confidence" problem in MLP predictions after distillation. To address this, we quantify the knowledge in GNNs by measuring how invariant their information entropy is to noise perturbations (see the sketch after this list). From this quantification, we observe that knowledge points:
- Display different distillation speeds (temporally).
- Are differentially distributed across the graph (spatially).
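To make the perturbation-invariance idea concrete, the following is a minimal PyTorch sketch of how a per-node reliability score could be computed: the entropy of the GNN's prediction is compared with and without small feature noise, and nodes whose entropy drifts less are treated as more reliable. The function name `knowledge_reliability`, the `gnn(x, edge_index)` call signature, and the specific scoring formula are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def knowledge_reliability(gnn, x, edge_index, n_samples=10, noise_scale=0.1):
    """Score each node by how invariant the GNN's prediction entropy is
    to small feature perturbations (illustrative sketch, not the paper's
    exact metric)."""
    def entropy(logits):
        p = F.softmax(logits, dim=-1)
        return -(p * p.clamp_min(1e-12).log()).sum(dim=-1)  # [num_nodes]

    base_h = entropy(gnn(x, edge_index))
    drift = torch.zeros_like(base_h)
    for _ in range(n_samples):
        noise = noise_scale * torch.randn_like(x)
        drift += (entropy(gnn(x + noise, edge_index)) - base_h).abs()
    drift /= n_samples

    # Smaller entropy drift -> more perturbation-invariant -> more reliable.
    return 1.0 / (1.0 + drift)
```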
Knowledge-inspired Reliable Distillation (KRD)
Building on this quantified knowledge reliability, the Knowledge-inspired Reliable Distillation (KRD) method is proposed. KRD filters out unreliable knowledge points and exploits the most informative ones for more effective MLP training; a sketch of the resulting training objective is given below. Extensive experiments demonstrate that KRD not only enhances the performance of vanilla MLPs by 12.62% but also improves over the teacher GNNs by 2.16% across various datasets and GNN architectures.
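As a rough illustration of reliability-aware distillation, the sketch below combines a standard cross-entropy loss with a soft-label KD term in which each knowledge point is sampled in proportion to its reliability score, so unreliable nodes contribute little or no supervision. The function `krd_style_loss`, the sampling rule, and the loss weighting are hypothetical simplifications of KRD, not its exact objective.

```python
import torch
import torch.nn.functional as F

def krd_style_loss(mlp_logits, gnn_logits, labels, train_mask,
                   reliability, tau=1.0, lam=0.5):
    """Reliability-weighted GNN-to-MLP distillation (illustrative sketch)."""
    # Supervised cross-entropy on the labeled training nodes.
    ce = F.cross_entropy(mlp_logits[train_mask], labels[train_mask])

    # Sample knowledge points: keep node v with probability reliability[v],
    # so unreliable teacher predictions are mostly filtered out.
    keep = torch.bernoulli(reliability.clamp(0, 1)).bool()

    # Soft-label distillation only from the sampled (reliable) nodes.
    kd = F.kl_div(
        F.log_softmax(mlp_logits[keep] / tau, dim=-1),
        F.softmax(gnn_logits[keep] / tau, dim=-1),
        reduction="batchmean",
    ) * tau * tau

    return lam * ce + (1 - lam) * kd
```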
Key Contributions
- Identification of the under-confidence problem in GNN-to-MLP distillation and a detailed exploration of its causes and resolutions.
- Introduction of a perturbation invariance-based metric for the quantification of knowledge reliability within GNNs and an analysis of knowledge point roles both temporally and spatially.
- Proposal of the KRD framework that leverages reliable knowledge points as additional supervision, substantially improving the performance of distilled MLPs.
Practical Implications and Future Directions
The KRD framework offers a robust solution for improving MLP performance by distilling knowledge from GNNs in a reliability-conscious manner. This development has significant implications for deploying MLPs in practical, latency-sensitive applications without sacrificing the informational benefits typically afforded by GNNs. Future work could explore the combination of KRD with other expressive teacher and student models to further bridge the performance gap in graph-structured data processing tasks.