Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs (2303.13763v3)

Published 24 Mar 2023 in cs.LG and cs.AI

Abstract: Distilling high-accuracy Graph Neural Networks (GNNs) to low-latency multi-layer perceptrons (MLPs) on graph tasks has become a hot research topic. However, conventional MLP learning relies almost exclusively on graph nodes and fails to effectively capture the graph structural information. Previous methods address this issue by processing graph edges into extra inputs for MLPs, but such graph structures may be unavailable in various scenarios. To this end, we propose Prototype-Guided Knowledge Distillation (PGKD), which does not require graph edges (edge-free setting) yet learns structure-aware MLPs. Our insight is to distill graph structural information from GNNs. Specifically, we first employ the class prototypes to analyze the impact of graph structures on GNN teachers, and then design two losses to distill such information from GNNs to MLPs. Experimental results on popular graph benchmarks demonstrate the effectiveness and robustness of the proposed PGKD.

Citations (8)

Summary

  • The paper introduces PGKD, a method to distill GNNs into MLPs by leveraging intra-class and inter-class losses to preserve graph structure without explicit edge data.
  • Experimental results show PGKD outperforms baseline edge-free models like GLNN on benchmarks such as Cora, Citeseer, and Pubmed in both transductive and inductive settings.
  • Ablation and robustness studies confirm that prototype guidance improves resilience to noisy node features and remains effective across different MLP configurations.

Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs

Introduction

Graph Neural Networks (GNNs) have shown strong performance on non-Euclidean data, especially for graph machine learning tasks such as node classification. However, the neighborhood aggregation they perform at inference time introduces high latency, which makes their use in real-world applications challenging. Multi-Layer Perceptrons (MLPs), by contrast, offer low-latency inference but fall short on graph tasks because they cannot capture graph structural information. This paper introduces a novel method, Prototype-Guided Knowledge Distillation (PGKD), which enables the distillation of GNNs into MLPs while capturing graph structure in an edge-free manner.
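
To make the latency contrast concrete, here is a toy PyTorch illustration (not taken from the paper): a single GNN-style layer must aggregate over neighbors via the adjacency structure at inference time, while an MLP layer only transforms each node's own features. The dimensions and the dense random adjacency are arbitrary placeholder choices.

```python
# Toy illustration of the inference-time difference between a GNN layer and
# an MLP layer. Dimensions and the dense random adjacency are placeholders.
import torch

num_nodes, in_dim, out_dim = 1000, 32, 16
x = torch.randn(num_nodes, in_dim)                        # node features
adj = (torch.rand(num_nodes, num_nodes) < 0.01).float()   # toy adjacency matrix
w_gnn = torch.randn(in_dim, out_dim)
w_mlp = torch.randn(in_dim, out_dim)

# GNN-style layer: mean-aggregate neighbor features, then transform.
# The aggregation needs the graph structure, and its cost grows with the edge count.
deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
gnn_out = (adj @ x / deg) @ w_gnn

# MLP layer: a per-node transform only; no edges are needed at inference.
mlp_out = x @ w_mlp
```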

PGKD Methodology

PGKD starts by categorizing graph edges into intra-class and inter-class edges and analyzing their impact on GNN teachers. The method uses class prototypes, i.e. representative embedding vectors for each class, to distill graph structural knowledge from GNNs to MLPs without requiring graph edge information. Specifically, PGKD combines two losses (see the sketch after this list):

  • Intra-class loss: Encourages nodes of the same class to be closer to their class prototype, capturing homophily in an edge-free setting.
  • Inter-class loss: Aligns the relative distances between different class prototypes as learned by GNNs, thus preserving class separation discovered by the GNN teachers in the distilled MLPs.
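
Below is a minimal PyTorch sketch of these two losses, written only to make the description above concrete. It is not the authors' reference implementation: the mean-embedding prototypes, the cosine-similarity form of the intra-class term, the prototype-similarity matching in the inter-class term, and all function names are assumptions.

```python
# Minimal sketch of prototype-guided distillation losses (illustrative only).
import torch
import torch.nn.functional as F


def compute_prototypes(embeddings, labels, num_classes):
    """Class prototype = mean embedding of the nodes belonging to that class."""
    protos = torch.zeros(num_classes, embeddings.size(1), device=embeddings.device)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = embeddings[mask].mean(dim=0)
    return protos


def intra_class_loss(student_emb, labels, student_protos):
    """Pull each node's MLP embedding toward its own class prototype
    (an edge-free surrogate for homophily)."""
    target = student_protos[labels]  # prototype of each node's class
    return (1.0 - F.cosine_similarity(student_emb, target, dim=1)).mean()


def inter_class_loss(student_protos, teacher_protos):
    """Match the pairwise prototype geometry of the student MLP to that of the
    GNN teacher, preserving the class separation the teacher has learned."""
    s = F.normalize(student_protos, dim=1)
    t = F.normalize(teacher_protos, dim=1)
    return F.mse_loss(s @ s.T, t @ t.T)


# Toy usage: 8 nodes, 16-dimensional embeddings, 3 classes.
labels = torch.randint(0, 3, (8,))
student_emb = torch.randn(8, 16, requires_grad=True)   # MLP (student) embeddings
teacher_emb = torch.randn(8, 16)                       # frozen GNN (teacher) embeddings
s_protos = compute_prototypes(student_emb, labels, 3)
t_protos = compute_prototypes(teacher_emb, labels, 3)
loss = intra_class_loss(student_emb, labels, s_protos) + inter_class_loss(s_protos, t_protos)
loss.backward()
```

In practice these two terms would be added, with weighting hyperparameters, to the usual distillation objective (e.g. a cross-entropy term on labels plus a soft-label term from the teacher); the exact weighting used in the paper is not reproduced here.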

Experimental Results

The efficacy of PGKD is validated through experiments on several popular graph benchmarks. PGKD shows consistent improvements over GLNN, a baseline edge-free model, in both transductive and inductive settings on datasets including Cora, Citeseer, and Pubmed. Ablation studies confirm that both the intra-class and inter-class losses are important for achieving the reported performance.

Discussion and Analysis

Further analyses examine PGKD's robustness to noisy node features, its performance across different inductive split ratios, and the impact of MLP configurations on model outcomes. PGKD consistently outperforms the baselines across noise levels and configurations, underscoring its flexibility and robustness. Moreover, t-SNE visualizations of node representations illustrate how PGKD captures graph structure, allowing the distilled MLPs to reach accuracy competitive with their GNN counterparts.
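
For readers who want to reproduce this kind of qualitative inspection, below is a hedged sketch of a t-SNE plot of node embeddings using scikit-learn and matplotlib. The embeddings and labels are random placeholders; the paper's exact plotting setup is not specified in this summary.

```python
# Illustrative t-SNE inspection of node embeddings on placeholder data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(300, 64))   # placeholder for distilled MLP node embeddings
labels = rng.integers(0, 7, size=300)     # placeholder class labels (Cora, e.g., has 7 classes)

# Project the 64-dimensional embeddings to 2-D and color points by class.
coords = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(embeddings)
plt.scatter(coords[:, 0], coords[:, 1], c=labels, s=8, cmap="tab10")
plt.title("t-SNE of node embeddings (illustrative placeholder data)")
plt.savefig("tsne_node_embeddings.png", dpi=150)
```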

Implications and Future Directions

This paper's introduction of PGKD is a significant step towards bridging the gap between the structural awareness of GNNs and the low-latency advantage of MLPs. The method's edge-free, structure-aware design broadens the potential application of MLPs in graph machine learning tasks. Future research could extend the methodology to graph tasks beyond node classification and explore better prototype generation for improved performance and interpretability.

Conclusion

Prototype-Guided Knowledge Distillation (PGKD) emerges as a novel and effective approach for distilling GNNs into MLPs, preserving graph structural information without the need for edge data. Its robustness, coupled with empirical improvements over existing methods, positions PGKD as a promising direction for future research in graph machine learning, particularly in applications where low latency is paramount.