Attention is all you need for boosting graph convolutional neural network (2403.15419v1)

Published 10 Mar 2024 in cs.LG, cs.GR, and cs.SI

Abstract: Graph Convolutional Neural Networks (GCNs) possess strong capabilities for processing graph data in non-grid domains. They can capture the topological structure and node features of graphs and integrate them into nodes' final representations. GCNs have been extensively studied in various fields, such as recommendation systems, social networks, and protein molecular structures. With the increasing application of graph neural networks, research has focused on improving their performance while compressing their size. In this work, a plug-in module named Graph Knowledge Enhancement and Distillation Module (GKEDM) is proposed. GKEDM can enhance node representations and improve the performance of GCNs by extracting and aggregating graph information via a multi-head attention mechanism. Furthermore, GKEDM can serve as an auxiliary carrier for knowledge distillation: with a specially designed attention distillation method, it can distill the knowledge of large teacher models into compact, high-performance student models. Experiments on multiple datasets demonstrate that GKEDM significantly improves the performance of various GCNs with minimal overhead. Moreover, it can efficiently transfer distilled knowledge from large teacher networks to small student networks via attention distillation.
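
The abstract describes two mechanisms: refining GCN node embeddings with multi-head attention over the graph, and distilling the teacher's attention maps into a student. The PyTorch sketch below illustrates one plausible reading of that design; the class name GKEDMSketch, the dense adjacency mask, the residual connection, and the KL-based attention distillation loss are illustrative assumptions, not the paper's exact implementation.

# Illustrative sketch only (assumptions: dense adjacency with self-loops,
# a single attention block, KL-based attention distillation).
import torch
import torch.nn as nn


class GKEDMSketch(nn.Module):
    """Plug-in that refines GCN node embeddings with masked multi-head attention."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor):
        # h:   (N, dim) node embeddings produced by a GCN backbone
        # adj: (N, N) adjacency with self-loops, used as the attention mask
        n, d = h.shape
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        # (N, dim) -> (heads, N, head_dim)
        q = q.view(n, self.num_heads, self.head_dim).transpose(0, 1)
        k = k.view(n, self.num_heads, self.head_dim).transpose(0, 1)
        v = v.view(n, self.num_heads, self.head_dim).transpose(0, 1)
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5    # (heads, N, N)
        scores = scores.masked_fill(adj.unsqueeze(0) == 0, float("-inf"))
        attn = scores.softmax(dim=-1)          # per-node weights over neighbours
        enhanced = (attn @ v).transpose(0, 1).reshape(n, d)
        return h + self.out(enhanced), attn    # residual enhancement + attention maps


def attention_distillation_loss(student_attn, teacher_attn, eps=1e-12):
    # KL(teacher || student) over each node's attention distribution, averaged
    s = student_attn.clamp_min(eps)
    t = teacher_attn.clamp_min(eps)
    return (t * (t.log() - s.log())).sum(dim=-1).mean()

As a usage sketch, a teacher GCN and a smaller student GCN would each pass their embeddings through such a module; the student is then trained on its task loss plus a weighted attention_distillation_loss between the two attention maps. The loss weighting and training schedule are left open here.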

