
Delayed Bottlenecking: Alleviating Forgetting in Pre-trained Graph Neural Networks (2404.14941v1)

Published 23 Apr 2024 in cs.LG and cs.AI

Abstract: Pre-training GNNs to extract transferable knowledge and apply it to downstream tasks has become the de facto standard of graph representation learning. Recent works have focused on designing self-supervised pre-training tasks that extract useful and universal transferable knowledge from large-scale unlabeled data. However, they face an inevitable question: traditional pre-training strategies that aim to extract information useful for the pre-training tasks may not extract all of the information useful for the downstream task. In this paper, we reexamine the pre-training process within the traditional pre-training and fine-tuning framework from the perspective of the Information Bottleneck (IB) and confirm that the forgetting phenomenon in the pre-training phase can be detrimental to downstream tasks. Therefore, we propose a novel Delayed Bottlenecking Pre-training (DBP) framework that maintains as much mutual information as possible between latent representations and training data during the pre-training phase by suppressing the compression operation, and delays compression to the fine-tuning phase so that it can be guided by labeled fine-tuning data and downstream tasks. To achieve this, we design two information control objectives that can be directly optimized and further integrate them into the actual model design. Extensive experiments on both chemistry and biology domains demonstrate the effectiveness of DBP.
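
To make the framework concrete, the following is a schematic reading of DBP in Information Bottleneck terms. It is a sketch, not the paper's exact objectives: the trade-off weights $\lambda_1$, $\lambda_2$ and the task variables $Y_{\mathrm{pre}}$ (the self-supervised pre-training target) and $Y_{\mathrm{down}}$ (the downstream labels) are illustrative notation introduced here. During pre-training, compression is suppressed so that the latent representation $Z$ stays informative about the input graphs $X$ while the self-supervised task is solved:

$$\max_{\theta}\; I(Z; Y_{\mathrm{pre}}) + \lambda_1\, I(Z; X) \qquad \text{(pre-training: suppress compression)}$$

During fine-tuning, compression is delayed to this stage and guided by labeled downstream data:

$$\max_{\theta}\; I(Z; Y_{\mathrm{down}}) - \lambda_2\, I(Z; X) \qquad \text{(fine-tuning: delayed, label-guided compression)}$$

A conventional IB-style objective would instead already penalize $I(Z; X)$ during pre-training, which corresponds to the forgetting the paper argues can hurt downstream tasks.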

