
GraphFM: A Scalable Framework for Multi-Graph Pretraining (2407.11907v1)

Published 16 Jul 2024 in cs.LG and cs.SI

Abstract: Graph neural networks are typically trained on individual datasets, often requiring highly specialized models and extensive hyperparameter tuning. This dataset-specific approach arises because each graph dataset often has unique node features and diverse connectivity structures, making it difficult to build a generalist model. To address these challenges, we introduce a scalable multi-graph multi-task pretraining approach specifically tailored for node classification tasks across diverse graph datasets from different domains. Our method, Graph Foundation Model (GraphFM), leverages a Perceiver-based encoder that employs learned latent tokens to compress domain-specific features into a common latent space. This approach enhances the model's ability to generalize across different graphs and allows for scaling across diverse data. We demonstrate the efficacy of our approach by training a model on 152 different graph datasets comprising over 7.4 million nodes and 189 million edges, establishing the first set of scaling laws for multi-graph pretraining on datasets spanning many domains (e.g., molecules, citation and product graphs). Our results show that pretraining on a diverse array of real and synthetic graphs improves the model's adaptability and stability, while performing competitively with state-of-the-art specialist models. This work illustrates that multi-graph pretraining can significantly reduce the burden imposed by the current graph training paradigm, unlocking new capabilities for the field of graph neural networks by creating a single generalist model that performs competitively across a wide range of datasets and tasks.
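To make the core architectural idea concrete, below is a minimal sketch of a Perceiver-style graph encoder: a set of learned latent tokens cross-attends to projected node features, compressing graphs of varying size and feature dimensionality into a fixed-size shared latent space. This is an illustration under stated assumptions, not the authors' implementation; the class name `PerceiverGraphEncoder`, the hyperparameters, and the single cross-attention stage are all hypothetical choices made for brevity.

```python
import torch
import torch.nn as nn

class PerceiverGraphEncoder(nn.Module):
    """Sketch of a Perceiver-style encoder for multi-graph pretraining.

    Learned latent tokens (queries) cross-attend to node features (keys/values),
    so datasets with different node-feature spaces can be mapped into one
    shared latent space. Layer counts and widths are illustrative only.
    """

    def __init__(self, feat_dim: int, latent_dim: int = 256,
                 num_latents: int = 64, num_heads: int = 4):
        super().__init__()
        # Dataset-specific projection into the shared model width.
        # In a multi-dataset setup one would keep one such projection per dataset.
        self.input_proj = nn.Linear(feat_dim, latent_dim)
        # Learned latent tokens shared across all graphs and domains.
        self.latents = nn.Parameter(torch.randn(num_latents, latent_dim))
        # Cross-attention: latents attend to the (projected) node tokens.
        self.cross_attn = nn.MultiheadAttention(latent_dim, num_heads, batch_first=True)
        # Self-attention over the compressed latent tokens.
        self.self_attn = nn.TransformerEncoderLayer(latent_dim, num_heads, batch_first=True)

    def forward(self, node_feats: torch.Tensor) -> torch.Tensor:
        # node_feats: (batch, num_nodes, feat_dim), padded per graph.
        tokens = self.input_proj(node_feats)                      # (B, N, D)
        latents = self.latents.unsqueeze(0).expand(tokens.size(0), -1, -1)
        compressed, _ = self.cross_attn(latents, tokens, tokens)  # (B, num_latents, D)
        return self.self_attn(compressed)                         # fixed-size latent summary


# Usage: graphs with 128-dimensional node features are compressed to 64 latent tokens,
# regardless of how many nodes each graph contains.
encoder = PerceiverGraphEncoder(feat_dim=128)
x = torch.randn(2, 500, 128)   # 2 graphs, 500 padded nodes each
z = encoder(x)                 # -> torch.Size([2, 64, 256])
print(z.shape)
```

The key design point the abstract emphasizes is that the number of latent tokens is fixed, so the downstream transformer's cost does not grow with graph size and the same backbone can be shared across domains.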

Authors (4)
  1. Divyansha Lachi (1 paper)
  2. Mehdi Azabou (15 papers)
  3. Vinam Arora (3 papers)
  4. Eva Dyer (6 papers)
Citations (2)