Towards Neural Scaling Laws on Graphs (2402.02054v3)

Published 3 Feb 2024 in cs.LG and cs.AI

Abstract: Deep graph models (e.g., graph neural networks and graph transformers) have become important techniques for leveraging knowledge across various types of graphs. Yet, the neural scaling laws on graphs, i.e., how the performance of deep graph models changes with model and dataset size, have not been systematically investigated, casting doubt on the feasibility of large graph models. To fill this gap, we benchmark many graph datasets from different tasks and attempt to establish neural scaling laws on graphs from both the model and data perspectives. We investigate model sizes up to 100 million parameters and dataset sizes up to 50 million samples. We first verify the validity of such laws on graphs, establishing proper formulations to describe the scaling behaviors. For model scaling, we find that, beyond the number of parameters, model depth also plays an important role in model scaling behavior, which differs from observations in other domains such as CV and NLP. For data scaling, we show that the number of graphs cannot effectively measure graph data volume in a scaling law, since the sizes of different graphs are highly irregular. Instead, we reformulate the data scaling law with the number of nodes or edges as the metric, addressing the irregular graph sizes. We further demonstrate that the reformulated law offers a unified view of data scaling behaviors across fundamental graph tasks including node classification, link prediction, and graph classification. This work provides valuable insights into neural scaling laws on graphs, which can serve as an important tool for collecting new graph data and developing large graph models.
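The abstract refers to "proper formulations to describe the scaling behaviors" without stating them here. As a rough illustration only, the sketch below fits a generic power-law scaling curve of the form error(x) = a·x^(-alpha) + c to a handful of points, where x could be a model-size or data-volume measure such as the number of nodes or edges. The functional form, the variable names, and the toy numbers are assumptions for demonstration, not the paper's exact formulation or data.

```python
# Minimal sketch: fit a generic power-law scaling curve to (resource, error) pairs.
# The form error(x) = a * x^(-alpha) + c is a common scaling-law ansatz; it is an
# assumption here, not necessarily the formulation used in the paper.
import numpy as np
from scipy.optimize import curve_fit

def power_law(x, a, alpha, c):
    # a: scale factor, alpha: scaling exponent, c: irreducible error floor
    return a * np.power(x, -alpha) + c

# Hypothetical data: x = data volume (e.g., number of nodes), y = validation error.
x = np.array([1e4, 3e4, 1e5, 3e5, 1e6, 3e6])
y = np.array([0.42, 0.35, 0.29, 0.25, 0.22, 0.20])

params, _ = curve_fit(power_law, x, y, p0=[1.0, 0.3, 0.1], maxfev=10000)
a, alpha, c = params
print(f"fit: error ≈ {a:.3f} * x^(-{alpha:.3f}) + {c:.3f}")
```

Fitting the same form with the number of graphs versus the number of nodes or edges as x would make the paper's point concrete: if graph sizes are highly irregular, only the node- or edge-based measure yields a consistent exponent across tasks.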

Authors (6)
  1. Jingzhe Liu
  2. Haitao Mao
  3. Zhikai Chen
  4. Tong Zhao
  5. Neil Shah
  6. Jiliang Tang