AdapterGNN: Parameter-Efficient Fine-Tuning Improves Generalization in GNNs (2304.09595v2)

Published 19 Apr 2023 in cs.LG

Abstract: Fine-tuning pre-trained models has recently yielded remarkable performance gains in graph neural networks (GNNs). In addition to pre-training techniques, and inspired by recent work in natural language processing, attention has shifted towards effective fine-tuning approaches such as parameter-efficient fine-tuning (PEFT). However, given the substantial differences between GNNs and transformer-based models, applying such approaches directly to GNNs has proved less effective. In this paper, we present a comprehensive comparison of PEFT techniques for GNNs and propose a novel PEFT method specifically designed for GNNs, called AdapterGNN. AdapterGNN preserves the knowledge of the large pre-trained model and leverages highly expressive adapters for GNNs, which adapt to downstream tasks effectively with only a few parameters while also improving the model's generalization ability. Extensive experiments show that AdapterGNN achieves higher performance than other PEFT methods and is the only one to consistently surpass full fine-tuning (outperforming it by 1.6% and 5.7% in the chemistry and biology domains, respectively, with only 5% and 4% of its parameters tuned) while exhibiting lower generalization gaps. Moreover, we empirically show that a larger GNN model can have worse generalization ability, which differs from the trend observed in large transformer-based models. Building on this, we provide a theoretical justification, via generalization bounds, that PEFT can improve the generalization of GNNs. Our code is available at https://github.com/Lucius-lsr/AdapterGNN.
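For a concrete picture of the adapter idea the abstract describes, below is a minimal PyTorch sketch of a bottleneck adapter attached to a frozen pre-trained GNN backbone: only the small adapter modules are trained, so just a few percent of the parameters are updated. The module and hyperparameter names (GNNAdapter, bottleneck_dim, scale) are illustrative assumptions; the authors' actual AdapterGNN implementation (linked above) may differ, for example in where adapters sit relative to message passing and batch normalization.

```python
import torch
import torch.nn as nn


class GNNAdapter(nn.Module):
    """Illustrative bottleneck adapter: down-project, nonlinearity, up-project,
    scaled residual connection. Hyperparameters here are assumptions, not the
    paper's exact configuration."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 8, scale: float = 1.0):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.ReLU()
        self.scale = scale
        # Near-identity initialization so the frozen pre-trained model's
        # behavior is preserved at the start of fine-tuning.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: node representations produced by a pre-trained GNN layer.
        return h + self.scale * self.up(self.act(self.down(h)))


def trainable_adapter_params(backbone: nn.Module, adapters: nn.ModuleList):
    """Freeze the pre-trained backbone; return only adapter parameters for the
    optimizer, which is what makes the fine-tuning parameter-efficient."""
    for p in backbone.parameters():
        p.requires_grad = False
    return list(adapters.parameters())


# Usage sketch: one adapter per GNN layer of a (hypothetical) 5-layer backbone
# with 300-dimensional hidden representations.
hidden_dim, num_layers = 300, 5
adapters = nn.ModuleList(GNNAdapter(hidden_dim) for _ in range(num_layers))
```

In this sketch the zero-initialized up-projection makes each adapter start as an identity map, so the fine-tuned model initially reproduces the pre-trained one and only gradually departs from it as the adapters are trained.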
