AdapterGNN: Parameter-Efficient Fine-Tuning Improves Generalization in GNNs (2304.09595v2)
Abstract: Fine-tuning pre-trained models has recently yielded remarkable performance gains in graph neural networks (GNNs). Beyond pre-training techniques, and inspired by recent advances in natural language processing, more recent work has shifted towards applying effective fine-tuning approaches, such as parameter-efficient fine-tuning (PEFT). However, given the substantial differences between GNNs and transformer-based models, applying such approaches directly to GNNs has proved less effective. In this paper, we present a comprehensive comparison of PEFT techniques for GNNs and propose a novel PEFT method specifically designed for GNNs, called AdapterGNN. AdapterGNN preserves the knowledge of the large pre-trained model and leverages highly expressive adapters for GNNs, which adapt to downstream tasks effectively with only a few tuned parameters while also improving the model's generalization ability. Extensive experiments show that AdapterGNN achieves higher performance than other PEFT methods and is the only one that consistently surpasses full fine-tuning (outperforming it by 1.6% and 5.7% in the chemistry and biology domains respectively, with only 5% and 4% of its parameters tuned), with lower generalization gaps. Moreover, we empirically show that a larger GNN model can have worse generalization ability, which differs from the trend observed in large transformer-based models. Building upon this, we provide a theoretical justification, based on generalization bounds, for how PEFT can improve the generalization of GNNs. Our code is available at https://github.com/Lucius-lsr/AdapterGNN.
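For concreteness, the sketch below shows one common way a bottleneck adapter can be attached to a frozen, pre-trained GIN layer so that only a small fraction of parameters is trained. The module names (`GNNAdapter`, `AdapterGINLayer`), the parallel insertion point, the learnable scaling, and the bottleneck size are illustrative assumptions, not necessarily the exact AdapterGNN architecture; the authors' implementation is available at the repository linked above.

```python
# Minimal sketch of parameter-efficient adapter tuning for a GNN layer.
# Assumptions: a standard bottleneck adapter added in parallel to a frozen
# pre-trained GINConv; names and hyperparameters here are illustrative only.
import torch
import torch.nn as nn
from torch_geometric.nn import GINConv


class GNNAdapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, learnable scale."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 16):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.ReLU()
        self.scale = nn.Parameter(torch.tensor(0.1))  # learnable scaling of the adapter path
        nn.init.zeros_(self.up.weight)                # start as a near-identity update
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return self.scale * self.up(self.act(self.down(x)))


class AdapterGINLayer(nn.Module):
    """Frozen pre-trained GIN layer with a small trainable adapter in parallel."""

    def __init__(self, pretrained_conv: GINConv, hidden_dim: int, bottleneck_dim: int = 16):
        super().__init__()
        self.conv = pretrained_conv
        for p in self.conv.parameters():  # keep the pre-trained knowledge fixed
            p.requires_grad = False
        self.adapter = GNNAdapter(hidden_dim, bottleneck_dim)

    def forward(self, x, edge_index):
        # Only the adapter parameters receive gradients during fine-tuning.
        return self.conv(x, edge_index) + self.adapter(x)


if __name__ == "__main__":
    hidden = 300
    mlp = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
    layer = AdapterGINLayer(GINConv(mlp), hidden_dim=hidden, bottleneck_dim=16)

    x = torch.randn(5, hidden)                               # 5 toy nodes
    edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])  # toy edge list
    out = layer(x, edge_index)

    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(out.shape, f"trainable params: {trainable}/{total}")
```

Under these assumptions, only the down/up projections and the scaling factor are updated, which is what keeps the tuned-parameter count at a few percent of the full model.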