Node Duplication Improves Cold-start Link Prediction (2402.09711v1)
Abstract: Graph Neural Networks (GNNs) are prominent in graph machine learning and have shown state-of-the-art performance in Link Prediction (LP) tasks. Nonetheless, recent studies show that GNNs struggle to produce good results on low-degree nodes despite their overall strong performance. In practical applications of LP, like recommendation systems, improving performance on low-degree nodes is critical, as it amounts to tackling the cold-start problem of improving the experiences of users with few observed interactions. In this paper, we investigate improving GNNs' LP performance on low-degree nodes while preserving their performance on high-degree nodes and propose a simple yet surprisingly effective augmentation technique called NodeDup. Specifically, NodeDup duplicates low-degree nodes and creates links between nodes and their own duplicates before following the standard supervised LP training scheme. By leveraging a "multi-view" perspective for low-degree nodes, NodeDup shows significant LP performance improvements on low-degree nodes without compromising any performance on high-degree nodes. Additionally, as a plug-and-play augmentation module, NodeDup can be easily applied to existing GNNs with very light computational cost. Extensive experiments show that NodeDup achieves 38.49%, 13.34%, and 6.76% improvements on isolated, low-degree, and warm nodes, respectively, on average across all datasets compared to GNNs and state-of-the-art cold-start methods.
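The augmentation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `node_dup`, the `degree_threshold` parameter and its default value, the undirected edge-list representation, and the choice to copy features verbatim are all assumptions made for illustration. The sketch only builds the augmented graph; the supervised LP training that follows is not shown.

```python
from collections import Counter


def node_dup(num_nodes, edges, features, degree_threshold=2):
    """Duplicate low-degree nodes and link each one to its duplicate.

    Hypothetical helper (not from the paper's code):
      edges            -- list of undirected (u, v) pairs over ids 0..num_nodes-1
      features         -- one feature list per node
      degree_threshold -- assumed cutoff; nodes with degree <= cutoff are duplicated
    """
    # Count the degree of every node from the observed edges.
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    new_edges = list(edges)
    new_features = [list(f) for f in features]
    duplicate_of = {}  # original node id -> id of its duplicate

    for node in range(num_nodes):
        if degree[node] <= degree_threshold:
            dup_id = len(new_features)                 # next free node id
            new_features.append(list(features[node]))  # duplicate shares the features
            new_edges.append((node, dup_id))           # link the node to its duplicate
            duplicate_of[node] = dup_id

    return new_edges, new_features, duplicate_of


# Toy graph: node 4 is isolated, node 3 has degree 1, the rest are better connected.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
features = [[1.0], [2.0], [3.0], [4.0], [5.0]]
aug_edges, aug_features, duplicate_of = node_dup(5, edges, features, degree_threshold=1)
```

With a cutoff of 1 in this toy graph, only nodes 3 and 4 receive duplicates, so the higher-degree nodes keep their original neighborhoods. Since the abstract states that the standard supervised LP training scheme follows the augmentation, the augmented edge list would simply stand in for the original one when training the GNN.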