GraphSR: A Data Augmentation Algorithm for Imbalanced Node Classification (2302.12814v2)
Abstract: Graph neural networks (GNNs) have achieved great success in node classification tasks. However, existing GNNs are naturally biased towards the majority classes, which have more labelled data, and tend to ignore minority classes with relatively few labelled nodes. Traditional techniques often resort to over-sampling methods, but these may cause overfitting. More recently, some works have proposed synthesizing additional nodes for minority classes from the labelled nodes; however, there is no guarantee that the generated nodes truly represent the corresponding minority classes. In fact, improperly synthesized nodes may result in insufficient generalization of the algorithm. To resolve this problem, in this paper we seek to automatically augment the minority classes from the massive set of unlabelled nodes in the graph. Specifically, we propose GraphSR, a novel self-training strategy that augments the minority classes with a significantly diverse set of unlabelled nodes, based on a Similarity-based selection module and a Reinforcement Learning (RL) selection module. The first module finds a subset of unlabelled nodes that are most similar to the labelled minority nodes, and the second further determines the representative and reliable nodes from that subset via an RL technique. Furthermore, the RL-based module can adaptively determine the sampling scale according to the current training data. This strategy is general and can be easily combined with different GNN models. Our experiments demonstrate that the proposed approach outperforms state-of-the-art baselines on various class-imbalanced datasets.
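To make the first of the two modules concrete, here is a minimal sketch of a similarity-based candidate selection step. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the similarity measure, so the sketch assumes cosine similarity to the centroid of the labelled minority-class embeddings. The function name `select_similar_unlabelled`, the `-1` convention for unlabelled nodes, and the fixed candidate count `k` are all hypothetical. The RL-based module, which would filter these candidates further and decide how many to keep, is not shown.

```python
import numpy as np


def select_similar_unlabelled(emb, labels, minority_class, k):
    """Pick the k unlabelled nodes most similar to one minority class.

    emb:    (n_nodes, d) array of node embeddings (assumed to come from
            some pretrained encoder; not specified by the abstract).
    labels: (n_nodes,) int array; -1 marks an unlabelled node
            (a convention assumed for this sketch).
    Returns (indices, similarities), sorted from most to least similar.
    """
    labelled_minority = np.flatnonzero(labels == minority_class)
    unlabelled = np.flatnonzero(labels == -1)

    # Assumption: summarise the labelled minority nodes by their mean embedding.
    centroid = emb[labelled_minority].mean(axis=0)

    # Cosine similarity between each unlabelled node and the centroid.
    candidates = emb[unlabelled]
    norms = np.linalg.norm(candidates, axis=1) * np.linalg.norm(centroid)
    sims = candidates @ centroid / np.maximum(norms, 1e-12)

    # Keep the k highest-scoring candidates.
    top = np.argsort(-sims)[:k]
    return unlabelled[top], sims[top]
```

In a full pipeline along the lines the abstract describes, the returned candidates would be passed to a second, learned selection stage rather than added to the training set directly.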