Multi-label Node Classification On Graph-Structured Data (2304.10398v4)
Abstract: Graph Neural Networks (GNNs) have shown state-of-the-art improvements in node classification tasks on graphs. While these improvements have been largely demonstrated in a multi-class classification scenario, a more general and realistic scenario in which each node could have multiple labels has so far received little attention. The first challenge in conducting focused studies on multi-label node classification is the limited number of publicly available multi-label graph datasets. Therefore, as our first contribution, we collect and release three real-world biological datasets and develop a multi-label graph generator to generate datasets with tunable properties. While high label similarity (high homophily) is usually credited with the success of GNNs, we argue that a multi-label scenario does not follow the usual semantics of homophily and heterophily defined so far for the multi-class scenario. As our second contribution, we define homophily and Cross-Class Neighborhood Similarity (CCNS) for the multi-label scenario and provide a thorough analysis of the 9 collected multi-label datasets. Finally, we perform a large-scale comparative study with 8 methods and 9 datasets and analyse the performance of the methods to assess the progress made by the current state of the art in the multi-label node classification scenario. We release our benchmark at https://github.com/Tianqi-py/MLGNC.
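The claim that homophily needs redefining for multi-label graphs can be made concrete with a small sketch. The Python below is a minimal, stdlib-only illustration under stated assumptions, not the paper's definitions: it takes multi-label edge homophily to be the mean Jaccard similarity of the label sets at the two ends of each edge, and it extends the multi-class Cross-Class Neighborhood Similarity (the average cosine similarity between the neighbor-label histograms of nodes in classes c and c') by letting a node count toward every class it carries. All function names (`multilabel_homophily`, `neighbor_label_histograms`, `cosine`, `multilabel_ccns`) are hypothetical.

```python
from math import sqrt
from typing import List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def multilabel_homophily(edges: Sequence[Edge], labels: Sequence[Set[int]]) -> float:
    """Mean Jaccard similarity between the label sets of the endpoints of each edge.

    Hypothetical helper (assumed definition): edges whose endpoints are both
    unlabeled have an empty union and are skipped.
    """
    scores = []
    for u, v in edges:
        union = labels[u] | labels[v]
        if union:
            scores.append(len(labels[u] & labels[v]) / len(union))
    return sum(scores) / len(scores) if scores else 0.0


def neighbor_label_histograms(
    num_nodes: int, edges: Sequence[Edge], labels: Sequence[Set[int]], num_classes: int
) -> List[List[float]]:
    """d(i): for each class, how many of node i's neighbors carry it (graph treated as undirected)."""
    hist = [[0.0] * num_classes for _ in range(num_nodes)]
    for u, v in edges:
        for c in labels[v]:
            hist[u][c] += 1.0
        for c in labels[u]:
            hist[v][c] += 1.0
    return hist


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: nodes without labeled neighbors contribute zero similarity
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def multilabel_ccns(
    num_nodes: int, edges: Sequence[Edge], labels: Sequence[Set[int]], num_classes: int
) -> List[List[float]]:
    """s(c, c'): mean cosine similarity of d(i) and d(j) over i in class c and j in class c'.

    Assumption: a node is a member of every class in its label set, so it can
    appear in several class groups at once.
    """
    hist = neighbor_label_histograms(num_nodes, edges, labels, num_classes)
    members = [[i for i in range(num_nodes) if c in labels[i]] for c in range(num_classes)]
    ccns = [[0.0] * num_classes for _ in range(num_classes)]
    for c in range(num_classes):
        for c2 in range(num_classes):
            pairs = [(i, j) for i in members[c] for j in members[c2]]
            if pairs:
                ccns[c][c2] = sum(cosine(hist[i], hist[j]) for i, j in pairs) / len(pairs)
    return ccns


# Usage: a 3-node path whose middle node carries both labels.
edges = [(0, 1), (1, 2)]
labels = [{0}, {0, 1}, {1}]
print(multilabel_homophily(edges, labels))  # 0.5
```

In the usage example every edge shares one of its two labels, so each edge scores 0.5 and so does the graph. Under the multi-class view such an edge has to be called either homophilous or heterophilous; a set-overlap measure like Jaccard instead records partial agreement, which is one way to see why the multi-class semantics do not carry over directly.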