
Graph-level Protein Representation Learning by Structure Knowledge Refinement (2401.02713v1)

Published 5 Jan 2024 in cs.LG, cs.AI, and q-bio.BM

Abstract: This paper focuses on learning representations at the whole-graph level in an unsupervised manner. Graph-level representation learning plays an important role in a variety of real-world problems such as molecular property prediction, protein structure feature extraction, and social network analysis. The mainstream approach uses contrastive learning to facilitate graph feature extraction, known as Graph Contrastive Learning (GCL). GCL, although effective, suffers from complications inherent to contrastive learning, such as the effect of false-negative pairs. Moreover, augmentation strategies in GCL adapt poorly to diverse graph datasets. Motivated by these problems, we propose a novel framework called Structure Knowledge Refinement (SKR), which uses the data structure to estimate the probability that a pair is positive or negative. We also propose an augmentation strategy that naturally preserves the semantic meaning of the original data and is compatible with the SKR framework. Furthermore, we illustrate the effectiveness of the SKR framework through intuition and experiments. Experimental results on graph-level classification tasks demonstrate that SKR is superior to most state-of-the-art baselines.
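
The central idea, estimating from the data's structure how likely each candidate pair is to be a true negative and reweighting the contrastive loss accordingly, can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it assumes graph-level embeddings from two augmented views and uses a simple softmax-over-similarities heuristic (with a hypothetical sharpness parameter `gamma`) as a stand-in for SKR's pair-positivity estimate.

```python
import torch
import torch.nn.functional as F

def skr_style_contrastive_loss(z1, z2, temperature=0.5, gamma=5.0):
    # z1, z2: [N, d] graph-level embeddings of two augmented views,
    # where row i of z1 and row i of z2 come from the same original graph.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / temperature      # [N, N] cross-view similarities
    pos = sim.diagonal()                 # matched pairs are the positives

    # Heuristic stand-in for SKR's structure-based pair-positivity
    # probability (an assumption, not the paper's estimator): treat a
    # sharp similarity as evidence that a "negative" actually shares
    # the anchor's semantics, and down-weight it in the denominator.
    with torch.no_grad():
        p_pos = torch.softmax(gamma * sim, dim=1)
        weights = 1.0 - p_pos            # keep likely true negatives
        weights.fill_diagonal_(0.0)      # exclude the positive itself

    neg = (weights * sim.exp()).sum(dim=1)
    # InfoNCE-style objective with reweighted negatives.
    return -(pos - torch.log(pos.exp() + neg)).mean()

# Toy usage with random embeddings (batch of 32 graphs, 128-dim).
z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
print(skr_style_contrastive_loss(z1, z2).item())
```

Down-weighting rather than hard-discarding suspected false negatives keeps the loss differentiable and robust to estimation noise; how the pair-positivity probability is actually derived from data structure is the substance of the SKR framework itself.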

