Graph Embedding on Biomedical Networks: Methods, Applications, and Evaluations (1906.05017v3)

Published 12 Jun 2019 in cs.LG and cs.SI

Abstract: Graph embedding learning that aims to automatically learn low-dimensional node representations, has drawn increasing attention in recent years. To date, most recent graph embedding methods are evaluated on social and information networks and are not comprehensively studied on biomedical networks under systematic experiments and analyses. On the other hand, for a variety of biomedical network analysis tasks, traditional techniques such as matrix factorization (which can be seen as a type of graph embedding methods) have shown promising results, and hence there is a need to systematically evaluate the more recent graph embedding methods (e.g. random walk-based and neural network-based) in terms of their usability and potential to further the state-of-the-art. We select 11 representative graph embedding methods and conduct a systematic comparison on 3 important biomedical link prediction tasks: drug-disease association (DDA) prediction, drug-drug interaction (DDI) prediction, protein-protein interaction (PPI) prediction; and 2 node classification tasks: medical term semantic type classification, protein function prediction. Our experimental results demonstrate that the recent graph embedding methods achieve promising results and deserve more attention in the future biomedical graph analysis. Compared with three state-of-the-art methods for DDAs, DDIs and protein function predictions, the recent graph embedding methods achieve competitive performance without using any biological features and the learned embeddings can be treated as complementary representations for the biological features. By summarizing the experimental results, we provide general guidelines for properly selecting graph embedding methods and setting their hyper-parameters for different biomedical tasks.

Authors

Xiang Yue
Zhen Wang
Jingong Huang
Srinivasan Parthasarathy
Soheil Moosavinasab
Yungui Huang
Simon M. Lin
Wen Zhang
Ping Zhang
Huan Sun

Citations (311)

Summary

Graph Embedding on Biomedical Networks: A Comprehensive Evaluation

The paper "Graph embedding on biomedical networks: methods, applications, and evaluations," published in Bioinformatics, systematically evaluates graph embedding techniques on biomedical networks. It measures how well these methods perform on three link prediction tasks, namely drug-disease association (DDA), drug-drug interaction (DDI), and protein-protein interaction (PPI) prediction, and on two node classification tasks: medical term semantic type classification and protein function prediction.

Overview

Graph embedding techniques have gained popularity because they automatically learn low-dimensional node representations that preserve the structural information of the original graph. The paper is motivated by the observation that most recent graph embedding methods have been evaluated on social and information networks rather than on biomedical networks. It addresses that gap by assessing whether these methods can advance the state of the art in biomedical graph analysis.
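To make the idea of a low-dimensional node representation concrete, the following is a minimal sketch of a matrix factorization-style embedding using a truncated singular value decomposition of the adjacency matrix. The toy graph, the function name `svd_embedding`, and the choice to scale the factors by the square root of the singular values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def svd_embedding(adjacency, dim):
    """Return (source, target) node embeddings from a truncated SVD.

    source @ target.T is the rank-`dim` approximation of `adjacency`,
    so each node is described by `dim` numbers instead of a full row.
    """
    u, s, vt = np.linalg.svd(adjacency)
    scale = np.sqrt(s[:dim])          # split each singular value across both factors
    source = u[:, :dim] * scale       # one row per node
    target = vt[:dim, :].T * scale    # one row per node
    return source, target


# Hypothetical toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
toy_adjacency = np.zeros((6, 6))
for i, j in toy_edges:
    toy_adjacency[i, j] = toy_adjacency[j, i] = 1.0

source_2d, target_2d = svd_embedding(toy_adjacency, dim=2)
```

Here each of the six nodes is reduced to a two-dimensional vector. With `dim` equal to the number of nodes, the product of the two factors recovers the adjacency matrix exactly; smaller values trade reconstruction fidelity for compactness.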

Methods and Evaluation

The authors selected 11 representative graph embedding methods for evaluation, spanning matrix factorization-based, random walk-based, and neural network-based categories. Each method was systematically assessed on three link prediction tasks (DDA, DDI, PPI) and two node classification tasks (semantic type classification, protein function prediction). The experiments used seven benchmark datasets compiled from existing biomedical databases.
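A link prediction evaluation of this kind needs positive and negative node pairs plus a way to turn two node embeddings into one feature vector. The sketch below shows one common setup, assuming a random hold-out of known edges, uniformly sampled non-edges as negatives, and an element-wise product as the pair feature. The function names and the split ratio are hypothetical, and this is not necessarily the protocol used in the paper.

```python
import random
from itertools import combinations


def split_edges(edges, test_ratio, seed=0):
    """Hide a fraction of the known edges to use as positive test examples."""
    rng = random.Random(seed)
    shuffled = sorted(edges)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def sample_negatives(nodes, edges, count, seed=0):
    """Sample node pairs with no known edge as negative examples.

    Enumerating all pairs is only practical for small graphs.
    """
    known = {frozenset(e) for e in edges}
    candidates = [p for p in combinations(sorted(nodes), 2)
                  if frozenset(p) not in known]
    return random.Random(seed).sample(candidates, count)


def pair_feature(embeddings, u, v):
    """Combine two node vectors into one edge feature (element-wise product)."""
    return [a * b for a, b in zip(embeddings[u], embeddings[v])]


# Hypothetical toy graph: two triangles joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
train_edges, test_edges = split_edges(edges, test_ratio=0.3)
negatives = sample_negatives(range(6), edges, count=len(test_edges))
```

In a full pipeline, the embedding would be trained on `train_edges` only, and a binary classifier would then be fit on `pair_feature` vectors for positive and negative pairs.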

The evaluation metrics included the area under the ROC curve (AUC), accuracy, macro-F1, and micro-F1, giving a quantitative view of each method's performance on every task. The authors also discuss hyper-parameter settings in detail, which adds to the paper's practical value.
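For readers less familiar with these metrics, the sketch below computes AUC, macro-F1, and micro-F1 in plain Python. It assumes binary labels for AUC and single-label multi-class predictions for the F1 scores; multi-label settings need a per-label extension. The function names and toy inputs are illustrative only.

```python
def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_scores(true, pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    classes = sorted(set(true) | set(pred))
    per_class = []
    tp_sum = fp_sum = fn_sum = 0
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(true, pred))
        fp = sum(t != c and p == c for t, p in zip(true, pred))
        fn = sum(t == c and p != c for t, p in zip(true, pred))
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    macro = sum(per_class) / len(per_class)   # average of per-class F1
    total = 2 * tp_sum + fp_sum + fn_sum
    micro = 2 * tp_sum / total if total else 0.0  # F1 over pooled counts
    return macro, micro


example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
example_macro, example_micro = f1_scores(["a", "a", "a", "b"], ["a", "a", "b", "b"])
```

Macro-F1 weights every class equally, so it is sensitive to performance on rare classes, while micro-F1 pools all decisions and, in the single-label case, equals accuracy.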

Key Findings

Performance of Graph Embedding Methods: The results demonstrated that recent graph embedding methods achieved competitive performance compared to traditional techniques like Laplacian eigenmaps and singular value decomposition. Notably, methods such as LINE and struc2vec performed well across multiple datasets without relying on biological features, which suggests they can complement feature-based approaches.
Comparative Analysis: Compared with state-of-the-art methods designed specifically for DDA prediction (e.g., LRSSL), DDI prediction (e.g., DeepDDI), and protein function prediction, the selected graph embedding methods achieved competitive performance without using any biological features. The learned embeddings can also serve as complementary representations alongside biological features in existing methods.
Guidelines for Practitioners: By summarizing the experimental results, the authors provide general guidelines for selecting graph embedding methods and setting their hyper-parameters for different biomedical tasks. These guidelines are aimed at researchers who want to integrate graph embeddings into their analysis workflows.

Implications and Future Directions

The empirical results bear directly on how graph embedding is applied in biomedical informatics. The performance of these methods suggests they could help with drug discovery, the study of molecular interactions, and the modeling of semantic relationships between medical terms. Future research directions proposed by the authors include leveraging network propagation techniques, integrating biological features into the embedding process, and exploring transfer learning approaches to improve the accuracy and applicability of graph embeddings in biomedical contexts.

The paper connects general-purpose computational techniques to concrete biomedical problems. Researchers in computational biology and biomedical informatics can draw on its findings and methodology when applying graph embeddings to their own biomedical questions.
