
Learn molecular representations from large-scale unlabeled molecules for drug discovery (2012.11175v1)

Published 21 Dec 2020 in cs.LG, q-bio.BM, and q-bio.QM

Abstract: How to produce expressive molecular representations is a fundamental challenge in AI-driven drug discovery. Graph neural networks (GNNs) have emerged as a powerful technique for modeling molecular data. However, previous supervised approaches usually suffer from the scarcity of labeled data and generalize poorly. Here, we propose a novel Molecular Pre-training Graph-based deep learning framework, named MPG, that learns molecular representations from large-scale unlabeled molecules. In MPG, we propose a powerful MolGNet model and an effective self-supervised strategy for pre-training the model at both the node and graph level. After pre-training on 11 million unlabeled molecules, we show that MolGNet can capture valuable chemical insights to produce interpretable representations. The pre-trained MolGNet can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of drug discovery tasks, including molecular property prediction, drug-drug interaction, and drug-target interaction, involving 13 benchmark datasets. Our work demonstrates that MPG is a promising new approach for the drug discovery pipeline.

Authors (9)
  1. Pengyong Li (5 papers)
  2. Jun Wang (991 papers)
  3. Yixuan Qiao (10 papers)
  4. Hao Chen (1006 papers)
  5. Yihuan Yu (1 paper)
  6. Xiaojun Yao (56 papers)
  7. Peng Gao (402 papers)
  8. Guotong Xie (31 papers)
  9. Sen Song (25 papers)
Citations (27)
