
AdaMR: Adaptable Molecular Representation for Unified Pre-training Strategy (2401.06166v2)

Published 28 Dec 2023 in q-bio.BM, cs.AI, and cs.LG

Abstract: We propose Adjustable Molecular Representation (AdaMR), a new large-scale unified pre-training strategy for small-molecule drugs. AdaMR employs a granularity-adjustable molecular encoding scheme, realized through a pre-training task termed molecular canonicalization, which sets it apart from recent large-scale molecular models. This adjustable granularity enriches the model's learning at multiple levels and improves its performance in multi-task scenarios. Specifically, the substructure-level molecular representation preserves information about specific atom groups or arrangements that influence chemical properties and functionality, which is advantageous for tasks such as property prediction. Meanwhile, the atomic-level representation, combined with the generative molecular canonicalization pre-training task, improves validity, novelty, and uniqueness in generative tasks. Together, these features give AdaMR strong performance across a range of downstream tasks. We fine-tuned the proposed pre-trained model on six molecular property prediction tasks (MoleculeNet datasets) and two generative tasks (ZINC250K dataset), achieving state-of-the-art (SOTA) results on five of the eight tasks.
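The granularity-adjustable encoding described above can be illustrated with a small sketch: at the atomic level a SMILES string is split into individual atom/bond/ring tokens, while at the substructure level recurring fragments are merged into coarser tokens. This is an illustrative assumption, not AdaMR's actual implementation; the tokenization regex is a common community pattern, and the fragment vocabulary here is hypothetical.

```python
import re

# A common regex for atom-level SMILES tokenization (illustrative;
# not necessarily the exact scheme used by AdaMR).
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|Si|Se|@@|[BCNOPSFIbcnops]|[=#$/\\().+\-]|[0-9]|%[0-9]{2})"
)


def atom_level_tokens(smiles: str) -> list[str]:
    """Split a SMILES string into atom/bond/ring-closure tokens."""
    return SMILES_TOKEN.findall(smiles)


def substructure_level_tokens(smiles: str, fragments: list[str]) -> list[str]:
    """Greedily merge tokens that match a (hypothetical) fragment vocabulary."""
    tokens = atom_level_tokens(smiles)
    merged, i = [], 0
    while i < len(tokens):
        for frag in fragments:
            flen = len(atom_level_tokens(frag))
            if "".join(tokens[i:i + flen]) == frag:
                merged.append(frag)  # coarse, substructure-level token
                i += flen
                break
        else:
            merged.append(tokens[i])  # fall back to atom-level token
            i += 1
    return merged


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
print(atom_level_tokens(aspirin))
# Treating the carboxyl/ester motif "C(=O)O" as one vocabulary entry:
print(substructure_level_tokens(aspirin, ["C(=O)O"]))
```

The same input thus yields a longer atom-level sequence or a shorter substructure-level one, which is the kind of granularity trade-off the abstract ties to generative versus property-prediction tasks.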

Authors (8)
  1. Yan Ding (41 papers)
  2. Hao Cheng (190 papers)
  3. Ruyi Feng (5 papers)
  4. Zhongze Gu (1 paper)
  5. Ziliang Ye (17 papers)
  6. Wei Tian (82 papers)
  7. Peng Xie (22 papers)
  8. Juan Zhang (94 papers)
