Improving Molecular Pretraining with Complementary Featurizations (2209.15101v1)

Published 29 Sep 2022 in cs.LG, physics.chem-ph, and q-bio.BM

Abstract: Molecular pretraining, which learns molecular representations over massive unlabeled data, has become a prominent paradigm to solve a variety of tasks in computational chemistry and drug discovery. Recently, rapid progress has been made in molecular pretraining with different molecular featurizations, including 1D SMILES strings, 2D graphs, and 3D geometries. However, the role of molecular featurizations, together with their corresponding neural architectures, in molecular pretraining remains largely unexamined. In this paper, through two case studies -- chirality classification and aromatic ring counting -- we first demonstrate that different featurization techniques convey chemical information differently. In light of this observation, we propose a simple and effective MOlecular pretraining framework with COmplementary featurizations (MOCO). MOCO comprehensively leverages multiple featurizations that complement each other and outperforms existing state-of-the-art models that rely solely on one or two featurizations on a wide range of molecular property prediction tasks.
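
To make the featurization gap described in the abstract concrete, the short sketch below builds the 1D SMILES, 2D graph, and 3D geometry views of example molecules and reads off the two case-study properties (chirality and aromatic ring count). This is an illustrative sketch using RDKit with arbitrary example molecules, not the paper's actual pipeline, encoders, or datasets.

```python
# Illustrative only: three complementary featurizations of a molecule,
# plus the two properties used as case studies in the paper
# (chirality and aromatic ring counting). RDKit and the example
# molecules are assumptions made for this sketch.
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolDescriptors

examples = [
    "C[C@H](N)C(=O)O",   # L-alanine: one chiral center, no aromatic rings
    "c1ccc2ccccc2c1",    # naphthalene: no chiral centers, two aromatic rings
]

for smiles in examples:
    mol = Chem.MolFromSmiles(smiles)

    # 1D view: canonical SMILES string
    view_1d = Chem.MolToSmiles(mol)

    # 2D view: molecular graph as an adjacency matrix
    view_2d = Chem.GetAdjacencyMatrix(mol)

    # 3D view: atomic coordinates of an embedded conformer
    mol_h = Chem.AddHs(mol)
    AllChem.EmbedMolecule(mol_h, randomSeed=0)
    view_3d = mol_h.GetConformer().GetPositions()

    # Case-study labels: chiral centers and aromatic ring count
    chiral_centers = Chem.FindMolChiralCenters(mol, includeUnassigned=True)
    n_aromatic_rings = rdMolDescriptors.CalcNumAromaticRings(mol)

    print(view_1d, view_2d.shape, view_3d.shape,
          chiral_centers, n_aromatic_rings)
```

The point of the sketch is only to show that the three views expose chemical information in different forms (for instance, the chiral tag is explicit in the SMILES string and implicit in the 3D coordinates, but absent from a bare adjacency matrix); how MOCO combines the corresponding encoders is described in the paper itself, not here.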

Authors (6)
  1. Yanqiao Zhu (45 papers)
  2. Dingshuo Chen (10 papers)
  3. Yuanqi Du (52 papers)
  4. Yingze Wang (8 papers)
  5. Qiang Liu (405 papers)
  6. Shu Wu (109 papers)
Citations (6)