Molecular Contrastive Learning with Chemical Element Knowledge Graph (2112.00544v2)

Published 1 Dec 2021 in cs.LG, cs.AI, and q-bio.QM

Abstract: Molecular representation learning contributes to multiple downstream tasks such as molecular property prediction and drug design. To properly represent molecules, graph contrastive learning is a promising paradigm as it utilizes self-supervision signals and has no requirements for human annotations. However, prior works fail to incorporate fundamental domain knowledge into graph semantics and thus ignore the correlations between atoms that have common attributes but are not directly connected by bonds. To address these issues, we construct a Chemical Element Knowledge Graph (KG) to summarize microscopic associations between elements and propose a novel Knowledge-enhanced Contrastive Learning (KCL) framework for molecular representation learning. The KCL framework consists of three modules. The first module, knowledge-guided graph augmentation, augments the original molecular graph based on the Chemical Element KG. The second module, knowledge-aware graph representation, extracts molecular representations with a common graph encoder for the original molecular graph and a Knowledge-aware Message Passing Neural Network (KMPNN) to encode complex information in the augmented molecular graph. The final module is a contrastive objective, where we maximize agreement between these two views of molecular graphs. Extensive experiments demonstrated that KCL obtained superior performance against state-of-the-art baselines on eight molecular datasets. Visualization experiments properly interpret what KCL has learned from atoms and attributes in the augmented molecular graphs. Our code and data are available at https://github.com/ZJU-Fangyin/KCL.

Overview

The paper introduces a novel approach to molecular representation learning that enhances graph contrastive learning with domain knowledge about chemical elements. The authors construct a Chemical Element Knowledge Graph (KG) to capture associations between elements and propose a Knowledge-enhanced Contrastive Learning (KCL) framework that uses the KG to enrich molecular graphs with relevant domain knowledge. The approach consists of three key modules: knowledge-guided graph augmentation, knowledge-aware graph representation, and a contrastive objective. Empirical results on eight molecular datasets demonstrate that KCL outperforms state-of-the-art methods on molecular property prediction tasks.

Methodology

  1. Chemical Element Knowledge Graph (KG) Construction: The authors build a Chemical Element KG from the Periodic Table of Elements. This KG encodes relations between elements and their chemical attributes, enabling the modeling of associations beyond direct chemical bonds.
  2. Knowledge-guided Graph Augmentation: Molecular graphs are augmented using the KG. This process preserves the original graph structure while incorporating domain knowledge by linking attribute nodes to element nodes, thereby capturing subtle relationships such as common attributes shared by non-bonded atoms (see the sketch after this list).
  3. Knowledge-aware Graph Representation: The authors develop a Knowledge-aware Message Passing Neural Network (KMPNN) to process the augmented graphs. KMPNN performs heterogeneous message passing for different knowledge types, using attention mechanisms to weigh information from different nodes.
  4. Contrastive Objective: The KCL framework employs a contrastive learning approach that maximizes agreement between molecular representations of the original and augmented graph views while discriminating against hard negatives (a generic loss sketch also follows the list).
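
To make the knowledge-guided augmentation in step 2 concrete, the sketch below attaches shared attribute nodes to a toy molecular graph. The triples, attribute names, and function are illustrative assumptions for exposition, not the paper's actual KG schema or released code.

```python
# Minimal sketch of knowledge-guided graph augmentation (step 2).
# The toy KG triples below are illustrative, not the paper's actual KG.
ELEMENT_KG = [
    ("C", "hasPeriod", "period_2"),
    ("N", "hasPeriod", "period_2"),
    ("O", "hasPeriod", "period_2"),
    ("N", "hasState", "gas"),
    ("O", "hasState", "gas"),
]

def augment_molecular_graph(atoms, bonds):
    """Attach attribute nodes from the KG to a molecular graph.

    atoms: list of element symbols, indexed by atom id.
    bonds: list of (i, j) atom-index pairs (the original molecular edges).
    Returns (nodes, edges): attribute nodes are appended after the atom
    nodes, so atoms sharing an attribute become connected through it even
    when no bond links them directly.
    """
    nodes = list(atoms)                       # atom nodes keep their indices
    edges = [(i, j, "bond") for i, j in bonds]
    attr_node = {}                            # attribute value -> node id
    for atom_id, symbol in enumerate(atoms):
        for head, relation, attr in ELEMENT_KG:
            if head != symbol:
                continue
            if attr not in attr_node:         # one shared node per attribute
                attr_node[attr] = len(nodes)
                nodes.append(attr)
            edges.append((attr_node[attr], atom_id, relation))
    return nodes, edges

# Example: a C=O fragment plus an amine N; O and N end up linked via "gas".
nodes, edges = augment_molecular_graph(["C", "O", "N"], [(0, 1), (0, 2)])
print(nodes)  # ['C', 'O', 'N', 'period_2', 'gas']
```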

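The contrastive objective in step 4 can be written in a few lines. The following is a generic NT-Xent-style formulation commonly used in two-view contrastive learning; it is a sketch under that assumption and omits KCL's hard-negative strategy.

```python
# Generic NT-Xent-style loss between embeddings of the original and
# augmented views. This is a standard two-view formulation, not necessarily
# the paper's exact objective; hard-negative mining is omitted for brevity.
import torch
import torch.nn.functional as F

def contrastive_loss(z_orig, z_aug, temperature=0.1):
    """z_orig, z_aug: (batch, dim) graph-level embeddings of the two views.
    Row i of each tensor is a positive pair; all other rows are negatives."""
    z1 = F.normalize(z_orig, dim=1)
    z2 = F.normalize(z_aug, dim=1)
    logits = z1 @ z2.t() / temperature       # pairwise cosine similarities
    labels = torch.arange(z1.size(0))        # i-th original <-> i-th augmented
    # Symmetrize over both view directions.
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.t(), labels))

# Usage with random tensors standing in for encoder outputs:
loss = contrastive_loss(torch.randn(8, 64), torch.randn(8, 64))
```
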
Experimental Evaluation

The proposed KCL framework was evaluated against several baseline models across eight molecular datasets, spanning both classification and regression tasks. Results show that KCL achieves superior performance, particularly on small datasets where labeled data is scarce. The framework's ability to incorporate domain knowledge allows it to perform well even on challenging datasets, and contrastive learning with hard negative mining significantly enhances representation quality, contributing to improved downstream task performance.

Implications and Future Directions

KCL's integration of chemical domain knowledge into machine learning models represents a step forward in molecular representation learning. By modeling the subtle relationships captured in the Chemical Element KG, KCL provides enhanced interpretability and potentially more insightful molecular predictions. This approach has practical implications for drug discovery and materials science, where understanding molecular properties is crucial.

Future directions could explore incorporating additional layers of domain knowledge, such as more complex chemical interactions or environmental data, to further improve model accuracy and applicability. Moreover, extending the framework to larger and more diverse chemical databases could yield further insights, potentially transforming practices in computational chemistry and related fields.

Authors (10)
  1. Yin Fang (32 papers)
  2. Qiang Zhang (466 papers)
  3. Haihong Yang (3 papers)
  4. Xiang Zhuang (10 papers)
  5. Shumin Deng (65 papers)
  6. Wen Zhang (170 papers)
  7. Ming Qin (9 papers)
  8. Zhuo Chen (319 papers)
  9. Xiaohui Fan (341 papers)
  10. Huajun Chen (198 papers)
Citations (97)