
GPT-MolBERTa: GPT Molecular Features Language Model for molecular property prediction (2310.03030v3)

Published 20 Sep 2023 in physics.chem-ph and cs.LG

Abstract: With the emergence of Transformer architectures and their powerful understanding of textual data, a new horizon has opened up for predicting molecular properties from text descriptions. While SMILES strings are the most common form of representation, they lack robustness, rich information, and canonicity, which limits their effectiveness as generalizable representations. Here, we present GPT-MolBERTa, a self-supervised language model that uses detailed textual descriptions of molecules to predict their properties. Text-based descriptions of 326,000 molecules were collected using ChatGPT and used to train a language model to learn molecular representations. Both BERT and RoBERTa models were used in the fine-tuning stage to predict properties on downstream tasks. Experiments show that GPT-MolBERTa performs well on various molecular property benchmarks and approaches state-of-the-art performance on regression tasks. Additionally, analysis of the attention mechanisms shows that GPT-MolBERTa picks up important information from the input textual data, demonstrating the interpretability of the model.
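The abstract describes a two-stage pipeline: textual molecule descriptions are encoded into fixed-length representations, and a regression head is then fitted on those representations for a downstream property. The stdlib-only sketch below illustrates that shape with a toy bag-of-words featurizer standing in for the transformer encoder; the vocabulary, descriptions, and property values are all hypothetical, not taken from the paper.

```python
# Toy sketch of the two-stage idea: (1) encode a textual molecule
# description as a fixed-length feature vector (a bag-of-words stand-in
# for BERT/RoBERTa embeddings), (2) fit a linear regression head on
# those features. All descriptions and labels below are hypothetical.
from collections import Counter

VOCAB = ["aromatic", "ring", "polar", "hydroxyl", "nonpolar", "chain"]

def featurize(description: str) -> list[float]:
    """Word counts over a small fixed vocabulary."""
    counts = Counter(description.lower().split())
    return [float(counts[w]) for w in VOCAB]

def train_head(X, y, lr=0.01, epochs=500):
    """Fit a linear regression head by cyclic stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Hypothetical ChatGPT-style descriptions paired with property labels.
descs = ["aromatic ring polar hydroxyl", "nonpolar chain", "polar hydroxyl chain"]
labels = [1.2, -0.5, 0.4]
X = [featurize(d) for d in descs]
w, b = train_head(X, labels)
```

In GPT-MolBERTa itself the featurizer is a pretrained BERT or RoBERTa encoder fine-tuned end to end rather than a fixed bag of words; the sketch only mirrors the encode-then-regress structure.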

Authors (4)
  1. Suryanarayanan Balaji (1 paper)
  2. Rishikesh Magar (13 papers)
  3. Yayati Jadhav (5 papers)
  4. Amir Barati Farimani (121 papers)
Citations (10)
