
Unsupervised Learning of Molecular Embeddings for Enhanced Clustering and Emergent Properties for Chemical Compounds (2310.18367v1)

Published 25 Oct 2023 in physics.chem-ph, cs.AI, cs.CV, and cs.LG

Abstract: Detailed analysis of molecular structures and properties holds great potential for drug discovery and development through machine learning. Developing an emergent property in a model that lets it understand molecules would provide a new computational tool for that work. We introduce several methods to detect and cluster chemical compounds based on their SMILES data. Our first method analyzes the graphical structures of chemical compounds using embedding data and employs vector search to find compounds that meet our threshold value. It yielded pronounced, concentrated clusters and produced favorable results in querying and understanding the compounds. We also used natural language description embeddings stored in a vector database with GPT-3.5, which outperforms the base model. Thus, we introduce a similarity search and clustering algorithm to aid in searching for and interacting with molecules, enhancing efficiency in chemical exploration and enabling future development of emergent properties in molecular property prediction models.
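The threshold-based vector search mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the embedding model, the similarity measure, or the threshold. The sketch assumes cosine similarity, and the function names, the toy three-dimensional vectors, and the threshold of 0.9 are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def threshold_search(query, embeddings, threshold):
    """Return (SMILES, score) pairs scoring at least `threshold`, best first."""
    scored = [(smiles, cosine_similarity(query, vec))
              for smiles, vec in embeddings.items()]
    hits = [(smiles, score) for smiles, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up vectors for illustration only; real embeddings would come from
# a model applied to each compound's SMILES string or its description.
toy_embeddings = {
    "CCO": [0.9, 0.1, 0.0],        # ethanol
    "CCCO": [0.8, 0.2, 0.1],       # 1-propanol
    "c1ccccc1": [0.0, 0.1, 0.95],  # benzene
}

# Hypothetical threshold; the paper's actual value is not given here.
results = threshold_search(toy_embeddings["CCO"], toy_embeddings, threshold=0.9)
matched_smiles = [smiles for smiles, _ in results]
```

With these toy vectors, the two alcohols pass the threshold for the ethanol query and benzene does not. A vector database would replace the linear scan with an index, but the filtering logic stays the same.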

Authors (8)
  1. Jaiveer Gill (1 paper)
  2. Ratul Chakraborty (2 papers)
  3. Reetham Gubba (1 paper)
  4. Amy Liu (3 papers)
  5. Shrey Jain (13 papers)
  6. Chirag Iyer (1 paper)
  7. Obaid Khwaja (1 paper)
  8. Saurav Kumar (4 papers)