Predicting Aqueous Solubility of Organic Molecules Using Deep Learning Models with Varied Molecular Representations (2105.12638v2)

Published 26 May 2021 in cond-mat.mtrl-sci and cs.LG

Abstract: Determining the aqueous solubility of molecules is a vital step in many pharmaceutical, environmental, and energy storage applications. Despite efforts made over decades, there are still challenges associated with developing a solubility prediction model with satisfactory accuracy for many of these applications. The goal of this study is to develop a general model capable of predicting the solubility of a broad range of organic molecules. Using the largest currently available solubility dataset, we implement deep learning-based models to predict solubility from molecular structure and explore several different molecular representations, including molecular descriptors, simplified molecular-input line-entry system (SMILES) strings, molecular graphs, and three-dimensional (3D) atomic coordinates, using four different neural network architectures: fully connected neural networks (FCNNs), recurrent neural networks (RNNs), graph neural networks (GNNs), and SchNet. We find that models using molecular descriptors achieve the best performance, with GNN models also achieving good performance. We perform extensive error analysis to understand the molecular properties that influence model performance, feature analysis to understand which information about molecular structure is most valuable for prediction, and a transfer learning and data size study to understand the impact of data availability on model performance.
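
Of the representations the abstract lists, the SMILES string is the simplest to illustrate. The following is a minimal sketch of how a SMILES string might be turned into a padded integer sequence of the kind a recurrent network consumes. It is not the authors' pipeline: the abstract does not describe their tokenization, and the helper names `build_vocab` and `encode`, the example molecules, and the `max_len` value are all hypothetical.

```python
# Hypothetical illustration only; the paper's actual preprocessing is not given in the abstract.

def build_vocab(smiles_list):
    """Map each distinct character to a positive integer id (0 is reserved for padding)."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(smiles, vocab, max_len):
    """Convert a SMILES string to a fixed-length list of ids, truncating or zero-padding."""
    ids = [vocab[ch] for ch in smiles[:max_len]]
    return ids + [0] * (max_len - len(ids))


# Example molecules (assumed for illustration): ethanol, benzene, acetic acid.
molecules = ["CCO", "c1ccccc1", "CC(=O)O"]
vocab = build_vocab(molecules)
encoded = [encode(s, vocab, max_len=8) for s in molecules]

for smiles, ids in zip(molecules, encoded):
    print(smiles, ids)
```

Character-level splitting is a simplification: it breaks two-letter atom symbols such as `Cl` and `Br` into separate tokens, so a practical tokenizer would treat those as single units.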

Authors (6)
  1. Gihan Panapitiya
  2. Michael Girard
  3. Aaron Hollas
  4. Vijay Murugesan
  5. Wei Wang
  6. Emily Saldanha
Citations (41)
