Understanding the language of molecules: Predicting pure component parameters for the PC-SAFT equation of state from SMILES (2309.12404v1)
Abstract: A major bottleneck in developing sustainable processes and materials is a lack of property data. Recently, machine learning approaches have vastly improved previous methods for predicting molecular properties. However, these machine learning models are often not able to handle thermodynamic constraints adequately. In this work, we present a machine learning model based on natural language processing to predict pure-component parameters for the perturbed-chain statistical associating fluid theory (PC-SAFT) equation of state. The model is based on our previously proposed SMILES-to-Properties-Transformer (SPT). By incorporating PC-SAFT into the neural network architecture, the machine learning model is trained directly on experimental vapor pressure and liquid density data. Combining established physical modeling approaches with state-of-the-art machine learning methods enables high-accuracy predictions across a wide range of pressures and temperatures, while maintaining the physical meaning of PC-SAFT parameters. SPT-PCSAFT demonstrates exceptional prediction accuracy even for complex molecules with various functional groups, outperforming traditional group contribution methods by a factor of four in the mean average percentage deviation. Moreover, SPT-PCSAFT captures the behavior of stereoisomers without any special consideration. To facilitate the application of our model, we provide predicted PC-SAFT parameters of more than 13645 components, making PC-SAFT accessible to all researchers.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.