polyBART: A Chemical Linguist for Polymer Property Prediction and Generative Design (2506.04233v1)
Abstract: Designing polymers for targeted applications and accurately predicting their properties are key challenges in materials science, owing to the vast and complex polymer chemical space. While molecular large language models (LLMs) have proven effective in solving analogous problems for molecular discovery, similar advances for polymers are limited. To address this gap, we propose polyBART, an LLM-driven polymer discovery capability that enables rapid and accurate exploration of the polymer design space. Central to our approach is Pseudo-polymer SELFIES (PSELFIES), a novel representation that allows molecular LLMs to be transferred to the polymer space. polyBART is, to the best of our knowledge, the first LLM capable of bidirectional translation between polymer structures and properties, achieving state-of-the-art results in property prediction and in the design of novel polymers for electrostatic energy storage. Further, polyBART is validated through both computational and laboratory experiments. We report what we believe is the first successful synthesis and validation of a polymer designed by an LLM: the polymer was predicted to exhibit a high thermal degradation temperature, and our laboratory measurements confirmed this prediction. Our work presents a generalizable strategy for adapting molecular LLMs to the polymer space and introduces a polymer foundation model, advancing generative polymer design that may be adapted for a variety of applications.