Assess performance of open-source chemistry LLMs on negative SMARTS constraints
Determine whether the open-source chemistry language models BioT5 and MoleculeSTM, when used within the MolLEO framework, can satisfy negative matching constraints, i.e., molecular generation tasks that require excluding a scaffold or substructure specified as a SMARTS pattern (e.g., generating molecules that do not contain the scaffold [#7]-c1n[c;h1]nc2[c;h1]c(-[#8])[c;h0][c;h1]c12). Quantify their performance on tasks that involve such constraints, such as deco_hop and scaffold_hop.
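One way to make the exclusion constraint measurable is to report the fraction of valid generated molecules that do not match the excluded pattern. The following is a minimal sketch under stated assumptions, not the evaluation used in the cited paper: the function names `make_rdkit_matcher` and `negative_constraint_rate` are hypothetical, RDKit is assumed to be the substructure matcher, and unparsable SMILES are assumed to be dropped from the denominator.

```python
from typing import Callable, Iterable, Optional

# Scaffold quoted in the problem statement; it must be ABSENT from generated molecules.
SCAFFOLD_SMARTS = "[#7]-c1n[c;h1]nc2[c;h1]c(-[#8])[c;h0][c;h1]c12"

# A matcher maps a SMILES string to True (pattern present), False (absent),
# or None (the SMILES could not be parsed).
Matcher = Callable[[str], Optional[bool]]


def make_rdkit_matcher(smarts: str) -> Matcher:
    """Build a matcher backed by RDKit (assumed installed; imported lazily)."""
    from rdkit import Chem

    pattern = Chem.MolFromSmarts(smarts)
    if pattern is None:
        raise ValueError(f"Could not parse SMARTS: {smarts}")

    def contains(smiles: str) -> Optional[bool]:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            return None  # invalid SMILES
        return mol.HasSubstructMatch(pattern)

    return contains


def negative_constraint_rate(smiles_list: Iterable[str], contains: Matcher) -> float:
    """Fraction of valid molecules that do NOT contain the excluded pattern.

    Assumption: invalid molecules are skipped rather than counted as failures.
    Returns 0.0 when no molecule is valid.
    """
    valid = 0
    satisfied = 0
    for smiles in smiles_list:
        hit = contains(smiles)
        if hit is None:
            continue
        valid += 1
        if not hit:
            satisfied += 1
    return satisfied / valid if valid else 0.0
```

With RDKit available, `negative_constraint_rate(generated_smiles, make_rdkit_matcher(SCAFFOLD_SMARTS))` would give the satisfaction rate for a list `generated_smiles` of model outputs. Keeping the matcher as a parameter is a design choice that lets the metric be checked with a stub matcher and reused for other excluded patterns.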
References
Also, it is unclear how well these models perform with negative matching (e.g., This molecule does not contain the scaffold [#7]-c1n[c;h1]nc2[c;h1]c(-[#8])[c;h0][c;h1]c12).
— Efficient Evolutionary Search Over Chemical Space with Large Language Models
(2406.16976 - Wang et al., 23 Jun 2024) in Section 4.2 Empirical Study (Single-objective results), paragraph discussing deco_hop and scaffold_hop (following Table 1)