Demonstrate suitability of OMol25-trained models for underrepresented chemistries

Determine whether machine learning interatomic potentials trained on the Open Molecules 2025 (OMol25) dataset are accurate enough, and generalize well enough, for chemical classes with limited or absent coverage in OMol25. The classes in question are lanthanide complexes, multimetallic structures, solvated protonated organic molecules and metal complexes, polymeric materials, and actinide-containing compounds; answering the question requires rigorously evaluating model performance on each of these domains.

Background

OMol25 is designed to span major chemistry domains at a high level of theory, but the authors note important gaps in coverage for actinides, polymer-related structures, and intermediate-spin metal complexes, as well as limited representation of lanthanides, multimetallic systems, and solvated protonated species. These areas pose distinct electronic-structure and bonding challenges that models trained predominantly on the present dataset may not fully capture.

Although baseline models trained on OMol25 perform well on many tasks, it remains uncertain how well they generalize to the underrepresented or absent domains. Establishing model suitability in these chemistries is critical both to ensure practical utility and to guide future dataset expansion and model development.

References

Furthermore, the coverage of certain classes of materials such as lanthanides complexes, multimetallic structures, and solvated protonated organic molecules and metal complexes are relatively limited. Although baseline models trained on OMol25 may still be suitable for these applications, it has yet to be demonstrated.

The Open Molecules 2025 (OMol25) Dataset, Evaluations, and Models (2505.08762 - Levine et al., 13 May 2025) in Outlook and future directions