Unified modeling and training across molecular and crystalline scales

Develop a unified modeling and training strategy that enables a single atomic representation model to be pre-trained on mixed data spanning molecules and crystalline materials of different scales and chemical systems, so that the model generalizes across both domains without requiring separate pre-training for organic and inorganic crystals.

Background

The paper aims to build a unified framework (NMRNet) for predicting NMR chemical shifts across liquid-state and solid-state systems using SE(3) Transformer-based representations. For pre-training, the authors assembled large structural datasets from AFLOW, CSD, and Materials Project, but observed that mixed pre-training on organic and inorganic crystals did not outperform separate pre-training, which they attribute to substantial differences in chemical environments.

They explicitly note that, despite prior attempts in the literature to mix data from different scales to construct a universal atomic model, achieving a single, effective cross-scale pre-training and training methodology remains unresolved. Resolving this would allow one model to robustly handle both molecular and crystalline domains, improving generality and reducing the need for domain-specific pre-training.

References

Currently, several studies have attempted to mix data from different scales and chemical systems for pre-training, thereby constructing a unified atomic model across scales; however, the problem of unified modeling and training has not yet been fully resolved.