Attribution of performance differences to dataset versus architecture

Determine whether observed performance differences among neural network potentials arise primarily from differences in training data or from architectural choices, by conducting controlled comparisons that isolate the effect of dataset composition from that of model design.

Background

Comparing neural network potential architectures is complicated by the fact that models are typically trained on distinct datasets, making it difficult to attribute performance differences to either the data or the design. This confounding undermines conclusions about which models are best suited for drug discovery applications.

The authors mitigate this by training multiple in-house models on the same ANI-2x dataset to enable fairer comparisons. Nonetheless, establishing attribution across a broader range of models and datasets remains unresolved.
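The controlled-comparison idea can be sketched as a factorial design: train every architecture on every dataset, then marginalize over one factor to estimate the contribution of the other. The following Python sketch is illustrative only and is not taken from the paper; the architecture names, dataset names, and error values are hypothetical placeholders.

```python
# Illustrative sketch (not from the paper): a 2x2 factorial design that
# crosses model architectures with training datasets so performance
# differences can be attributed to data versus design.
# All labels and error values below are hypothetical.
from statistics import mean

# errors[(architecture, dataset)] = held-out test error (hypothetical)
errors = {
    ("ArchA", "ANI-2x"): 1.2,
    ("ArchA", "OtherSet"): 1.8,
    ("ArchB", "ANI-2x"): 1.0,
    ("ArchB", "OtherSet"): 1.6,
}

archs = sorted({a for a, _ in errors})
datasets = sorted({d for _, d in errors})

def main_effect(levels, key_index):
    """Average error per level of one factor, marginalizing the other."""
    return {
        level: mean(v for k, v in errors.items() if k[key_index] == level)
        for level in levels
    }

# Differences within each dict isolate one factor's contribution.
arch_effect = main_effect(archs, 0)      # attributable to architecture
data_effect = main_effect(datasets, 1)   # attributable to dataset

print(arch_effect)
print(data_effect)
```

In this toy grid, the spread of `data_effect` exceeds that of `arch_effect`, which would suggest the dataset dominates; the paper's point is that without such a crossed design, the two contributions cannot be separated.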

References

Consequently, it is unclear whether observed differences can be attributed to the training set or the model architecture.

Basic stability tests of machine learning potentials for molecular simulations in computational drug discovery (arXiv:2503.11537, Ranasinghe et al., 14 Mar 2025), Section 1 (Introduction)