Scaling PIML to large biomolecular assemblies

Develop scalable physics-informed machine learning methodologies that can model biomolecular assemblies comprising thousands to millions of atoms without prohibitive growth in collocation points, training data, or computational cost, while retaining physical admissibility under frameworks such as Physics-Informed Neural Networks and operator-learning approaches.

Background

The paper highlights persistent challenges in applying physics-informed machine learning (PIML) to truly high-dimensional biomolecular systems. Even after coarse-graining, the effective dimensionality remains high, so PINNs require large numbers of collocation points and operator-learning models need exponentially more training data. Gradient-based training can also become unstable because of concentration-of-measure effects.
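To make the collocation-point growth concrete, the sketch below counts the points on a full tensor-product grid as the number of coarse-grained sites increases. It is an illustration under stated assumptions, not the paper's procedure: the function names, the choice of 10 points per axis, and the use of a uniform grid (PINNs are more commonly trained on randomly sampled points) are all hypothetical.

```python
def coarse_grained_dim(n_sites: int) -> int:
    """Cartesian degrees of freedom for n_sites coarse-grained sites in 3D."""
    return 3 * n_sites


def grid_collocation_count(points_per_axis: int, dim: int) -> int:
    """Number of points on a full tensor-product grid in `dim` dimensions."""
    return points_per_axis ** dim


if __name__ == "__main__":
    # Assumed resolution of 10 points per axis, for illustration only.
    for n_sites in (1, 2, 4, 10):
        dim = coarse_grained_dim(n_sites)
        count = grid_collocation_count(10, dim)
        print(f"{n_sites:>3} sites -> dimension {dim:>2} -> {count:.1e} grid points")
```

Even ten coarse-grained sites give a 30-dimensional configuration space, where uniform coverage at this resolution would need 10^30 points. This is why uniform sampling of the full configuration space is not an option at the scale of thousands to millions of atoms.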

Existing workarounds—collective-variable reduction, hierarchical architectures, and exploiting sparsity/locality—do not fully resolve the scalability barrier. The authors explicitly identify the need to scale PIML methods to assemblies with thousands to millions of atoms as an unresolved problem.
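As one illustration of what exploiting locality can mean in practice, the sketch below builds a cutoff-based pair list with a cell list, so that only nearby sites are treated as interacting. It is a generic technique from molecular simulation, not a method attributed to the paper; the function name `cutoff_pairs`, the open (non-periodic) boundaries, and the plain-Python data layout are assumptions made for brevity.

```python
import itertools
from collections import defaultdict


def cutoff_pairs(positions, cutoff):
    """Return the set of index pairs (i, j), i < j, closer than `cutoff`.

    Sites are binned into cubic cells of edge `cutoff`, so only the 27
    surrounding cells are searched for each site. Open (non-periodic)
    boundaries are assumed.
    """
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(positions):
        key = (int(x // cutoff), int(y // cutoff), int(z // cutoff))
        cells[key].append(idx)

    cutoff_sq = cutoff * cutoff
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            # .get avoids creating empty cells while iterating.
            others = cells.get((cx + dx, cy + dy, cz + dz), ())
            for i in members:
                xi, yi, zi = positions[i]
                for j in others:
                    if i < j:
                        xj, yj, zj = positions[j]
                        d_sq = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
                        if d_sq < cutoff_sq:
                            pairs.add((i, j))
    return pairs
```

At roughly constant density, the number of pairs within the cutoff grows linearly with the number of sites, compared with N(N-1)/2 for all pairs. This reduces the cost of evaluating local interactions, but it does not by itself reduce the dimensionality of the configuration space, which is consistent with the statement above that such workarounds do not fully resolve the scalability barrier.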

References

Despite this, the open problem of scaling PIML to large biomolecular assemblies with thousands to millions of atoms remains largely unsolved.

— Learning Biomolecular Motion: The Physics-Informed Machine Learning Paradigm (2511.06585 - Deshpande, 10 Nov 2025) in Section 6.2, Technical Limitations—The Curse of Dimensionality Persists
