Tokenization for molecular foundation models
Determine a principled tokenization scheme for molecular foundation models that represents continuous three-dimensional molecular configurations as discrete tokens, evaluating alternatives such as voxelization, graph encoding, and point-cloud representations.
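To make the voxelization alternative concrete, here is a minimal Python sketch of one way atom positions could be mapped to discrete tokens. It is an illustration under stated assumptions, not a scheme proposed in the cited paper: the function names voxel_tokenize and decode_token, the cubic box bounds, the bin count, and the four-element vocabulary are all hypothetical choices.

```python
from math import floor

# Hypothetical settings: a 16 Å cubic box split into 16 bins per axis (1 Å voxels).
BOX_MIN, BOX_SIZE, BINS = -8.0, 16.0, 16
VOCAB = ("C", "N", "O", "H")  # assumed element vocabulary; anything else maps to "unknown"
N_VOX = BINS ** 3
WIDTH = BOX_SIZE / BINS


def voxel_tokenize(coords, elements):
    """Map each atom to one integer token that combines element id and voxel index."""
    tokens = []
    for (x, y, z), el in zip(coords, elements):
        idx = []
        for c in (x, y, z):
            i = floor((c - BOX_MIN) / WIDTH)
            idx.append(min(max(i, 0), BINS - 1))  # clamp atoms outside the box to the edge
        voxel = (idx[0] * BINS + idx[1]) * BINS + idx[2]
        el_id = VOCAB.index(el) if el in VOCAB else len(VOCAB)
        tokens.append(el_id * N_VOX + voxel)
    # Sorting removes dependence on atom ordering in the input.
    return sorted(tokens)


def decode_token(token):
    """Recover the element and the voxel center; sub-voxel position is lost."""
    el_id, voxel = divmod(token, N_VOX)
    ix, rest = divmod(voxel, BINS * BINS)
    iy, iz = divmod(rest, BINS)
    center = tuple(BOX_MIN + (i + 0.5) * WIDTH for i in (ix, iy, iz))
    element = VOCAB[el_id] if el_id < len(VOCAB) else "?"
    return element, center


tokens = voxel_tokenize([(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)], ["C", "O"])
print(tokens)
print(decode_token(tokens[1]))
```

The sketch also shows why the question is open. Decoding returns only the voxel center, so positional detail below the grid spacing is discarded, and the token sequence changes if the molecule is rotated or translated relative to the grid. Graph encodings and point-cloud representations trade these limitations for others.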
References
Key open questions: Tokenization: How to embed continuous molecular configurations into discrete tokens? Voxelization? Graph encoding? Point clouds?
— Learning Biomolecular Motion: The Physics-Informed Machine Learning Paradigm
(2511.06585 - Deshpande, 10 Nov 2025) in Section 7, Future Directions—Physics-Grounded Foundation Models