Extend MolBert to protein representations and improve its learning strategies
Develop methods for applying MolBert, a BERT-based language model for molecular representation learning, to learn representations of entities other than small molecules, specifically proteins. In parallel, design improved pre-training and auxiliary-task learning strategies that extend MolBert's applicability beyond small-molecule SMILES inputs.
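One way to picture the input side of a protein extension is the minimal Python sketch below. It assumes a residue-level vocabulary (one token per amino acid) and the masked-language-modelling scheme introduced with BERT: select about 15% of positions, replace 80% of those with `[MASK]`, replace 10% with a random token, and leave 10% unchanged. The vocabulary, special tokens, `IGNORE_INDEX`, and the functions `tokenize_protein` and `mask_tokens` are hypothetical and not taken from the MolBert code base. They only illustrate how protein sequences might be turned into masked-LM training examples, analogous to tokenized SMILES strings.

```python
import random

# Hypothetical vocabulary: BERT-style special tokens plus the 20 standard
# amino acids (one-letter codes). Non-standard residues map to [UNK].
SPECIAL_TOKENS = ["[PAD]", "[CLS]", "[SEP]", "[MASK]", "[UNK]"]
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + AMINO_ACIDS)}
IGNORE_INDEX = -100  # label for positions the model is not asked to predict


def tokenize_protein(sequence):
    """Map a protein sequence to token ids, one token per residue."""
    ids = [VOCAB["[CLS]"]]
    ids += [VOCAB.get(res, VOCAB["[UNK]"]) for res in sequence.upper()]
    ids.append(VOCAB["[SEP]"])
    return ids


def mask_tokens(ids, mask_prob=0.15, rng=None):
    """BERT-style masking over residue positions (special tokens are skipped).

    Each selected position gets its original id as the label; its input is
    replaced by [MASK] (80%), a random amino acid (10%), or left as-is (10%).
    """
    rng = rng or random.Random()
    special_ids = {VOCAB[t] for t in SPECIAL_TOKENS}
    aa_ids = [VOCAB[a] for a in AMINO_ACIDS]
    inputs = list(ids)
    labels = [IGNORE_INDEX] * len(ids)
    for i, tok in enumerate(ids):
        if tok in special_ids or rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # the model must recover the original residue here
        r = rng.random()
        if r < 0.8:
            inputs[i] = VOCAB["[MASK]"]
        elif r < 0.9:
            inputs[i] = rng.choice(aa_ids)
        # otherwise the original token is kept unchanged
    return inputs, labels


if __name__ == "__main__":
    ids = tokenize_protein("MKTAYIAKQR")
    inputs, labels = mask_tokens(ids, rng=random.Random(0))
    print(ids)
    print(inputs)
    print(labels)
```

One token per residue is the simplest choice; k-mer or subword vocabularies are plausible alternatives. Protein-specific auxiliary tasks would have to be defined separately on top of this masked-LM objective.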
References
We leave to future work the exploration of how to use MolBert for learning representations of other entities such as proteins~\citep{simonovsky, Alley:2019fe, Kim2020deeppcm}, along with further developments in our learning strategies~\citep{pretraingraphs2019pande}.
— Molecular representation learning with language models and domain-relevant auxiliary tasks
(2011.13230 - Fabian et al., 2020) in Conclusions