IDP-Bert: Predicting Properties of Intrinsically Disordered Proteins (IDP) Using Large Language Models (2403.19762v2)
Abstract: Intrinsically Disordered Proteins (IDPs) constitute a large, structureless class of proteins with significant biological functions. The existence of IDPs challenges the conventional notion that a protein's biological function relies on its three-dimensional structure. Despite lacking well-defined spatial arrangements, IDPs exhibit diverse biological functions, influence cellular processes, and shed light on disease mechanisms. However, characterizing this class of proteins through experiments or simulations is expensive. Consequently, we designed a machine-learning model that relies solely on amino acid sequences. In this study, we introduce IDP-Bert, a deep-learning architecture leveraging Transformers and Protein Language Models (PLMs) to map sequences directly to IDP properties. Our experiments demonstrate accurate predictions of IDP properties, including the radius of gyration, end-to-end decorrelation time, and heat capacity.
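As a rough illustration of the kind of architecture the abstract describes, the sketch below attaches a small regression head to a pretrained protein language model and predicts the three scalar properties from a sequence alone. This is not the authors' released code: the choice of the publicly available ProtBert checkpoint (Rostlab/prot_bert on Hugging Face), the mean-pooling step, and the head dimensions are all illustrative assumptions.

```python
# Minimal sketch of a PLM-based regressor for IDP properties.
# Assumes the public ProtBert checkpoint; the paper's own
# pretrained model and head sizes may differ.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class IDPRegressor(nn.Module):
    def __init__(self, n_targets: int = 3):
        super().__init__()
        # Pretrained protein language model used as the sequence encoder.
        self.encoder = BertModel.from_pretrained("Rostlab/prot_bert")
        # Small MLP head mapping a pooled embedding to the target properties
        # (e.g. radius of gyration, decorrelation time, heat capacity).
        self.head = nn.Sequential(
            nn.Linear(self.encoder.config.hidden_size, 256),
            nn.ReLU(),
            nn.Linear(256, n_targets),
        )

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Mean-pool token embeddings over the sequence, ignoring padding.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (out.last_hidden_state * mask).sum(1) / mask.sum(1)
        return self.head(pooled)

tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
# ProtBert expects space-separated single-letter residue codes.
seq = " ".join("MKVLAAGISDD")  # hypothetical toy sequence
batch = tokenizer(seq, return_tensors="pt")
model = IDPRegressor()
preds = model(batch["input_ids"], batch["attention_mask"])  # shape (1, 3)
```

In a training loop one would fine-tune the encoder (or freeze it and train only the head) against property labels from simulation, typically with a mean-squared-error loss.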