Drug Resistance Predictions Based on a Directed Flag Transformer (2403.02603v2)
Abstract: The continuous evolution of the SARS-CoV-2 virus poses a significant challenge to global public health. Of particular concern is potential resistance to the widely prescribed drug PAXLOVID, whose main ingredient, nirmatrelvir, inhibits the viral main protease (Mpro). Here, we developed CAPTURE (direCted flAg laPlacian Transformer for drUg Resistance prEdictions) to analyze the effects of Mpro mutations on nirmatrelvir-Mpro binding affinities and to identify potential drug-resistant mutations. CAPTURE combines a comprehensive mutation analysis with a resistance prediction module based on DFFormer-seq, a novel ensemble model that leverages a new Directed Flag Transformer together with sequence embeddings from protein and small-molecule large language models (LLMs). Our analysis of the evolution of Mpro mutations revealed a progressive increase in mutation frequencies for residues near the binding site between May and December 2022, suggesting that the widespread use of PAXLOVID created a selective pressure that accelerated the evolution of drug-resistant variants. Applied to mutations at the nirmatrelvir-Mpro binding site, CAPTURE identified several potential resistance mutations, including H172Y and F140L, which have been experimentally confirmed, as well as five other mutations that await experimental verification. Evaluated on a limited experimental data set of Mpro mutants, CAPTURE achieves a recall of 57% and a precision of 71% in predicting potential drug-resistant mutations. Our work establishes a powerful new framework for predicting drug-resistant mutations and for real-time viral surveillance. The insights also guide the rational design of more resilient next-generation therapeutics.
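For readers less familiar with the two metrics quoted above, the following is a minimal sketch of how recall and precision are computed for a binary "resistant / not resistant" classifier. The labels are hypothetical toy values chosen for illustration only; they are not the paper's data and do not reproduce the reported 57% and 71%. The function name `precision_recall` is likewise an illustrative helper, not part of CAPTURE.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 = drug-resistant."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)        # resistant, predicted resistant
    fp = sum(1 for t, p in pairs if not t and p)    # not resistant, predicted resistant
    fn = sum(1 for t, p in pairs if t and not p)    # resistant, but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy labels (assumption: NOT taken from the paper).
y_true = [1, 1, 1, 0, 0, 1, 0, 0]   # experimentally determined resistance
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]   # model predictions

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Read this way, the reported figures mean that 71% of the mutations flagged as resistant were resistant in the experimental data, while 57% of the experimentally resistant mutations were flagged.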