Generalist Equivariant Transformer Towards 3D Molecular Interaction Learning (2306.01474v6)
Abstract: Many processes in biology and drug discovery involve 3D interactions between molecules, such as protein-protein and protein-small molecule interactions. Because different molecules are usually represented at different granularities, existing methods typically encode each type of molecule independently with a different model, which hinders learning the various underlying interaction physics. In this paper, we first propose to universally represent an arbitrary 3D complex as a geometric graph of sets, which makes it possible to encode all types of molecules with one model. We then propose a Generalist Equivariant Transformer (GET) to effectively capture both domain-specific hierarchies and domain-agnostic interaction physics. Specifically, GET consists of a bilevel attention module, a feed-forward module, and a layer normalization module, where each module is E(3)-equivariant and specialized for handling sets of variable sizes. Notably, in contrast to conventional pooling-based hierarchical models, GET retains fine-grained information at all levels. Extensive experiments on interactions between proteins, small molecules, and RNA/DNA verify the effectiveness and generalization capability of the proposed method across different domains.
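As a rough illustration of the "geometric graph of sets" idea, the sketch below stores each node as a variable-size set of atoms and checks that a distance computed between two such sets is unchanged by a rigid transformation. It is not the GET implementation: the names `Block`, `block_distance`, and `transform` are hypothetical, the feature dimension is arbitrary, and the minimum atom-atom distance is just one simple E(3)-invariant quantity chosen for the demonstration.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Block:
    """One node of the graph: a set of atoms (hypothetical container)."""
    features: np.ndarray  # (n_atoms, d) per-atom features
    coords: np.ndarray    # (n_atoms, 3) per-atom 3D coordinates


def block_distance(a: Block, b: Block) -> float:
    """Smallest atom-atom distance between two blocks."""
    diff = a.coords[:, None, :] - b.coords[None, :, :]  # (n_a, n_b, 3)
    return float(np.linalg.norm(diff, axis=-1).min())


def transform(block: Block, rotation: np.ndarray, translation: np.ndarray) -> Block:
    """Apply an orthogonal map and a translation to every atom of a block."""
    return Block(block.features, block.coords @ rotation.T + translation)


rng = np.random.default_rng(0)

# Two blocks with different numbers of atoms, e.g. a residue and a fragment.
residue = Block(rng.normal(size=(4, 8)), rng.normal(size=(4, 3)))
fragment = Block(rng.normal(size=(7, 8)), rng.normal(size=(7, 3)))

# Random orthogonal matrix (rotation or reflection) from a QR decomposition.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
t = rng.normal(size=3)

d_before = block_distance(residue, fragment)
d_after = block_distance(transform(residue, q, t), transform(fragment, q, t))

# Distances are E(3)-invariant, so both values agree up to rounding.
assert np.isclose(d_before, d_after)
```

The two blocks hold 4 and 7 atoms, so anything operating on them has to handle sets of different sizes, which is the property the abstract attributes to each GET module.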
- Xiangzhe Kong
- Wenbing Huang
- Yang Liu