Unsupervised Protein-Ligand Binding Energy Prediction via Neural Euler's Rotation Equation (2301.10814v2)
Abstract: Protein-ligand binding prediction is a fundamental problem in AI-driven drug discovery. Prior work focused on supervised learning methods using a large set of binding affinity data for small molecules, but it is hard to apply the same strategy to other drug classes like antibodies as labelled data is limited. In this paper, we explore unsupervised approaches and reformulate binding energy prediction as a generative modeling task. Specifically, we train an energy-based model on a set of unlabelled protein-ligand complexes using SE(3) denoising score matching and interpret its log-likelihood as binding affinity. Our key contribution is a new equivariant rotation prediction network called Neural Euler's Rotation Equations (NERE) for SE(3) score matching. It predicts a rotation by modeling the force and torque between protein and ligand atoms, where the force is defined as the gradient of an energy function with respect to atom coordinates. We evaluate NERE on protein-ligand and antibody-antigen binding affinity prediction benchmarks. Our model outperforms all unsupervised baselines (physics-based and statistical potentials) and matches supervised learning methods in the antibody case.
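The force-and-torque step described in the abstract can be illustrated with a minimal rigid-body sketch. It is not the paper's network: the harmonic `energy` function, the unit atom masses, the sign convention (force as the negative energy gradient), and all function names are assumptions made for illustration. A learned energy model would take the place of the toy one. The sketch computes per-atom forces on the ligand, sums them into a torque about the ligand centroid, and solves the rigid-body relation I ω = τ for an angular velocity ω, which gives the axis and relative magnitude of a predicted rotation.

```python
import math

def sub(a, b):
    return [a[k] - b[k] for k in range(3)]

def dot(a, b):
    return sum(a[k] * b[k] for k in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def energy(ligand, protein, r0=1.0):
    # Hypothetical toy energy: harmonic penalty on every ligand-protein distance.
    return sum((math.sqrt(dot(sub(x, y), sub(x, y))) - r0) ** 2
               for x in ligand for y in protein)

def forces(ligand, protein, r0=1.0):
    # Force on ligand atom i is f_i = -dE/dx_i (analytic gradient of the toy energy).
    out = []
    for x in ligand:
        f = [0.0, 0.0, 0.0]
        for y in protein:
            d = sub(x, y)
            r = math.sqrt(dot(d, d))
            coef = -2.0 * (r - r0) / r
            f = [f[k] + coef * d[k] for k in range(3)]
        out.append(f)
    return out

def centroid(points):
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(3)]

def torque(ligand, fs):
    # tau = sum_i (x_i - mu) x f_i, taken about the ligand centroid mu.
    mu = centroid(ligand)
    t = [0.0, 0.0, 0.0]
    for x, f in zip(ligand, fs):
        c = cross(sub(x, mu), f)
        t = [t[k] + c[k] for k in range(3)]
    return t

def inertia(ligand):
    # I = sum_i (|r_i|^2 * Id - r_i r_i^T) with r_i = x_i - mu and unit masses.
    mu = centroid(ligand)
    I = [[0.0] * 3 for _ in range(3)]
    for x in ligand:
        r = sub(x, mu)
        r2 = dot(r, r)
        for a in range(3):
            for b in range(3):
                I[a][b] += (r2 if a == b else 0.0) - r[a] * r[b]
    return I

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def angular_velocity(ligand, protein):
    # Solve I w = tau with Cramer's rule (requires non-collinear ligand atoms).
    tau = torque(ligand, forces(ligand, protein))
    I = inertia(ligand)
    d = det3(I)
    w = []
    for col in range(3):
        m = [[tau[r] if c == col else I[r][c] for c in range(3)]
             for r in range(3)]
        w.append(det3(m) / d)
    return w

# Four ligand atoms in the xy-plane and one protein atom off the symmetry axis.
ligand = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
protein = [[2.0, 1.0, 0.0]]
omega = angular_velocity(ligand, protein)
print("angular velocity:", omega)
```

Because every atom and every force in this example lies in the xy-plane, the torque and the resulting angular velocity point along the z-axis. Moving the protein atom to (2, 0, 0) makes the configuration symmetric and the net torque vanishes.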