SE(3)-Invariant Multiparameter Persistent Homology for Chiral-Sensitive Molecular Property Prediction (2312.07633v1)
Abstract: We present a computational method for generating molecular fingerprints using multiparameter persistent homology (MPPH). The technique is relevant to drug discovery and materials science, where precise molecular property prediction is vital. By combining SE(3)-invariance with Vietoris-Rips persistent homology, we capture three-dimensional representations of molecular chirality. Chirality, the property of a molecule being non-superimposable on its mirror image, directly influences molecular interactions and is therefore an essential factor in molecular property prediction. We explore the underlying topology and patterns in molecular structures by applying Vietoris-Rips persistent homology across varying scales and parameters such as atomic weight, partial charge, bond type, and chirality. The method's efficacy could be further improved by incorporating additional parameters such as aromaticity, orbital hybridization, bond polarity, conjugated systems, and bond and torsion angles. Additionally, we use Stochastic Gradient Langevin Boosting (SGLB) in a Bayesian ensemble of gradient-boosted decision trees (GBDTs) to obtain aleatoric and epistemic uncertainty estimates. With these estimates, we prioritize high-uncertainty samples for active learning and model fine-tuning, which benefits scenarios where data labeling is costly or time-consuming. Compared with conventional graph neural networks (GNNs), which often suffer from oversmoothing and oversquashing, MPPH provides a more comprehensive and interpretable characterization of molecular data topology. We support our approach with theoretical stability guarantees and, through extensive evaluations on the MoleculeNet benchmark datasets, demonstrate superior performance over existing state-of-the-art methods in predicting molecular properties.
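The invariance claim in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it computes only the 0-dimensional part of a single-parameter Vietoris-Rips filtration (connected components, via union-find over edges sorted by length), and the coordinates, function names, and rigid motion are hypothetical choices made for illustration. Because the filtration depends only on pairwise distances, the resulting death times should be unchanged by a rotation plus translation.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Death times of 0-dimensional Vietoris-Rips classes (all born at scale 0).

    Edges are processed by increasing length with union-find. Each edge that
    merges two components kills one class at that length; one class never dies.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return deaths


def rigid_motion(points, angle, shift):
    """Rotate about the z-axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in points
    ]


# Hypothetical toy coordinates, not a real molecule.
toy = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
moved = rigid_motion(toy, angle=0.7, shift=(5.0, -2.0, 1.5))
mirrored = [(-x, y, z) for x, y, z in toy]  # reflection through the yz-plane

original_deaths = h0_death_times(toy)
moved_deaths = h0_death_times(moved)
mirrored_deaths = h0_death_times(mirrored)
```

On these toy points the three lists agree up to floating-point error. The mirrored copy matches as well, since a reflection also preserves pairwise distances. This suggests that a distance-only filtration cannot separate mirror images by itself, which is consistent with the abstract listing chirality as a separate filtration parameter.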