Explainable Molecular Property Prediction: Aligning Chemical Concepts with Predictions via Language Models (2405.16041v3)
Abstract: Providing explainable molecular property predictions is critical for many scientific domains, such as drug discovery and materials science. Although transformer-based LLMs have shown great potential for accurate molecular property prediction, they neither provide chemically meaningful explanations nor faithfully reveal molecular structure-property relationships. In this work, we develop a framework for explainable molecular property prediction based on LLMs, dubbed Lamole, which provides explanations aligned with chemical concepts. We take a string-based molecular representation, Group SELFIES, as the input tokens for pretraining and fine-tuning Lamole, as it carries chemically meaningful semantics. By disentangling Lamole's information flows, we propose combining self-attention weights with their gradients to better quantify each chemically meaningful substructure's impact on the model's output. To make the explanations more faithfully respect the structure-property relationship, we then craft a marginal loss that explicitly optimizes the explanations to align with chemists' annotations. Bridging the manifold hypothesis with this marginal loss, we prove that the loss aligns the explanations with the tangent space of the data manifold, yielding concept-aligned explanations. Experiments on six mutagenicity datasets and one hepatotoxicity dataset demonstrate that Lamole achieves comparable classification accuracy while boosting explanation accuracy by up to 14.3%, setting the state of the art in explainable molecular property prediction.
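The two technical ingredients in the abstract, gradient-weighted attention attribution and a margin-based explanation loss, can be illustrated with short sketches. First, a minimal PyTorch sketch of scoring each Group SELFIES token by combining self-attention weights with their gradients, in the spirit of the description above. The `model` interface (a HuggingFace-style encoder returning `logits` and per-layer `attentions`) and the exact pooling over heads and layers are assumptions, not the paper's verified formulation.

```python
import torch

def token_attribution(model, input_ids, target_class):
    """Score each input token by gradient-weighted self-attention.

    Illustrative sketch only: assumes `model` returns `logits` of shape
    [batch, num_classes] and `attentions`, a tuple with one tensor per
    layer, each of shape [batch, heads, seq, seq].
    """
    outputs = model(input_ids, output_attentions=True)
    score = outputs.logits[:, target_class].sum()

    # Gradients of the target logit w.r.t. every layer's attention map.
    grads = torch.autograd.grad(score, outputs.attentions)

    scores = 0.0
    for attn, grad in zip(outputs.attentions, grads):
        # Keep only attention the prediction positively depends on, then
        # average over heads and sum over the query dimension to obtain
        # one score per (key) token.
        weighted = (attn * grad).clamp(min=0)
        scores = scores + weighted.mean(dim=1).sum(dim=1)

    return scores  # [batch, seq]: per-token importance
```

Second, one plausible reading of the "marginal loss": a hinge-style margin that pushes the attribution of every chemist-annotated substructure token above that of every unannotated token. The margin value and the min/max aggregation are illustrative choices, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def marginal_explanation_loss(attributions, annotation_mask, margin=0.1):
    """Hinge-style margin loss between annotated and unannotated tokens.

    attributions:    [batch, seq] per-token importance scores
    annotation_mask: [batch, seq] 1 for tokens inside a chemist-annotated
                     substructure, 0 otherwise
    """
    # Weakest annotated token vs. strongest unannotated token.
    pos = attributions.masked_fill(annotation_mask == 0, float("inf")).min(dim=1).values
    neg = attributions.masked_fill(annotation_mask == 1, float("-inf")).max(dim=1).values
    # Penalize whenever the gap falls short of the margin.
    return F.relu(margin - (pos - neg)).mean()
```

In training, such a term would be added to the classification objective, e.g. `loss = ce_loss + lam * marginal_explanation_loss(...)`, so that predictive accuracy and explanation alignment are optimized jointly.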
- Geometry-enhanced molecular representation learning for property prediction. Nature Machine Intelligence, 4(2):127–134, 2022.
- A systematic study of key elements underlying molecular property prediction. Nature Communications, 14(1):6395, 2023.
- Tanimoto random features for scalable molecular machine learning. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 33656–33686. Curran Associates, Inc., 2023.
- Accelerating molecular graph neural networks via knowledge distillation. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 25761–25792. Curran Associates, Inc., 2023.
- Diffusion-driven domain adaptation for generating 3D molecules. arXiv preprint arXiv:2404.00962, 2024.
- David Weininger. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences, 28(1):31–36, 1988.
- A review of molecular representation in the age of machine learning. Wiley Interdisciplinary Reviews: Computational Molecular Science, 12(5):e1603, 2022.
- Group SELFIES: a robust fragment-based molecular string representation. Digital Discovery, 2(3):748–758, 2023.
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, pages 4171–4186, 2019.
- ChemBERTa: Large-scale self-supervised pretraining for molecular property prediction. NeurIPS ML for Molecules Workshop, 2020.
- OrphicX: A causality-inspired latent variable model for interpreting graph neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13729–13738, 2022.
- GNNExplainer: Generating explanations for graph neural networks. Advances in Neural Information Processing Systems, 32, 2019.
- Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), October 2017.
- From black boxes to actionable insights: A perspective on explainable artificial intelligence for scientific discovery. Journal of Chemical Information and Modeling, 2023.
- A perspective on explanations of molecular prediction models. Journal of Chemical Theory and Computation, 19(8):2149–2160, 2023.
- CrysXPP: An explainable property predictor for crystalline materials. npj Computational Materials, 8(1):43, 2022.
- Explainable AI in drug discovery: self-interpretable graph neural network for molecular property prediction using concept whitening. Machine Learning, 113(4):2013–2044, 2024.
- SAME: Uncovering GNN black box with structure-aware Shapley-based multipiece explanations. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 6442–6466. Curran Associates, Inc., 2023.
- Generative causal explanations for graph neural networks. In International Conference on Machine Learning, pages 6666–6679. PMLR, 2021.
- Parameterized explainer for graph neural network. Advances in Neural Information Processing Systems, 33:19620–19631, 2020.
- SMILES-BERT: Large scale unsupervised pre-training for molecular property prediction. In Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, pages 429–436, 2019.
- ChemBERTa-2: Towards chemical foundation models. arXiv preprint arXiv:2209.01712, 2022.
- Large-scale chemical language representations capture molecular structure and properties. Nature Machine Intelligence, 4(12):1256–1264, 2022.
- Attention is not explanation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3543–3556, 2019.
- Is attention interpretable? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2931–2951, 2019.
- Quantifying attention flow in transformers. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4190–4197, 2020.
- Towards the unification and robustness of perturbation and gradient based explanations. In International Conference on Machine Learning, pages 110–119. PMLR, 2021.
- Sanity checks for saliency maps. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Attention is not not explanation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 11–20, 2019.
- Bernease Herman. The promise and peril of human evaluation for model interpretability. arXiv preprint arXiv:1711.07414, 2017.
- Towards faithfully interpretable NLP systems: How should we define and evaluate faithfulness? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4198–4205, 2020.
- The manifold hypothesis for gradient-based explanations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3696–3701, 2023.
- Simple GNN regularisation for 3D molecular property prediction and beyond. In International Conference on Learning Representations, 2022.
- ChemoVerse: Manifold traversal of latent spaces for novel molecule discovery. European Conference on Artificial Intelligence Workshop, 2020.
- Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity. Journal of Medicinal Chemistry, 34(2):786–797, 1991.
- TUDataset: A collection of benchmark datasets for learning with graphs. arXiv preprint arXiv:2007.08663, 2020.
- Statistical evaluation of the predictive toxicology challenge 2000–2001. Bioinformatics, 19(10):1183–1193, 2003.
- Data-driven identification of structural alerts for mitigating the risk of drug-induced human liver injuries. Journal of Cheminformatics, 7:1–8, 2015.
- Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, 2017.
- An end-to-end deep learning architecture for graph classification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
- edGNN: A simple and powerful GNN for directed labeled graphs. In International Conference on Learning Representations, 2019.
- How powerful are graph neural networks? In International Conference on Learning Representations, 2019.
- Random walk graph neural networks. Advances in Neural Information Processing Systems, 33:16211–16222, 2020.
- DropGNN: Random dropouts increase the expressiveness of graph neural networks. Advances in Neural Information Processing Systems, 34:21997–22009, 2021.
- Invariant and equivariant graph networks. In International Conference on Learning Representations, 2019.
- Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108, 2019.
- DeBERTa: Decoding-enhanced BERT with disentangled attention. In International Conference on Learning Representations, 2021.
- SmoothGrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Learning deep features for discriminative localization. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2921–2929, 2016.
- Learning important features through propagating activation differences. In International Conference on Machine Learning, pages 3145–3153. PMLR, 2017.
- ZINC: a free tool to discover chemistry for biology. Journal of Chemical Information and Modeling, 52(7):1757–1768, 2012.