Highly Accurate Quantum Chemical Property Prediction with Uni-Mol+ (2303.16982v2)

Published 16 Mar 2023 in physics.chem-ph and cs.LG

Abstract: Recent developments in deep learning have made remarkable progress in speeding up the prediction of quantum chemical (QC) properties by removing the need for expensive electronic structure calculations like density functional theory. However, previous methods learned from 1D SMILES sequences or 2D molecular graphs failed to achieve high accuracy as QC properties primarily depend on the 3D equilibrium conformations optimized by electronic structure methods, far different from the sequence-type and graph-type data. In this paper, we propose a novel approach called Uni-Mol+ to tackle this challenge. Uni-Mol+ first generates a raw 3D molecule conformation from inexpensive methods such as RDKit. Then, the raw conformation is iteratively updated to its target DFT equilibrium conformation using neural networks, and the learned conformation will be used to predict the QC properties. To effectively learn this update process towards the equilibrium conformation, we introduce a two-track Transformer model backbone and train it with the QC property prediction task. We also design a novel approach to guide the model's training process. Our extensive benchmarking results demonstrate that the proposed Uni-Mol+ significantly improves the accuracy of QC property prediction in various datasets. We have made the code and model publicly available at https://github.com/dptech-corp/Uni-Mol.

Authors: Shuqi Lu, Zhifeng Gao, Di He, Linfeng Zhang, Guolin Ke

Summary

Overview of Uni-Mol+: Highly Accurate Quantum Chemical Property Prediction

The paper "Highly Accurate Quantum Chemical Property Prediction with Uni-Mol+" introduces Uni-Mol+, a deep learning approach to quantum chemical (QC) property prediction that avoids running computationally expensive density functional theory (DFT) calculations at inference time. Instead of predicting directly from 1D or 2D inputs, the model refines an inexpensive 3D conformation toward the DFT equilibrium conformation and predicts QC properties from the refined structure.

Background and Motivation

Traditional methods for QC property calculation, such as DFT, require considerable computational resources and time, making them impractical for high-throughput applications. Recent deep learning models have aimed to address this issue using 1D SMILES notations or 2D molecular graphs. However, these methods fall short in accuracy as they do not capture the essential 3D equilibrium conformations that critically influence QC properties.

Uni-Mol+ addresses this gap by learning to move an inexpensive initial conformation toward the DFT equilibrium conformation, so that no DFT computation is needed during inference. It does so with a two-track Transformer backbone that is trained with the QC property prediction task.

Methodology

Uni-Mol+ starts by generating a raw 3D molecular conformation with an inexpensive method such as RDKit. Its core contribution is to iteratively refine this raw conformation toward the target DFT equilibrium conformation using a two-track Transformer model. The two tracks hold atom representations and pairwise interaction representations respectively, which helps the model learn the corrections needed to reach the equilibrium conformation.
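
The iterative refinement described above can be pictured as a simple loop. The sketch below is only an illustration under stated assumptions: `refine_conformation`, `predict_delta`, and the toy stand-in model are hypothetical names introduced here, the default number of iterations is arbitrary, and the real model is a two-track Transformer rather than the hand-written function used in place of it.

```python
from typing import Callable, List, Tuple

Coords = List[Tuple[float, float, float]]


def refine_conformation(
    raw: Coords,
    predict_delta: Callable[[Coords], Coords],
    num_iters: int = 2,  # illustrative default, not taken from the paper
) -> Coords:
    """Repeatedly apply a predicted per-atom coordinate update."""
    coords = list(raw)
    for _ in range(num_iters):
        delta = predict_delta(coords)  # stands in for the learned network
        coords = [
            (x + dx, y + dy, z + dz)
            for (x, y, z), (dx, dy, dz) in zip(coords, delta)
        ]
    return coords


# Hypothetical stand-in for the learned model: it moves every atom halfway
# toward a fixed target. A trained network would not have access to the
# target at inference time; this only makes the loop runnable.
TARGET: Coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]


def toy_predict_delta(coords: Coords) -> Coords:
    return [
        tuple(0.5 * (t - c) for t, c in zip(tgt, cur))
        for tgt, cur in zip(TARGET, coords)
    ]


raw = [(0.4, 0.0, 0.0), (1.8, 0.0, 0.0)]
refined = refine_conformation(raw, toy_predict_delta, num_iters=2)
```

With the toy update, each iteration halves the remaining distance to the target, so two iterations leave one quarter of the initial displacement.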

Furthermore, the authors introduce a novel training regime that samples pseudo-conformations between the raw and target states to better guide the learning process through a realistic trajectory of conformation updates.
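
One way to picture sampling a pseudo-conformation between the raw and target states is linear interpolation of atom coordinates. This is a minimal sketch under assumptions: the summary does not specify the interpolation formula, how the mixing weight is drawn, or whether noise is added, so the function name `sample_pseudo_conformation`, the weight `q`, and the optional Gaussian noise are all illustrative choices made here.

```python
import random
from typing import List, Optional, Tuple

Coords = List[Tuple[float, float, float]]


def sample_pseudo_conformation(
    raw: Coords,
    target: Coords,
    q: float,
    noise_std: float = 0.0,  # assumed optional perturbation, off by default
    rng: Optional[random.Random] = None,
) -> Coords:
    """Blend raw (q=0) and target (q=1) coordinates atom by atom."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    rng = rng or random.Random()
    pseudo = []
    for r, t in zip(raw, target):
        atom = []
        for ri, ti in zip(r, t):
            noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
            atom.append((1.0 - q) * ri + q * ti + noise)
        pseudo.append(tuple(atom))
    return pseudo


raw = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
midpoint = sample_pseudo_conformation(raw, target, q=0.5)
```

In a training loop, `q` would be drawn at random for each example so that the model sees starting points along the whole path from the raw to the target conformation; the distribution used for that draw is not stated in this summary.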

Results and Implications

The model is evaluated on datasets including PCQM4Mv2 and Open Catalyst 2020 (OC20), where it outperforms existing state-of-the-art methods. Uni-Mol+ lowers the Mean Absolute Error (MAE) across the datasets and tasks tested, which suggests the approach carries over to different QC property prediction problems.
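
For reference, MAE is the average absolute difference between predicted and reference values. The symbols below ($N$ molecules, reference value $y_i$, prediction $\hat{y}_i$) are notation chosen here for illustration, and the numbers in the worked line are made up to show the arithmetic, not results from the paper.

```latex
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - y_i \right|

% Worked example with N = 2, predictions (1.0, 2.5), references (1.2, 2.0):
\mathrm{MAE} = \frac{|1.0 - 1.2| + |2.5 - 2.0|}{2} = \frac{0.2 + 0.5}{2} = 0.35
```

A lower MAE means predictions sit closer, on average, to the DFT-computed reference values.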

The implications of this research are twofold:

  1. Practical Impact: By effectively emulating DFT-level predictions, Uni-Mol+ allows for faster and more cost-efficient high-throughput screening in molecular and material discovery.
  2. Theoretical Insights: The work provides insights into the effective modeling of 3D molecular structures using deep learning and illustrates the potential of iterative refinement strategies in bridging the gap between raw and equilibrium conformations.

Future Directions

This work opens several avenues for further exploration:

  • Generalization to Other QC Properties: Extending the methodology to other QC properties that are less reliant on DFT-like equilibrium conformations.
  • Inclusion of Conformational Dynamics: Considering more dynamic aspects of molecular structures, potentially integrating molecular dynamics simulations as part of the training data.
  • Improved Sampling Techniques: Enhancing the pseudo-conformation sampling strategy to capture more complex conformational landscapes.

In summary, Uni-Mol+ provides a compelling method for accurate and efficient QC property prediction, potentially transforming computational materials science by reducing dependence on traditional electronic structure calculations.
