
M$^2$Hub: Unlocking the Potential of Machine Learning for Materials Discovery (2307.05378v1)

Published 14 Jun 2023 in cond-mat.mtrl-sci and cs.LG

Abstract: We introduce M$^2$Hub, a toolkit for advancing machine learning in materials discovery. Machine learning has achieved remarkable progress in modeling molecular structures, especially biomolecules for drug discovery. However, the development of machine learning approaches for modeling materials structures lags behind, partly because there is no integrated platform that provides access to diverse materials discovery tasks. To bridge this gap, M$^2$Hub will enable easy access to materials discovery tasks, datasets, machine learning methods, evaluations, and benchmark results that cover the entire workflow. Specifically, the first release of M$^2$Hub focuses on three key stages in materials discovery: virtual screening, inverse design, and molecular simulation. It includes 9 datasets that cover 6 types of materials, with 56 tasks across 8 types of material properties. We further provide 2 synthetic datasets for generative tasks on materials. In addition to random data splits, we provide 3 additional data partitions that reflect real-world materials discovery scenarios. State-of-the-art machine learning methods (including methods that are suitable for materials structures but have never been compared in the literature) are benchmarked on representative tasks. Our code and library are publicly available at https://github.com/yuanqidu/M2Hub.
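The abstract contrasts random data splits with three additional partitions but does not describe how either is constructed. As a point of reference only, a seeded random train/validation/test split (the baseline the extra partitions are meant to complement) can be sketched in Python as below. The function name `random_split` and the 80/10/10 default fractions are illustrative assumptions, not M$^2$Hub's API or its actual split ratios.

```python
import random
from typing import List, Tuple


def random_split(
    n_samples: int,
    fractions: Tuple[float, float, float] = (0.8, 0.1, 0.1),  # assumed ratios, illustrative only
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical helper: shuffle sample indices and cut them into train/val/test lists."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_samples))
    rng = random.Random(seed)  # local RNG so the same seed always gives the same split
    rng.shuffle(indices)
    n_train = int(fractions[0] * n_samples)
    n_val = int(fractions[1] * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # any rounding remainder goes to the test set
    return train, val, test


train, val, test = random_split(100)
print(len(train), len(val), len(test))
```

A random split like this places structures from the same chemical family on both sides of the train/test boundary, which is one common reason benchmarks add partitions closer to real discovery scenarios.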

Authors (9)
  1. Yuanqi Du (52 papers)
  2. Yingheng Wang (16 papers)
  3. Yining Huang (11 papers)
  4. Jianan Canal Li (6 papers)
  5. Yanqiao Zhu (45 papers)
  6. Tian Xie (77 papers)
  7. Chenru Duan (28 papers)
  8. John M. Gregoire (10 papers)
  9. Carla P. Gomes (54 papers)
Citations (5)
