CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning (2402.13221v2)

Published 20 Feb 2024 in cs.LG and stat.ML

Abstract: Advances in graph ML have been driven by applications in chemistry, as graphs have remained the most expressive representations of molecules. While early graph ML methods focused primarily on small organic molecules, the scope of graph ML has recently expanded to include inorganic materials. Modelling the periodicity and symmetry of inorganic crystalline materials poses unique challenges, which existing graph ML methods are unable to address. Moving to inorganic nanomaterials increases complexity further, as the number of nodes per graph can span a broad range ($10$ to $10^5$). The bulk of existing graph ML focuses on characterising molecules and materials by predicting target properties with graphs as input. However, the most exciting applications of graph ML will be in their generative capabilities, which are currently not on par with those in other domains such as images or text. We invite the graph ML community to address these open challenges by presenting two new chemically-informed large-scale inorganic (CHILI) nanomaterials datasets: a medium-scale dataset (with overall >6M nodes, >49M edges) of mono-metallic oxide nanomaterials generated from 12 selected crystal types (CHILI-3K) and a large-scale dataset (with overall >183M nodes, >1.2B edges) of nanomaterials generated from experimentally determined crystal structures (CHILI-100K). We define 11 property prediction tasks and 6 structure prediction tasks, which are of special interest for nanomaterial research. We benchmark the performance of a wide array of baseline methods and use these benchmarking results to highlight areas which need future work. To the best of our knowledge, CHILI-3K and CHILI-100K are the first open-source nanomaterial datasets of this scale (both at the level of individual graphs and of the dataset as a whole) and the only nanomaterials datasets with high structural and elemental diversity.
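
To make the "nanomaterial as a graph" view concrete, here is a minimal sketch of turning atomic positions into a graph: each atom becomes a node, and an edge joins any two atoms closer than a distance cutoff. The helper name `radius_graph`, the cutoff value, and the toy 8-atom cube are hypothetical illustrations. The cutoff rule is an assumption and not necessarily how the CHILI-3K or CHILI-100K graphs are actually constructed. Because a nanoparticle is finite, no periodic boundary conditions are applied here.

```python
import itertools
import math


def radius_graph(positions, cutoff):
    """Return undirected edges (i, j) with i < j for atoms closer than `cutoff`.

    Hypothetical helper: a plain distance cutoff is assumed as the edge rule;
    it is an illustration, not the CHILI construction.
    """
    edges = []
    # Check every unordered pair of atoms once (O(n^2) pairs).
    for i, j in itertools.combinations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) < cutoff:
            edges.append((i, j))
    return edges


# Toy "nanoparticle": atoms on the 8 corners of a cube with edge length 2.0
# (arbitrary units). With cutoff 2.5, only nearest-neighbour corners
# (distance 2.0) are connected; face diagonals (~2.83) are not.
corners = [(x, y, z) for x in (0.0, 2.0) for y in (0.0, 2.0) for z in (0.0, 2.0)]
edges = radius_graph(corners, cutoff=2.5)
print(len(corners), "nodes,", len(edges), "edges")  # prints: 8 nodes, 12 edges
```

The pairwise loop is fine for a toy example but grows quadratically: a nanoparticle with $10^5$ atoms has roughly $5 \times 10^9$ atom pairs, which is one reason the node-count range quoted in the abstract matters in practice. Real pipelines typically use spatial neighbour-search structures instead of checking every pair.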

Authors (6)
  1. Ulrik Friis-Jensen
  2. Frederik L. Johansen
  3. Andy S. Anker
  4. Erik B. Dam
  5. Raghavendra Selvan
  6. Kirsten M. Ø. Jensen