Predicting materials properties without crystal structure: Deep representation learning from stoichiometry (1910.00617v4)

Published 1 Oct 2019 in physics.comp-ph, cond-mat.mtrl-sci, and cs.LG

Abstract: Machine learning has the potential to accelerate materials discovery by accurately predicting materials properties at a low computational cost. However, the model inputs remain a key stumbling block. Current methods typically use descriptors constructed from knowledge of either the full crystal structure -- therefore only applicable to materials with already characterised structures -- or structure-agnostic fixed-length representations hand-engineered from the stoichiometry. We develop a machine learning approach that takes only the stoichiometry as input and automatically learns appropriate and systematically improvable descriptors from data. Our key insight is to treat the stoichiometric formula as a dense weighted graph between elements. Compared to the state of the art for structure-agnostic methods, our approach achieves lower errors with less data.

Authors (2)
  1. Rhys E. A. Goodall (9 papers)
  2. Alpha A. Lee (33 papers)
Citations (232)

Summary

Predicting Materials Properties Without Crystal Structure: Deep Representation Learning from Stoichiometry

The paper "Predicting Materials Properties Without Crystal Structure: Deep Representation Learning from Stoichiometry" presents an innovative approach leveraging ML to overcome challenges in materials discovery. Accurately predicting the properties of materials without the need for crystallographic information is a significant departure from traditional methodologies requiring data-intensive inputs. Here, the authors introduce a framework that uses stoichiometry alone, automating the development of descriptors via a dense, weighted graph representation.

Methodology

The authors reformulate the stoichiometric formula of a material as a dense weighted graph over its constituent elements. This representation enables message-passing neural networks (MPNNs) to learn descriptors automatically and to improve them systematically as more data become available. The resulting model, named "Roost" (Representation Learning from Stoichiometry), bypasses the bottleneck of machine learning models that depend on detailed structural data, a crucial limitation when exploring uncharted materials space. Importantly, Roost learns the stoichiometry-to-descriptor map directly from data rather than relying on hand-engineered features, allowing computational screening to narrow the space of candidate materials before costly experiments are undertaken.
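To make the graph construction concrete, the sketch below (an illustration, not the authors' implementation) turns a composition dictionary into a dense weighted graph: each element becomes a node weighted by its fractional abundance, and every pair of nodes is connected so that a message-passing network could exchange information between all elements. The function name `composition_graph` is a placeholder introduced here.

```python
# Illustrative sketch: a stoichiometric formula as a dense weighted graph.
from itertools import product

def composition_graph(composition):
    """composition: dict of element -> stoichiometric amount, e.g. {"Fe": 2, "O": 3}."""
    total = sum(composition.values())
    # Nodes: one per element, weighted by its fractional abundance.
    nodes = [(element, amount / total) for element, amount in composition.items()]
    # Edges: fully connected (dense), including self-loops, so every element
    # can pass messages to every other element during learning.
    edges = list(product(range(len(nodes)), repeat=2))
    return nodes, edges

nodes, edges = composition_graph({"Fe": 2, "O": 3})
print(nodes)  # [('Fe', 0.4), ('O', 0.6)]
print(edges)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

In the full model, each node additionally carries a learned element embedding, and the message-passing and pooling steps collapse the nodes into a fixed-length material descriptor.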

Results and Performance

The authors substantiate their claims by comparing Roost with structure-agnostic baselines, including ElemNet and random forests built on Magpie descriptors. The results show that Roost is more sample efficient and achieves lower errors, underscoring its suitability even when data are limited, a frequent reality in experimental materials science. For instance, on the OQMD dataset Roost achieves markedly lower mean absolute error (MAE) and root mean square error (RMSE) than the competing models.

Moreover, Roost can transfer what it learns from large datasets, such as those generated by high-throughput ab initio workflows, to smaller, more focused datasets, confirming its usefulness in transfer-learning settings. This suggests that Roost can adapt its learned descriptors to both closely related and more distant tasks.
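A minimal sketch of such a transfer-learning workflow, assuming a PyTorch setting, is shown below. `StoichiometryModel` is a hypothetical stand-in for a Roost-like network; the idea is simply to reuse a trunk pretrained on a large dataset while attaching a fresh output head for the new target property.

```python
# Hedged sketch of fine-tuning a pretrained stoichiometry model on a smaller dataset.
import torch
import torch.nn as nn

class StoichiometryModel(nn.Module):
    """Placeholder for a Roost-like message-passing network."""
    def __init__(self, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.ReLU())
        self.head = nn.Linear(hidden_dim, 1)  # property-specific output head

    def forward(self, x):
        return self.head(self.trunk(x))

# In practice the source model would be trained on a large dataset such as OQMD.
source_model = StoichiometryModel()
target_model = StoichiometryModel()
target_model.trunk.load_state_dict(source_model.trunk.state_dict())  # reuse learned trunk
target_model.head = nn.Linear(128, 1)  # fresh head for the new target property
optimizer = torch.optim.Adam(target_model.parameters(), lr=1e-4)  # fine-tune end to end
```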

Another valuable feature of Roost is its capacity to produce uncertainty estimates alongside its predictions. By combining aleatoric and epistemic uncertainty through a deep ensemble, the model indicates how much each prediction can be trusted, which is vital in exploratory applications where new materials are screened.
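The sketch below shows one common way to combine the two sources of uncertainty from a deep ensemble, assuming each ensemble member predicts a mean and an aleatoric variance for every sample; the spread of the members' means supplies the epistemic term. This is an illustrative recipe rather than the paper's exact estimator.

```python
# Combine aleatoric and epistemic uncertainty from a deep ensemble (illustrative).
import numpy as np

def ensemble_uncertainty(means, aleatoric_vars):
    """means, aleatoric_vars: arrays of shape (n_models, n_samples)."""
    mean_prediction = means.mean(axis=0)
    epistemic_var = means.var(axis=0)            # disagreement between ensemble members
    aleatoric_var = aleatoric_vars.mean(axis=0)  # average predicted data noise
    total_std = np.sqrt(epistemic_var + aleatoric_var)
    return mean_prediction, total_std

# Toy example: 5 ensemble members, 3 test compositions.
rng = np.random.default_rng(0)
means = rng.normal(0.0, 0.1, size=(5, 3))
aleatoric_vars = np.full((5, 3), 0.05 ** 2)
prediction, std = ensemble_uncertainty(means, aleatoric_vars)
```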

Implications and Future Prospects

Practically, the Roost model has tangible implications for accelerating the identification of new materials without exhaustive experimental or computational campaigns. Theoretically, it illustrates how machine learning can tackle complex mappings such as the stoichiometry-to-property map, which is ordinarily confounded by structural complexity.

In the broader landscape of AI and ML in materials science, learning from simple inputs such as stoichiometry can markedly accelerate the pace of discovery. Future research could explore more robust probabilistic frameworks to further improve the model's uncertainty estimates. Extending the approach to predicting the products of inorganic reactions is another promising frontier.

This paper's contributions mark a significant step forward, not only for the model's performance but also as an archetype for applying machine learning to enable high-throughput exploration in heuristic-driven fields like materials discovery. As available databases grow, the efficacy of this methodology is poised to increase, highlighting the growing confluence of data science and materials science.