DScribe: Library of Descriptors for Machine Learning in Materials Science (1904.08875v1)

Published 18 Apr 2019 in cond-mat.mtrl-sci and cs.LG

Abstract: DScribe is a software package for machine learning that provides popular feature transformations ("descriptors") for atomistic materials simulations. DScribe accelerates the application of machine learning for atomistic property prediction by providing user-friendly, off-the-shelf descriptor implementations. The package currently contains implementations for Coulomb matrix, Ewald sum matrix, sine matrix, Many-body Tensor Representation (MBTR), Atom-centered Symmetry Function (ACSF) and Smooth Overlap of Atomic Positions (SOAP). Usage of the package is illustrated for two different applications: formation energy prediction for solids and ionic charge prediction for atoms in organic molecules. The package is freely available under the open-source Apache License 2.0.

Citations (517)

Summary

  • The paper presents DScribe, a software package implementing established descriptors to transform atomic configurations into ML-ready features.
  • It integrates methods like the Coulomb matrix, MBTR, and SOAP through a Python interface with efficient C/C++ routines for accurate material property predictions.
  • DScribe accelerates materials discovery by standardizing feature extraction, achieving competitive results in formation energy and partial charge predictions.

Overview of DScribe: Descriptors for Machine Learning in Materials Science

The paper "DScribe: Library of Descriptors for Machine Learning in Materials Science" presents a software package dedicated to facilitating machine learning applications in atomistic materials simulations. The primary contribution of DScribe is the implementation of well-known descriptors that serve as feature transformations for predicting material properties at the atomic level. The descriptors include Coulomb matrix, Ewald sum matrix, sine matrix, Many-body Tensor Representation (MBTR), Atom-centered Symmetry Function (ACSF), and Smooth Overlap of Atomic Positions (SOAP).

Key Components and Features

DScribe is designed to accelerate the computational modeling of materials by converting atomic structures into numerical features that machine learning models can use directly. It does this through a Python interface, with the computationally heavy core routines implemented in C and C++. The package is open-source, distributed under the Apache License 2.0, and supports integration with common atomistic simulation tools such as the Atomic Simulation Environment (ASE).

The descriptors included address various aspects of material modeling:

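  1. Matrix Descriptors: The package features the Coulomb matrix, the Ewald sum matrix, and the sine matrix. The Coulomb matrix encodes pairwise electrostatic interactions between nuclei in finite systems, while the Ewald sum and sine matrices extend the idea to periodic crystals.
  2. Tensor and Symmetry Approaches: MBTR describes a whole structure through distributions of many-body terms such as atomic numbers, interatomic distances, and angles, while ACSF describes the local environment of each atom through radial and angular symmetry functions.
  3. SOAP: Encodes local atomic environments by expanding a Gaussian-smoothed atomic density in spherical harmonics and radial basis functions. The resulting descriptor is rotationally invariant, which suits tasks where the target property does not depend on orientation.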
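
To make two of the descriptors above concrete, here is a minimal pure-Python sketch of the Coulomb matrix and of a radial atom-centered symmetry function (commonly written G2) in their standard textbook forms. It is not DScribe's implementation or API: the function names `coulomb_matrix`, `cosine_cutoff`, and `acsf_g2` are hypothetical, the coordinates are toy values, and the length unit is left to the caller.

```python
import math


def coulomb_matrix(numbers, positions):
    """Coulomb matrix: 0.5 * Z_i**2.4 on the diagonal,
    Z_i * Z_j / |R_i - R_j| off the diagonal."""
    n = len(numbers)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * numbers[i] ** 2.4
            else:
                distance = math.dist(positions[i], positions[j])
                matrix[i][j] = numbers[i] * numbers[j] / distance
    return matrix


def cosine_cutoff(r, r_cut):
    """Smooth cutoff that decays from 1 at r = 0 to 0 at r = r_cut."""
    if r >= r_cut:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)


def acsf_g2(center, neighbors, eta, r_s, r_cut):
    """Radial symmetry function: a sum of Gaussians in the neighbor
    distance, centered at r_s and damped by the cutoff function."""
    total = 0.0
    for position in neighbors:
        r = math.dist(center, position)
        total += math.exp(-eta * (r - r_s) ** 2) * cosine_cutoff(r, r_cut)
    return total


# Toy two-atom system (hypothetical coordinates, arbitrary length unit).
cm = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 0.5)])

# One neighbor inside the cutoff radius and one outside it.
g2 = acsf_g2((0.0, 0.0, 0.0), [(1.0, 0.0, 0.0), (0.0, 3.0, 0.0)],
             eta=1.0, r_s=1.0, r_cut=2.0)
```

Both functions depend on positions only through interatomic distances, so their values do not change when the structure is rotated or translated. The Coulomb matrix does depend on the order of the atoms, which is why implementations usually offer a way to handle atom ordering, for example by sorting rows and columns.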

Numerical Results

The authors have demonstrated the applicability of DScribe on two distinct tasks:

  1. Formation Energy Prediction: On data from the Open Quantum Materials Database (OQMD), models built on several of the DScribe descriptors achieve competitive accuracy, with MBTR and SOAP highlighted as particularly effective.
  2. Partial Charge Prediction: With the local SOAP and ACSF descriptors as input to machine learning models such as kernel ridge regression, the authors obtain accurate predictions of atomic partial charges in organic molecules.
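
Kernel ridge regression, the model named above, can be sketched in a few lines. The example below is illustrative only: it assumes a Gaussian kernel, uses made-up one-dimensional "descriptor" vectors and targets rather than any data from the paper, and the helper names (`gaussian_kernel`, `solve_linear`, `krr_fit`, `krr_predict`) are hypothetical. In practice one would use an established library rather than a hand-written solver.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) similarity between two feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def krr_fit(features, targets, sigma, lam):
    """Return regression weights alpha solving (K + lam * I) alpha = y."""
    n = len(features)
    kernel = [[gaussian_kernel(features[i], features[j], sigma)
               + (lam if i == j else 0.0)
               for j in range(n)] for i in range(n)]
    return solve_linear(kernel, targets)


def krr_predict(train_features, alpha, x, sigma):
    """Predict as a kernel-weighted sum over the training points."""
    return sum(w * gaussian_kernel(xt, x, sigma)
               for w, xt in zip(alpha, train_features))


# Made-up toy data standing in for descriptor vectors and target values.
train_x = [[0.0], [1.0], [2.0]]
train_y = [0.0, 1.0, 4.0]
alpha = krr_fit(train_x, train_y, sigma=1.0, lam=1e-8)
prediction = krr_predict(train_x, alpha, [1.5], sigma=1.0)
```

The regularization strength `lam` and the kernel width `sigma` are hyperparameters that would normally be chosen by cross-validation. With a very small `lam`, as here, the model nearly interpolates the training targets.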

Implications and Future Direction

DScribe simplifies feature extraction in materials science: researchers can prototype and test machine learning models without hand-coding each descriptor. This lowers the barrier to data-driven materials discovery and optimization.

DScribe also encourages the development of new descriptors and the refinement of existing ones by providing a platform that researchers can contribute to and expand. Future versions could integrate with more advanced machine learning frameworks and add emerging descriptors, such as those based on deep neural networks or graph-based models.

In conclusion, DScribe contributes to the standardization and wider adoption of machine learning within materials science. Its open-source license and documentation support its aim of serving as a shared library of descriptor implementations and of fostering collaboration in the scientific community. The paper should interest computational materials scientists and machine learning researchers who want to use atomistic simulations for material property prediction.