MOFormer: Self-Supervised Transformer model for Metal-Organic Framework Property Prediction (2210.14188v1)

Published 25 Oct 2022 in cs.LG and physics.chem-ph

Abstract: Metal-Organic Frameworks (MOFs) are materials with a high degree of porosity that can be used for applications in energy storage, water desalination, gas storage, and gas separation. However, the chemical space of MOFs is close to infinite due to the large variety of possible combinations of building blocks and topology. Discovering the optimal MOFs for specific applications requires an efficient and accurate search over an enormous number of potential candidates. Previous high-throughput screening methods using computational simulations like DFT can be time-consuming. Such methods also require optimizing the 3D atomic structure of MOFs, which adds an extra step when evaluating hypothetical MOFs. In this work, we propose a structure-agnostic deep learning method based on the Transformer model, named MOFormer, for property prediction of MOFs. The MOFormer takes a text string representation of a MOF (MOFid) as input, thus circumventing the need to obtain the 3D structure of a hypothetical MOF and accelerating the screening process. Furthermore, we introduce a self-supervised learning framework that pretrains the MOFormer by maximizing the cross-correlation between its structure-agnostic representations and the structure-based representations of a crystal graph convolutional neural network (CGCNN) on >400k publicly available MOF data points. Using self-supervised learning allows the MOFormer to intrinsically learn 3D structural information even though it is not included in the input. Experiments show that pretraining improved the prediction accuracy of both models on various downstream prediction tasks. Furthermore, we reveal that MOFormer can be more data-efficient on quantum-chemical property prediction than the structure-based CGCNN when training data is limited. Overall, MOFormer provides a novel perspective on efficient MOF design using deep learning.

Authors (4)
  1. Zhonglin Cao (8 papers)
  2. Rishikesh Magar (13 papers)
  3. Yuyang Wang (111 papers)
  4. Amir Barati Farimani (121 papers)
Citations (70)

Summary

The paper presents a novel approach to predicting the properties of Metal-Organic Frameworks (MOFs) using deep learning techniques. MOFs are crystalline materials known for their high porosity and versatile applications in areas such as energy storage, water desalination, gas storage, and separation. The challenge in exploiting MOFs lies in the enormous chemical space due to the countless combinations of building blocks and topologies, which necessitates efficient methods for property prediction.

Previous methods for predicting MOF properties have relied on computational simulations such as Density Functional Theory (DFT), which are time-consuming and require optimizing 3D atomic structures, adding significant computational overhead. The paper introduces MOFormer, a Transformer-based deep learning model that circumvents the need for 3D structural information by taking a text string representation of MOFs, termed MOFid, as input. This enables faster screening of potential MOF candidates by leveraging sequence-modeling architectures originally developed for language.
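To make the input pipeline concrete, the sketch below encodes a MOFid-like string with a character-level tokenizer and a standard Transformer encoder. The example string, tokenizer, and all hyperparameters are illustrative assumptions, not the published MOFormer configuration.

```python
# Minimal sketch of a structure-agnostic encoder over a MOFid string.
# Tokenizer, example string, and hyperparameters are assumptions for
# illustration, not the exact MOFormer setup.
import torch
import torch.nn as nn

class MOFidEncoder(nn.Module):
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=6, max_len=512):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)  # regression head, e.g. band gap

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer-encoded MOFid characters
        pos = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.token_emb(token_ids) + self.pos_emb(pos)
        h = self.encoder(x)
        # Use the first token's embedding as the whole-MOF representation
        return self.head(h[:, 0])

# Hypothetical MOFid-like string: linker SMILES plus a topology code
mofid = "[Zn].O=C(O)c1ccc(cc1)C(=O)O pcu"
vocab = {ch: i for i, ch in enumerate(sorted(set(mofid)))}
ids = torch.tensor([[vocab[ch] for ch in mofid]])
model = MOFidEncoder(vocab_size=len(vocab))
prediction = model(ids)  # shape (1, 1): predicted scalar property
```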

The paper further introduces a self-supervised pretraining framework that maximizes the cross-correlation between MOFormer's structure-agnostic representations and the structure-based representations derived from a Crystal Graph Convolutional Neural Network (CGCNN). Pretraining on more than 400,000 publicly available MOF structures allows MOFormer to implicitly learn 3D structural information even though it is not part of the input, and it improves prediction accuracy across downstream tasks, particularly when training data is limited.
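The cross-correlation objective described here is in the spirit of Barlow Twins. A minimal sketch, assuming both encoders emit same-dimensional embeddings for the same batch of MOFs, might look like the following; the lambda weight is an assumed hyperparameter.

```python
# Sketch of a Barlow Twins-style cross-correlation loss between the
# structure-agnostic (Transformer) and structure-based (CGCNN) embeddings.
# Embedding dimensions and the lambda weight are assumptions.
import torch

def cross_correlation_loss(z_tf, z_graph, lambd=5e-3):
    # z_tf, z_graph: (batch, dim) embeddings of the same MOFs from each encoder
    n = z_tf.size(0)
    # Standardize each embedding dimension across the batch
    za = (z_tf - z_tf.mean(0)) / z_tf.std(0)
    zb = (z_graph - z_graph.mean(0)) / z_graph.std(0)
    c = (za.T @ zb) / n  # empirical cross-correlation matrix
    diag = torch.diagonal(c)
    on_diag = (diag - 1).pow(2).sum()               # align the two views
    off_diag = c.pow(2).sum() - diag.pow(2).sum()   # decorrelate the rest
    return on_diag + lambd * off_diag
```

Minimizing this loss pulls the diagonal of the cross-correlation matrix toward 1, so the two encoders agree on each MOF, while the off-diagonal penalty suppresses redundant components in the shared representation.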

The paper reports strong numerical results. On quantum-chemical property prediction (band gap), MOFormer achieved a 21.2% lower mean absolute error (MAE) than Stoichiometric-120, another structure-agnostic method. Pretraining improved MOFormer's band gap prediction accuracy by 5.34% and its gas adsorption prediction by 4.3%. CGCNN also benefited from pretraining, with a 6.79% reduction in MAE for band gap prediction and a 16.5% improvement for gas adsorption across benchmarks.

One of the key insights from the paper is the model's ability to handle diverse chemical spaces and its efficiency when training data is sparse: MOFormer outperformed CGCNN on band gap prediction with limited training data (≤1000 samples). In addition, attention visualizations revealed that MOFormer learns meaningful atomic and topological features in its representations, which is crucial for accurate property prediction; one way to probe such attention patterns is sketched below.
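The following is a hedged illustration of that kind of inspection, reusing the MOFidEncoder sketch from earlier (it assumes `model` and `ids` from that block); it is not the paper's visualization pipeline, just one way to see which MOFid characters a token attends to.

```python
# Illustrative attention inspection, reusing the earlier MOFidEncoder sketch.
# Queries the first layer's self-attention directly to recover its weights.
import torch

model.eval()
with torch.no_grad():
    pos = torch.arange(ids.size(1))
    x = model.token_emb(ids) + model.pos_emb(pos)
    # attn: (batch, seq_len, seq_len), averaged over heads
    _, attn = model.encoder.layers[0].self_attn(x, x, x, need_weights=True)

# For each token position, report the position it attends to most strongly
print(attn[0].argmax(dim=-1))
```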

From a theoretical standpoint, the paper highlights the significance of integrating self-supervised learning techniques in enhancing model robustness in scenarios with large and diverse datasets, without needing extensive labeled training data. Practically, MOFormer offers a robust tool for rapid evaluation of hypothetical MOFs, which could accelerate the development of new MOF materials optimized for specific applications.

Future research directions might explore further applications of the MOFormer model in other domains of materials science or the integration of additional architectural improvements to further enhance prediction accuracy and computational efficiency. Additionally, the expansion of training datasets to include more diverse combinations of MOF structures could enrich the model's learning and applicability. While the paper makes significant strides in MOF property prediction, continued advancements in self-supervised learning frameworks and deep learning architectures will be crucial for pushing the boundaries of materials discovery.