MOFormer: Transformer for MOF Screening
- MOFormer is a Transformer-based model that uses MOFid strings—integrating chemical and topological information—for rapid property prediction in metal-organic frameworks.
- It employs self-supervised cross-modal pretraining with CGCNN embeddings, reducing prediction errors on tasks such as band gap and gas adsorption prediction.
- The architecture bypasses the need for 3D atomic coordinates, offering a data-efficient tool for accelerated MOF screening and for guiding generative inverse materials design.
MOFormer refers to a class of Transformer-based neural architectures focused on efficient, structure-agnostic modeling and accelerated property prediction or generative design of metal-organic frameworks (MOFs). MOFormer approaches primarily operate on the MOFid string representation—a compact encoding of chemical composition and topology—rather than 3D atomic coordinates, enabling rapid screening and design across the vast MOF chemical space.
1. MOFormer Architecture and Input Encoding
The canonical MOFormer architecture is structured around the encoder module of the Transformer framework. Its input is the MOFid text string, constructed by concatenating SMILES representations of the secondary building units (SBUs) with RCSR topology codes, typically separated by special tokens. Tokenizing this string yields a unified sequence that integrates both chemical bonding information and global framework topology. The sequence is mapped to embeddings, with a [CLS] token used to pool information for downstream regression or classification tasks.
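The sketch below illustrates this input-encoding step. It is a minimal, hypothetical tokenizer, not the published MOFormer preprocessing: the `&&` delimiter between the chemical and topological parts, the regex SMILES tokenizer, and the example string are all illustrative assumptions (real MOFid-v1 strings use a different syntax).

```python
import re

# Hypothetical MOFid-like tokenizer (illustrative only; not the published
# MOFormer preprocessing). We assume the dot-separated SBU SMILES and the
# RCSR topology/catenation codes are joined by an '&&' delimiter.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|Si|@@|@|=|#|\(|\)|\.|[A-Za-z]|%[0-9]{2}|[0-9]|[+-])"
)

def tokenize_mofid(mofid: str) -> list[str]:
    chem_part, topo_part = mofid.split("&&", maxsplit=1)
    smiles_tokens = SMILES_TOKEN.findall(chem_part)
    topo_tokens = topo_part.strip().split(".")        # e.g. ["pcu", "cat0"]
    # [CLS] pools the sequence for regression heads; [SEP] marks the
    # boundary between chemical (SMILES) and topological (RCSR) tokens.
    return ["[CLS]"] + smiles_tokens + ["[SEP]"] + topo_tokens

# Simplified, IRMOF-1-like identifier (illustrative)
print(tokenize_mofid("[Zn].O=C([O-])c1ccc(cc1)C(=O)[O-]&&pcu.cat0")[:8])
```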
Attention is computed as in standard Transformer encoders:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

where $Q$, $K$, and $V$ are the query, key, and value projections of the input embedding matrix $X$, and $d_k$ is the key dimension. MOFormer stacks multiple such encoder layers, each followed by feed-forward blocks, residual connections, and layer normalization.
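As a concrete reference, the following PyTorch sketch implements the scaled dot-product attention above and stacks standard encoder layers with [CLS] pooling; the embedding width, head count, and depth are illustrative choices, not the exact published hyperparameters.

```python
import math
import torch
import torch.nn as nn

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention matching the equation above."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)   # (batch, seq, seq)
    return torch.softmax(scores, dim=-1) @ V             # (batch, seq, d_k)

# Stack of standard encoder layers: self-attention + feed-forward blocks,
# residual connections, and layer normalization.
embed_dim, n_heads, n_layers = 512, 8, 6                  # illustrative hyperparameters
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=embed_dim, nhead=n_heads, batch_first=True),
    num_layers=n_layers,
)

x = torch.randn(4, 64, embed_dim)       # (batch, MOFid tokens, embedding)
h = encoder(x)                          # contextualized token embeddings
cls_embedding = h[:, 0, :]              # [CLS] pooling for regression/classification heads
```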
Distinctive features of MOFormer include its handling of MOFid tokens: chemical (SMILES) and topological (RCSR) tokens are processed distinctly, providing direct integration of sub-unit composition and connectivity.
2. Self-Supervised Cross-Modal Pretraining
A key methodological innovation is MOFormer’s self-supervised learning framework that couples its textual representations with structure-based embeddings learned from a graph model, the Crystal Graph Convolutional Neural Network (CGCNN). Each MOF sample is encoded both by MOFormer (text-only) and CGCNN (3D graph-based), producing corresponding $512$-dimensional vectors. These are projected separately and their cross-correlation matrix is computed:

$$\mathcal{C}_{ij} = \frac{\sum_{b} z^{A}_{b,i}\, z^{B}_{b,j}}{\sqrt{\sum_{b} \left(z^{A}_{b,i}\right)^{2}}\,\sqrt{\sum_{b} \left(z^{B}_{b,j}\right)^{2}}}$$

where $z^{A}$ and $z^{B}$ are the batched (batch index $b$) projected CGCNN and MOFormer embeddings. The pretraining objective is the Barlow Twins loss:

$$\mathcal{L}_{BT} = \sum_{i} \left(1 - \mathcal{C}_{ii}\right)^{2} + \lambda \sum_{i} \sum_{j \neq i} \mathcal{C}_{ij}^{2}$$

This loss encourages the diagonal of $\mathcal{C}$ toward unity (promoting shared information) while suppressing off-diagonal values, allowing MOFormer to learn structural/3D cues even though it is only exposed to text data during inference. This directly boosts downstream property prediction tasks.
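A minimal PyTorch sketch of this objective is shown below, assuming both branches have already been projected to the same dimension; the standardization details and the off-diagonal weight $\lambda$ are illustrative, not the published training configuration.

```python
import torch

def barlow_twins_loss(z_a: torch.Tensor, z_b: torch.Tensor, lam: float = 5e-3) -> torch.Tensor:
    """Cross-modal Barlow Twins loss.

    z_a: (batch, dim) projected CGCNN embeddings
    z_b: (batch, dim) projected MOFormer embeddings
    lam: weight on the redundancy-reduction term (illustrative value)
    """
    n = z_a.size(0)
    # Standardize each embedding dimension over the batch, then form the
    # dim x dim cross-correlation matrix C between the two modalities.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-6)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-6)
    c = (z_a.T @ z_b) / n

    on_diag = (torch.diagonal(c) - 1).pow(2).sum()                 # pull C_ii toward 1
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()    # push C_ij (i != j) toward 0
    return on_diag + lam * off_diag
```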
3. Property Prediction and Downstream Performance
MOFormer achieves notable accuracy and efficiency in MOF property prediction:
- Band gap prediction (QMOF dataset): When pretrained with the self-supervised coupled framework, MOFormer yields lower mean absolute error (MAE) than both featurization-based baselines (e.g., Stoichiometric-120) and even structure-based methods like SOAP, especially in low-data regimes (training size ≤ 1000).
- Gas adsorption prediction (hMOF dataset): While CGCNN remains superior for adsorption (where explicit geometry matters), MOFormer is competitive and more data-efficient when labeled samples are scarce.
Across tasks, pretraining reduces MAE by 4-5% in MOFormer and CGCNN, substantiating the benefit of cross-modal pretraining. The approach is particularly advantageous for rapid screening in theoretical MOF space where explicit atomic geometry may not be available.
4. Accelerated MOF Screening and Generative Integration
MOFormer provides a reliable mechanism to bypass the computational and practical bottlenecks of traditional screening approaches (e.g., DFT simulation, structure optimization). The MOFid-based, structure-agnostic protocol allows MOFormer to generalize across hypothetical MOF candidates, accelerating the discovery pipeline for applications such as:
- Gas storage and separation
- Water desalination
- Energy storage
- Electronic property optimization
In generative frameworks like MOFGPT (Badrinarayanan et al., 30 May 2025), MOFormer serves as a “property predictor” in a reinforcement learning loop. Here, it assigns quantitative rewards to MOFid sequences generated by a GPT module, thereby steering generative search toward candidates with desired functional attributes. This underscores MOFormer’s dual utility as both a regression engine and an evaluative agent in inverse design workflows.
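The snippet below sketches how such a reward signal might be assembled; the `predictor` callable, the target value, and the REINFORCE-style update are assumptions for illustration and do not reproduce the MOFGPT implementation.

```python
import torch

def property_reward(mofid_strings, predictor, target=2.0, scale=1.0):
    """Score generated MOFid strings with a frozen property predictor.

    Returns rewards in (0, 1]; higher when the predicted property
    (e.g. band gap in eV) lies closer to `target`. All values illustrative.
    """
    with torch.no_grad():
        preds = predictor(mofid_strings)                   # (batch,) predicted property values
    return torch.exp(-torch.abs(preds - target) / scale)   # smooth, distance-based reward

# Usage sketch: rewards weight a policy-gradient update of the GPT generator.
# rewards = property_reward(generated_mofids, frozen_moformer, target=2.0)
# loss = -(rewards.detach() * sequence_log_probs).mean()   # REINFORCE-style objective
```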
5. Data Efficiency and Conceptual Implications
MOFormer consistently exhibits superior data efficiency compared to graph-based models for quantum-chemical property prediction, especially in scenarios of limited labeled data. This is attributed to the “language-like” nature of MOFid encoding and the benefits of large-scale pretraining on >400k MOFs.
The paradigm exemplified by MOFormer reflects a broader trend in computational materials science: leveraging textual representations and cross-modal self-supervision to synthesize domain knowledge from heterogeneous data modalities (structural, topological, chemical). This offers new pathways for transfer learning, generative modeling, and efficient property prediction outside the constraints of exhaustive structure refinement.
6. Integration and Broader Applications
MOFormer has been integrated as a key component in multi-stage pipelines, including generative models augmented by reinforcement learning, property-guided reward modules, and high-throughput screening platforms. Its architecture—string-oriented, transformer-based, self-supervised—facilitates compatibility with foundation chemical LLMs, and its property outputs can be incorporated into composite reward functions balancing validity, novelty, and application-specific targets.
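As one possible realization, a composite reward of the kind described above might be assembled as follows, reusing the hypothetical `property_reward` helper and `&&` delimiter from the earlier sketches; the well-formedness check, novelty test, and weights are placeholders rather than any published pipeline.

```python
def composite_reward(mofid, predictor, known_mofids,
                     w_valid=0.2, w_novel=0.2, w_prop=0.6):
    # Placeholder checks: a real pipeline would parse the MOFid and query a
    # reference database; the weights here are arbitrary assumptions.
    validity = 1.0 if "&&" in mofid else 0.0               # crude well-formedness check
    novelty = 0.0 if mofid in known_mofids else 1.0        # unseen identifiers score higher
    prop_score = float(property_reward([mofid], predictor)[0])
    return w_valid * validity + w_novel * novelty + w_prop * prop_score
```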
A plausible implication is that the MOFormer paradigm can be extended to other periodic materials or structure–property mapping tasks where string representations capture salient chemical/topological information at scale.
7. Future Directions
Potential future developments include:
- Scaling MOFormer architectures using more powerful pretraining data, including multimodal and multi-property labels
- Integration with generative design to further enable inverse materials discovery
- Adaptation for novel structure descriptors (beyond MOFid) for even broader materials domains
This suggests MOFormer will remain central in the rapid screening, representation, and inverse design of reticular and framework materials, especially as language-model-centric approaches proliferate in computational chemistry.