MOFClassifier: A Machine Learning Approach for Validating Computation-Ready Metal-Organic Frameworks (2506.14845v1)
Abstract: The computational discovery and design of new crystalline materials, particularly metal-organic frameworks (MOFs), relies heavily on high-quality, computation-ready structural data. However, recent studies have revealed significant error rates within existing MOF databases, a critical data problem that hinders efficient high-throughput computational screening. Rule-based algorithms such as MOSAEC, MOFChecker, and the Chen and Manz method (Chen-Manz) have been developed to address this, but they suffer from inherent limitations and often misclassify structures. To overcome this challenge, we introduce MOFClassifier, a machine learning approach built upon a positive-unlabeled crystal graph convolutional neural network (PU-CGCNN) model. MOFClassifier learns intricate patterns from perfect crystal structures to predict a crystal-likeness score (CLscore), which is used to classify whether a MOF is computation-ready. Our model achieves a ROC AUC of 0.979 (previous best: 0.912) and, importantly, can identify subtle structural and chemical errors that are fundamentally undetectable by current rule-based methods. By accurately recovering structures that were previously misclassified as false negatives, MOFClassifier reduces the risk of overlooking promising material candidates in large-scale computational screening. This user-friendly tool is freely available and has been integrated into the preparation workflow for the updated CoRE MOF DB 2025 v1, contributing to the accelerated computational discovery of MOF materials.
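The idea behind a positive-unlabeled (PU) score such as the CLscore can be illustrated with a toy PU-bagging sketch. This is not the authors' PU-CGCNN. A plain logistic regression on hypothetical fixed-length feature vectors stands in for the crystal graph network. The function names `train_logreg`, `pu_bagging_clscore` and `is_computation_ready`, the number of bags, and the 0.5 decision threshold are all illustrative assumptions. In each bag, a classifier learns to separate the known-good (positive) structures from a random subset of unlabeled structures that are treated as negatives. A structure's score is then its predicted probability, averaged over the bags in which it was held out.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logreg(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent.

    Assumption: a simple stand-in for the CGCNN used by MOFClassifier.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b


def pu_bagging_clscore(X_pos, X_unl, n_bags=50, seed=0):
    """Hypothetical CLscore-style score for unlabeled samples via PU bagging.

    Each bag pairs all positives (label 1) with an equally sized random
    subset of unlabeled samples (label 0). The out-of-bag unlabeled samples
    are scored, and each sample's scores are averaged over the bags in
    which it was held out. Requires len(X_unl) > len(X_pos).
    """
    rng = np.random.default_rng(seed)
    n_pos, n_unl = len(X_pos), len(X_unl)
    totals = np.zeros(n_unl)
    counts = np.zeros(n_unl)
    for _ in range(n_bags):
        idx = rng.choice(n_unl, size=n_pos, replace=False)
        X = np.vstack([X_pos, X_unl[idx]])
        y = np.concatenate([np.ones(n_pos), np.zeros(n_pos)])
        w, b = train_logreg(X, y)
        oob = np.setdiff1d(np.arange(n_unl), idx)  # held-out unlabeled indices
        totals[oob] += sigmoid(X_unl[oob] @ w + b)
        counts[oob] += 1
    # A sample that was never held out (unlikely with many bags) gets NaN.
    return np.divide(totals, counts, out=np.full(n_unl, np.nan), where=counts > 0)


def is_computation_ready(scores, threshold=0.5):
    # Assumption: the 0.5 cut-off is illustrative and is not taken from the paper.
    return scores >= threshold
```

Consider synthetic data in which some "unlabeled" points are drawn from the positive distribution. These hidden positives should receive higher averaged scores than true negatives. PU learning relies on exactly this property to rank unlabeled structures.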