Bridging the Semantic-Numerical Gap: A Numerical Reasoning Method of Cross-modal Knowledge Graph for Material Property Prediction (2312.09744v2)
Abstract: Using machine learning (ML) techniques to predict material properties is a crucial research topic. These properties depend on both numerical data and semantic factors. Because only small-sample datasets are available, existing methods typically either apply ML algorithms to regress numerical properties or transfer knowledge graphs (KGs) pre-trained on other domains to materials. However, these methods cannot handle semantic and numerical information simultaneously. In this paper, we propose a numerical reasoning method for material KGs (NR-KG), which constructs a cross-modal KG from semantic nodes and numerical proxy nodes. It captures both types of information by projecting the cross-modal KG into a canonical KG and uses a graph neural network to predict material properties. As part of this process, a novel projection prediction loss is proposed to extract semantic features from numerical information. NR-KG enables end-to-end processing of cross-modal data, mines relationships and cross-modal information in small-sample datasets, and makes full use of valuable experimental data to improve material property prediction. We further propose two new High-Entropy Alloy (HEA) property datasets with semantic descriptions. NR-KG outperforms state-of-the-art (SOTA) methods, achieving relative improvements of 25.9% and 16.1% on the two material datasets. In addition, NR-KG surpasses SOTA methods on two public physical chemistry molecular datasets, with improvements of 22.2% and 54.3%, highlighting its potential applicability and generalizability. We hope the proposed datasets, algorithms, and pre-trained models can benefit the KG and AI-for-materials communities.
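To make the idea of semantic nodes and numerical proxy nodes more concrete, the following is a minimal sketch and not the authors' implementation. The class `CrossModalKG`, the function `project_to_canonical`, the example entity and relation names, and the fixed linear map standing in for a learned projection are all hypothetical, chosen only for illustration; the abstract does not specify how the projection or the projection prediction loss is actually defined.

```python
from dataclasses import dataclass, field


@dataclass
class CrossModalKG:
    """Toy cross-modal KG: semantic triples plus numerical proxy nodes (illustrative only)."""
    # (head, relation, tail) triples between semantic nodes
    semantic_triples: list = field(default_factory=list)
    # (head, relation, proxy_id) triples; each proxy node stands for one number
    numeric_triples: list = field(default_factory=list)
    # proxy_id -> scalar value carried by that proxy node
    proxy_values: dict = field(default_factory=dict)

    def add_semantic(self, head, relation, tail):
        self.semantic_triples.append((head, relation, tail))

    def add_numeric(self, head, relation, value):
        # Hypothetical naming scheme for proxy nodes.
        proxy_id = f"proxy:{head}:{relation}"
        self.proxy_values[proxy_id] = float(value)
        self.numeric_triples.append((head, relation, proxy_id))
        return proxy_id


def project_to_canonical(kg, weight, bias):
    """Map each proxy node's scalar to a fixed-size vector.

    This is a stand-in (an assumption) for a learned projection, so that
    numerical proxy nodes and semantic nodes could share one embedding space.
    """
    return {
        proxy_id: [w * value + b for w, b in zip(weight, bias)]
        for proxy_id, value in kg.proxy_values.items()
    }


# Hypothetical example data: one alloy with a semantic fact and two numeric facts.
kg = CrossModalKG()
kg.add_semantic("alloy_A", "processed_by", "arc_melting")
kg.add_numeric("alloy_A", "fraction_of_Al", 0.2)
kg.add_numeric("alloy_A", "fraction_of_Co", 0.25)

proxy_embeddings = project_to_canonical(
    kg, weight=[1.0, -1.0, 0.5], bias=[0.0, 0.0, 0.0]
)
```

In a full system, the resulting node vectors would presumably be passed to a graph neural network together with the semantic triples; that stage is omitted here because the abstract gives no architectural detail.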