An Autonomous Large Language Model Agent for Chemical Literature Data Mining (2402.12993v1)
Abstract: Chemical synthesis, which is crucial for advancing material synthesis and drug discovery, impacts various sectors including environmental science and healthcare. The rise of technology in chemistry has generated extensive chemical data, challenging researchers to discern patterns and refine synthesis processes. AI helps by analyzing data to optimize synthesis and increase yields. However, AI faces challenges in processing literature data because chemical literature is unstructured and written in diverse styles. To overcome these difficulties, we introduce an end-to-end AI agent framework capable of high-fidelity extraction from extensive chemical literature. This AI agent employs LLMs for prompt generation and iterative optimization. It functions as a chemistry assistant, automating data collection and analysis, thereby saving manpower and enhancing performance. We evaluate the framework's efficacy using the accuracy, recall, and F1 score of the extracted reaction condition data, and we compare our method with human experts in terms of content correctness and time efficiency. The proposed approach marks a significant advancement in automating chemical literature extraction and demonstrates the potential for AI to revolutionize data management and utilization in chemistry.
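The abstract names accuracy, recall, and F1 score as evaluation metrics but does not define them. The following is a minimal sketch of how such scores could be computed for one extracted reaction record. It assumes, hypothetically, that a record is a flat mapping from field name to value, that a field counts as correct only on an exact match after trimming and lowercasing, and that "accuracy" means the fraction of extracted fields that match the reference (often called precision). The function name `score_extraction` and the example fields are illustrative and not taken from the paper.

```python
from typing import Dict, Set, Tuple

# Hypothetical record shape: field name -> extracted value as text.
Record = Dict[str, str]


def _pairs(record: Record) -> Set[Tuple[str, str]]:
    """Normalize a record into a set of (field, value) pairs, skipping empty values."""
    return {
        (field, value.strip().lower())
        for field, value in record.items()
        if value.strip()
    }


def score_extraction(predicted: Record, reference: Record) -> Tuple[float, float, float]:
    """Return (accuracy, recall, f1) over field-value pairs.

    Assumption: "accuracy" here is the share of predicted pairs that also
    appear in the reference; the paper may define it differently.
    """
    pred_pairs = _pairs(predicted)
    ref_pairs = _pairs(reference)
    matched = len(pred_pairs & ref_pairs)

    accuracy = matched / len(pred_pairs) if pred_pairs else 0.0
    recall = matched / len(ref_pairs) if ref_pairs else 0.0
    total = accuracy + recall
    f1 = 2 * accuracy * recall / total if total else 0.0
    return accuracy, recall, f1


if __name__ == "__main__":
    # Illustrative values only, not data from the paper.
    predicted = {"catalyst": "Pd(OAc)2", "solvent": "DMF", "temperature": "80 C"}
    reference = {
        "catalyst": "Pd(OAc)2",
        "solvent": "toluene",
        "temperature": "80 C",
        "base": "K2CO3",
    }
    accuracy, recall, f1 = score_extraction(predicted, reference)
    print(f"accuracy={accuracy:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Exact matching is the strictest choice; a real evaluation would likely need to reconcile chemical synonyms and unit formats before comparing values.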