An Autonomous Large Language Model Agent for Chemical Literature Data Mining (2402.12993v1)

Published 20 Feb 2024 in cs.IR, cs.AI, cs.LG, and q-bio.QM

Abstract: Chemical synthesis, which is crucial for advancing material synthesis and drug discovery, impacts various sectors including environmental science and healthcare. The rise of technology in chemistry has generated extensive chemical data, challenging researchers to discern patterns and refine synthesis processes. AI helps by analyzing data to optimize synthesis and increase yields. However, AI faces challenges in processing literature data due to the unstructured format and diverse writing style of chemical literature. To overcome these difficulties, we introduce an end-to-end AI agent framework capable of high-fidelity extraction from extensive chemical literature. This AI agent employs LLMs for prompt generation and iterative optimization. It functions as a chemistry assistant, automating data collection and analysis, thereby saving manpower and enhancing performance. Our framework's efficacy is evaluated using accuracy, recall, and F1 score of reaction condition data, and we compared our method with human experts in terms of content correctness and time efficiency. The proposed approach marks a significant advancement in automating chemical literature extraction and demonstrates the potential for AI to revolutionize data management and utilization in chemistry.
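The abstract names accuracy, recall, and F1 score as the evaluation metrics for extracted reaction condition data but does not define how they are computed. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: it assumes each extracted item is a (field, value) pair, that a match requires exact equality with a human-annotated gold pair, and it reports precision (the share of extracted pairs that are correct) alongside recall and F1. The function name `score_extraction`, the `Scores` type, and the example reaction conditions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f1: float


def score_extraction(predicted: set[tuple[str, str]],
                     gold: set[tuple[str, str]]) -> Scores:
    """Compare extracted (field, value) pairs against gold annotations.

    Assumption: a pair counts as correct only on an exact match.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Scores(precision, recall, f1)


# Hypothetical example: one paper's annotated and extracted conditions.
gold = {
    ("catalyst", "Pd(OAc)2"),
    ("solvent", "DMF"),
    ("temperature", "80 C"),
    ("time", "12 h"),
}
predicted = {
    ("catalyst", "Pd(OAc)2"),
    ("solvent", "DMF"),
    ("temperature", "100 C"),  # wrong value, counts as a miss
}

scores = score_extraction(predicted, gold)
print(scores)
```

In this example two of the three extracted pairs are correct and two of the four gold pairs are recovered, so precision is 2/3, recall is 1/2, and F1 is 4/7. A real evaluation would likely need value normalization (units, chemical name synonyms) before matching, which this sketch omits.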
