Information Extraction from Historical Well Records Using A Large Language Model (2405.05438v1)

Published 8 May 2024 in cs.IR

Abstract: To reduce the environmental risks and impacts of orphaned wells (abandoned oil and gas wells), it is essential to first locate and then plug these wells. Although some historical documents are available, they are often unstructured, uncleaned, and outdated, and they vary widely by state and document type. Given the large number of wells, manually reading and digitizing this information from historical documents is not feasible. Here, we propose a new computational approach for rapidly and cost-effectively locating these wells. Specifically, we leverage the advanced capabilities of LLMs to extract vital information, including well location and depth, from historical records of orphaned wells. In this paper, we present an information extraction workflow based on open-source Llama 2 models and test it on a dataset of 160 well documents. Our results show that the developed workflow extracts location and depth from clean, PDF-based reports with 100% accuracy, but it struggles with unstructured, image-based well records, where accuracy drops to 70%. The workflow provides significant benefits over manual human digitization, including reduced labor and increased automation. In general, more detailed prompting leads to better information extraction, and LLMs with more parameters typically perform better. We provide a detailed discussion of the current challenges and the corresponding opportunities and approaches to address them. More broadly, a vast amount of geoscientific information is locked up in old documents, and this work demonstrates that recent breakthroughs in LLMs enable us to unlock that information at scale.
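
As a rough illustration of the kind of workflow the abstract describes, the minimal sketch below converts a digital (non-scanned) PDF well record to text and prompts an open-source Llama 2 chat model for the well's location and total depth. The specific model checkpoint, prompt wording, file name, and JSON output schema are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of an LLM-based extraction step for a historical well record.
# Assumes the pdftotext CLI is installed and a Llama 2 chat checkpoint is available
# via Hugging Face Transformers; prompt and schema below are placeholders.

import json
import subprocess

from transformers import pipeline


def pdf_to_text(pdf_path: str) -> str:
    """Extract plain text from a digital PDF by piping pdftotext output to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"], capture_output=True, text=True, check=True
    )
    return result.stdout


def extract_well_info(record_text: str, generator) -> dict:
    """Prompt the LLM for location and depth and parse the JSON object it returns."""
    prompt = (
        "You are extracting data from a historical oil and gas well record.\n"
        "Report the well's surface location (latitude/longitude or legal description) "
        "and its total depth in feet. Respond only with JSON of the form "
        '{"location": ..., "depth_ft": ...}.\n\n'
        f"Record:\n{record_text[:4000]}"  # truncate to stay within the context window
    )
    reply = generator(
        prompt, max_new_tokens=128, do_sample=False, return_full_text=False
    )[0]["generated_text"]
    # Keep only the first JSON object in the model's reply.
    start = reply.find("{")
    end = reply.find("}", start) + 1
    return json.loads(reply[start:end])


if __name__ == "__main__":
    # Placeholder checkpoint; any Llama 2 chat variant could be substituted here.
    llm = pipeline("text-generation", model="meta-llama/Llama-2-13b-chat-hf")
    text = pdf_to_text("well_record.pdf")  # hypothetical input file
    print(extract_well_info(text, llm))
```

Image-based records would need an OCR or document-parsing step (e.g., converting scans to text) before the same prompting step could be applied, which is where the abstract reports lower accuracy.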
