GPT Struct Me: Probing GPT Models on Narrative Entity Extraction (2311.14583v1)
Abstract: The importance of systems that can extract structured information from textual data becomes increasingly pronounced given the ever-growing volume of text produced daily. A system that can extract such information in an interoperable manner would be an asset to several domains, be it finance, health, or legal. Recent developments in natural language processing have produced powerful large language models (LLMs) that can, to some degree, mimic human intelligence. Such effectiveness raises a pertinent question: can these models be leveraged to extract structured information? In this work, we address this question by evaluating the capabilities of two state-of-the-art LLMs -- GPT-3 and GPT-3.5, commonly known as ChatGPT -- in the extraction of narrative entities, namely events, participants, and temporal expressions. The study is conducted on the Text2Story Lusa dataset, a collection of 119 Portuguese news articles whose annotation framework comprises a set of entity structures along with several tags and attribute values. We first select the best prompt template through an ablation study, run on a subset of the documents, over prompt components that provide varying degrees of information. We then use the best templates to evaluate the models' effectiveness on the remaining documents. The results indicate that GPT models are competitive with out-of-the-box baseline systems and present an all-in-one alternative for practitioners with limited resources. By studying the strengths and limitations of these models in the context of information extraction, we offer insights that can guide future improvements and avenues to explore in this field.
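To make the evaluated setup concrete, the sketch below shows the general shape of prompt-based narrative entity extraction against a chat-style GPT endpoint. It is a minimal illustration, not the paper's actual templates: the prompt wording, the `event`/`participant`/`time` tag set, the line-per-entity output format, and the `gpt-3.5-turbo` model identifier are all assumptions layered on the current `openai` Python client.

```python
# Minimal sketch of prompt-based narrative entity extraction.
# NOTE: the prompt text, tag names, output format, and model id are
# illustrative assumptions, not the prompt templates used in the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_TEMPLATE = (
    "You are an information extraction system. From the Portuguese news "
    "text below, list every narrative entity, one per line, as the entity "
    "type and the text span separated by a single tab character. The type "
    "must be one of: event, participant, time.\n\nText:\n{document}"
)

def extract_entities(document: str, model: str = "gpt-3.5-turbo") -> list[tuple[str, str]]:
    """Run one document through the prompt and parse the tab-separated reply."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # near-deterministic decoding: extraction should be reproducible
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(document=document)}],
    )
    entities = []
    for line in response.choices[0].message.content.splitlines():
        if "\t" in line:  # skip any commentary the model wraps around the list
            etype, span = line.split("\t", 1)
            entities.append((etype.strip(), span.strip()))
    return entities

if __name__ == "__main__":
    for etype, span in extract_entities("O presidente visitou Lisboa na terça-feira."):
        print(etype, "->", span)
```

Under a scheme like this, the returned spans still have to be matched back to the dataset's annotated entities for scoring, and the paper's ablation compares prompt variants of this kind before committing to a final template.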
- B. Santana, R. Campos, E. Amorim, A. Jorge, P. Silvano, and S. Nunes, “A Survey on Narrative Extraction from Textual Data,” Artificial Intelligence Review, Jan. 2023. [Online]. Available: https://link.springer.com/10.1007/s10462-022-10338-7
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention Is All You Need,” in Advances in Neural Information Processing Systems 30. Long Beach, CA, USA: Neural Information Processing Systems Foundation (NeurIPS), Dec. 2017, pp. 5998–6008. [Online]. Available: http://arxiv.org/abs/1706.03762
- J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, PA, USA: Association for Computational Linguistics, 2019, pp. 4171–4186. [Online]. Available: http://aclweb.org/anthology/N19-1423
- BigScience Workshop, “BLOOM: A 176B-Parameter Open-Access Multilingual Language Model,” arXiv, Nov. 2022. [Online]. Available: http://arxiv.org/abs/2211.05100
- T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, “Language Models are Few-Shot Learners,” arXiv, May 2020. [Online]. Available: http://arxiv.org/abs/2005.14165
- P. Liu, W. Yuan, J. Fu, Z. Jiang, H. Hayashi, and G. Neubig, “Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing,” ACM Computing Surveys, vol. 55, no. 9, pp. 1–35, Sep. 2023.
- OpenAI, “Introducing ChatGPT,” Nov. 2022. [Online]. Available: https://openai.com/blog/chatgpt
- S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao, “ReAct: Synergizing Reasoning and Acting in Language Models,” arXiv, Oct. 2022.
- L. Zhang, M. Wang, L. Chen, and W. Zhang, “Probing GPT-3’s linguistic knowledge on semantic tasks,” in Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP. Abu Dhabi, United Arab Emirates (Hybrid): Association for Computational Linguistics, Dec. 2022, pp. 297–304. [Online]. Available: https://aclanthology.org/2022.blackboxnlp-1.24
- A. Hendy, M. Abdelrehim, A. Sharaf, V. Raunak, M. Gabr, H. Matsushita, Y. J. Kim, M. Afify, and H. H. Awadalla, “How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation,” arXiv, Feb. 2023.
- C. Qin, A. Zhang, Z. Zhang, J. Chen, M. Yasunaga, and D. Yang, “Is ChatGPT a General-Purpose Natural Language Processing Task Solver?” arXiv, Feb. 2023. [Online]. Available: http://arxiv.org/abs/2302.06476
- A. Srivastava et al., “Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models,” arXiv, Jun. 2022.
- T. Goyal, J. J. Li, and G. Durrett, “News Summarization and Evaluation in the Era of GPT-3,” arXiv, Sep. 2022.
- R. Tang, X. Han, X. Jiang, and X. Hu, “Does Synthetic Data Generation of LLMs Help Clinical Text Mining?” arXiv, Mar. 2023.
- D. Stammbach, M. Antoniak, and E. Ash, “Heroes, Villains, and Victims, and GPT-3: Automated Extraction of Character Roles Without Training Data,” in Proceedings of the 4th Workshop of Narrative Understanding (WNU2022). Stroudsburg, PA, USA: Association for Computational Linguistics, 2022, pp. 47–56.
- X. Du and C. Cardie, “Event Extraction by Answering (Almost) Natural Questions,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, PA, USA: Association for Computational Linguistics, 2020, pp. 671–683.
- J. Liu, Y. Chen, K. Liu, W. Bi, and X. Liu, “Event Extraction as Machine Reading Comprehension,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, PA, USA: Association for Computational Linguistics, 2020, pp. 1641–1651.
- Linguistic Data Consortium, “ACE (Automatic Content Extraction) English Annotation Guidelines for Entities,” Linguistic Data Consortium, Tech. Rep., 2008. [Online]. Available: http://projects.ldc.upenn.edu/ace/
- J. Gao, H. Zhao, C. Yu, and R. Xu, “Exploring the Feasibility of ChatGPT for Event Extraction,” arXiv, Mar. 2023.
- A. Chowdhery, S. Narang, J. Devlin, M. Bosma, G. Mishra, A. Roberts, P. Barham, H. W. Chung, C. Sutton, S. Gehrmann, P. Schuh, K. Shi, S. Tsvyashchenko, J. Maynez, A. Rao, P. Barnes, Y. Tay, N. Shazeer, V. Prabhakaran, E. Reif, N. Du, B. Hutchinson, R. Pope, J. Bradbury, J. Austin, M. Isard, G. Gur-Ari, P. Yin, T. Duke, A. Levskaya, S. Ghemawat, S. Dev, H. Michalewski, X. Garcia, V. Misra, K. Robinson, L. Fedus, D. Zhou, D. Ippolito, D. Luan, H. Lim, B. Zoph, A. Spiridonov, R. Sepassi, D. Dohan, S. Agrawal, M. Omernick, A. M. Dai, T. S. Pillai, M. Pellat, A. Lewkowycz, E. Moreira, R. Child, O. Polozov, K. Lee, Z. Zhou, X. Wang, B. Saeta, M. Diaz, O. Firat, M. Catasta, J. Wei, K. Meier-Hellstern, D. Eck, J. Dean, S. Petrov, and N. Fiedel, “PaLM: Scaling Language Modeling with Pathways,” arXiv, Apr. 2022.
- J. Hoffmann, S. Borgeaud, A. Mensch, E. Buchatskaya, T. Cai, E. Rutherford, D. d. L. Casas, L. A. Hendricks, J. Welbl, A. Clark, T. Hennigan, E. Noland, K. Millican, G. v. d. Driessche, B. Damoc, A. Guy, S. Osindero, K. Simonyan, E. Elsen, J. W. Rae, O. Vinyals, and L. Sifre, “Training Compute-Optimal Large Language Models,” arXiv, Mar. 2022.
- M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. d. O. Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman, A. Ray, R. Puri, G. Krueger, M. Petrov, H. Khlaaf, G. Sastry, P. Mishkin, B. Chan, S. Gray, N. Ryder, M. Pavlov, A. Power, L. Kaiser, M. Bavarian, C. Winter, P. Tillet, F. P. Such, D. Cummings, M. Plappert, F. Chantzis, E. Barnes, A. Herbert-Voss, W. H. Guss, A. Nichol, A. Paino, N. Tezak, J. Tang, I. Babuschkin, S. Balaji, S. Jain, W. Saunders, C. Hesse, A. N. Carr, J. Leike, J. Achiam, V. Misra, E. Morikawa, A. Radford, M. Knight, M. Brundage, M. Murati, K. Mayer, P. Welinder, B. McGrew, D. Amodei, S. McCandlish, I. Sutskever, and W. Zaremba, “Evaluating Large Language Models Trained on Code,” arXiv, Jul. 2021.
- R. Taylor, M. Kardas, G. Cucurull, T. Scialom, A. Hartshorn, E. Saravia, A. Poulton, V. Kerkez, and R. Stojnic, “Galactica: A Large Language Model for Science,” arXiv, Nov. 2022.
- P. Silvano, A. Jorge, A. Leal, E. Amorim, H. Sousa, I. Cantante, R. Campos, and S. Nunes, “Text2Story Lusa Annotated,” 2023. [Online]. Available: https://rdm.inesctec.pt/dataset/cs-2023-018
- P. Silvano, A. Leal, F. Silva, I. Cantante, F. Oliveira, and A. Jorge, “Developing a multilayer semantic annotation scheme based on ISO standards for the visualization of a newswire corpus,” in Proceedings of the 17th Joint ACL - ISO Workshop on Interoperable Semantic Annotation, 2021, pp. 1–13. [Online]. Available: https://aclanthology.org/2021.isa-1.1.pdf
- J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi, Q. Le, and D. Zhou, “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models,” arXiv, Jan. 2022.
- S. Oliveira, D. Loureiro, and A. Jorge, “Improving Portuguese Semantic Role Labeling with Transformers and Transfer Learning,” in 2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA). Porto, Portugal: IEEE, Oct. 2021, pp. 1–9. [Online]. Available: https://ieeexplore.ieee.org/document/9564238/
- A. d. S. B. Sacramento and M. Souza, “Joint Event Extraction with Contextualized Word Embeddings for the Portuguese Language,” in Intelligent Systems, A. Britto and K. Valdivia Delgado, Eds. Cham: Springer International Publishing, 2021, pp. 496–510.
- J. Strötgen and M. Gertz, “Multilingual and Cross-Domain Temporal Tagging,” Language Resources and Evaluation, vol. 47, no. 2, pp. 269–298, Jun. 2013. [Online]. Available: http://link.springer.com/10.1007/s10579-012-9179-y
- H. Sousa, R. Campos, and A. Jorge, “TEI2GO: A Multilingual Approach for Fast Temporal Expression Identification,” in Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023.
- OpenAI, “GPT-4 Technical Report,” arXiv, Mar. 2023. [Online]. Available: http://arxiv.org/abs/2303.08774