Assessing the quality of information extraction (2404.04068v2)
Abstract: Advances in LLMs have notably enhanced the efficiency of information extraction from unstructured and semi-structured data sources. As these technologies become integral to various applications, establishing an objective measure of the quality of information extraction becomes imperative. However, the scarcity of labeled data presents significant challenges to this endeavor. In this paper, we introduce an automatic framework to assess the quality of information extraction/retrieval and its completeness. The framework focuses on information extraction in the form of entities and their properties. We discuss how to handle the input/output size limitations of LLMs and analyze their performance when extracting information. In particular, we introduce scores to evaluate the quality of the extraction and provide an extensive discussion on how to interpret them.
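The paper's own scores are not reproduced in this abstract; as a minimal illustration of the kind of quality and completeness scoring it describes, the sketch below computes precision, recall, and F1 over exact matches between an extracted set of entity properties and a reference set. The function name and the example entities are hypothetical, not taken from the paper.

```python
def property_extraction_scores(extracted: dict, reference: dict):
    """Score an extracted set of entity properties against a reference set.

    A property counts as matched only when both its name and value agree
    exactly. Returns (precision, recall, f1); precision reflects extraction
    quality, recall reflects completeness.
    """
    matched = sum(1 for k, v in extracted.items() if reference.get(k) == v)
    precision = matched / len(extracted) if extracted else 0.0
    recall = matched / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: two of three extracted properties match the reference.
p, r, f = property_extraction_scores(
    {"name": "Acme", "founded": "1990", "hq": "Prague"},
    {"name": "Acme", "founded": "1990", "ceo": "J. Doe"},
)
```

In practice, exact matching would likely be relaxed (e.g. normalized strings or semantic similarity), but the precision/recall decomposition of quality versus completeness carries over.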
Authors: Filip Seitl, Tomáš Kovářík, Soheyla Mirshahi, Jan Kryštůfek, Rastislav Dujava, Matúš Ondreička, Herbert Ullrich, Petr Gronat