Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
38 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Learning to Extract Structured Entities Using Language Models (2402.04437v5)

Published 6 Feb 2024 in cs.CL and cs.LG

Abstract: Recent advances in machine learning have significantly impacted the field of information extraction, with LLMs (LMs) playing a pivotal role in extracting structured information from unstructured text. Prior works typically represent information extraction as triplet-centric and use classical metrics such as precision and recall for evaluation. We reformulate the task to be entity-centric, enabling the use of diverse metrics that can provide more insights from various perspectives. We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP (AESOP) metric, designed to appropriately assess model performance. Later, we introduce a new Multistage Structured Entity Extraction (MuSEE) model that harnesses the power of LMs for enhanced effectiveness and efficiency by decomposing the extraction task into multiple stages. Quantitative and human side-by-side evaluations confirm that our model outperforms baselines, offering promising directions for future advancements in structured entity extraction. Our source code and datasets are available at https://github.com/microsoft/Structured-Entity-Extraction.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (7)
  1. Haolun Wu (27 papers)
  2. Ye Yuan (274 papers)
  3. Liana Mikaelyan (4 papers)
  4. Alexander Meulemans (12 papers)
  5. Xue Liu (156 papers)
  6. James Hensman (46 papers)
  7. Bhaskar Mitra (78 papers)
Citations (2)
X Twitter Logo Streamline Icon: https://streamlinehq.com
Youtube Logo Streamline Icon: https://streamlinehq.com