
Instruct and Extract: Instruction Tuning for On-Demand Information Extraction (2310.16040v1)

Published 24 Oct 2023 in cs.CL and cs.AI

Abstract: LLMs with instruction-following capabilities open the door to a wider group of users. However, for information extraction, a classic task in natural language processing, most task-specific systems cannot align well with long-tail, ad hoc extraction use cases for non-expert users. To address this, we propose a novel paradigm, termed On-Demand Information Extraction, to fulfill the personalized demands of real-world users. The task is to follow an instruction to extract the desired content from the associated text and present it in a structured tabular format. The table headers can either be user-specified or inferred contextually by the model. To facilitate research in this emerging area, we present a benchmark named InstructIE, comprising both automatically generated training data and a human-annotated test set. Building on InstructIE, we further develop an On-Demand Information Extractor, ODIE. Comprehensive evaluations on our benchmark reveal that ODIE substantially outperforms existing open-source models of similar size. Our code and dataset are released at https://github.com/yzjiao/On-Demand-IE.
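
To make the task format in the abstract concrete, the following is a minimal sketch of what one on-demand extraction query and its tabular output might look like. Everything in it is an assumption for illustration: the `ExtractionRequest` class, the `to_markdown_table` helper, and the example text are hypothetical and are not taken from the InstructIE dataset or the ODIE code. The rows are written by hand where a model's output would normally go.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExtractionRequest:
    """One on-demand IE query (hypothetical schema, not the InstructIE format)."""
    instruction: str
    text: str
    # None stands for the case where the model infers the headers itself.
    headers: Optional[List[str]] = None


def to_markdown_table(headers: List[str], rows: List[List[str]]) -> str:
    """Render extracted rows as a Markdown table, rejecting ragged rows."""
    for row in rows:
        if len(row) != len(headers):
            raise ValueError("each row must have one cell per header")
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


request = ExtractionRequest(
    instruction="List who joined which company and in what year.",
    text="Alice joined Acme in 2019. Bob joined Initech in 2021.",
    headers=["Name", "Company", "Year"],  # user-specified headers
)

# Hand-written rows standing in for a model's extraction output.
rows = [["Alice", "Acme", "2019"], ["Bob", "Initech", "2021"]]
table = to_markdown_table(request.headers, rows)
print(table)
```

Leaving `headers` as `None` would correspond to the setting where the headers are inferred contextually; this sketch covers only the user-specified case.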

Authors (7)
  1. Yizhu Jiao (22 papers)
  2. Ming Zhong (88 papers)
  3. Sha Li (42 papers)
  4. Ruining Zhao (8 papers)
  5. Siru Ouyang (22 papers)
  6. Heng Ji (266 papers)
  7. Jiawei Han (263 papers)
Citations (20)