
Generative Information Extraction

Updated 29 December 2025
  • Generative Information Extraction is defined as a paradigm that reformulates classic IE tasks into conditional text generation, unifying extraction of entities, relations, and events.
  • It employs advanced sequence-to-sequence models with schema-guided prompts and constrained decoding to directly generate structured outputs like JSON, ensuring consistency and flexibility.
  • Its applications span biomedical, financial, and multimodal domains, achieving state-of-the-art performance in precision, recall, and overall robustness across diverse extraction tasks.

Generative Information Extraction (GIE) is a paradigm in which information extraction tasks—traditionally framed as sequence labeling, classification, or pipeline-based approaches—are reformulated as conditional text generation problems. Leveraging pre-trained large language models (LLMs), GIE systems learn to map input text to structured outputs through prompt-driven or schema-guided decoding, providing a unified, flexible, and often more robust alternative to discriminative or pipeline-based information extraction methods. The generative approach subsumes canonical subtasks such as named entity recognition, relation extraction, event extraction, and document structuring by directly generating structured outputs (often in JSON or template form) from raw or weakly labeled input, unifying design and implementation across domains and tasks.

1. Foundations and Task Formalization

The generative information extraction paradigm emerges from reframing information extraction as a conditional text generation problem, where the output is a linearized or serialized form of the desired structured data. For an input $X$ (e.g., sentence, paragraph, or document) and a schema-aware prompt $P$, a generative model parameterized by $\theta$ is trained to maximize the likelihood of a structured output sequence $Y$:

$$P_\theta(Y \mid X, P) = \prod_{i=1}^{m} P_\theta(y_i \mid X, P, y_{<i})$$

Training minimizes the cross-entropy loss:

$$\mathcal{L}(\theta) = -\sum_{(X,Y) \in D} \log P_\theta(Y \mid X, P)$$
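The factorized likelihood and cross-entropy objective above can be sketched in a few lines; the hand-picked token probabilities here stand in for a real model's softmax outputs.

```python
import math

def sequence_nll(token_probs):
    """Negative log-likelihood of one serialized output Y, where
    token_probs[i] = P_theta(y_i | X, P, y_<i) as scored by the model."""
    return -sum(math.log(p) for p in token_probs)

# Toy conditional probabilities the model assigns to the gold output
# tokens of one (X, Y) pair; summing this quantity over a dataset D
# gives the training loss L(theta).
loss = sequence_nll([0.9, 0.8, 0.95, 0.7])
```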

Tasks are cast as outputs in natural language or code-like formats (e.g., bracketed strings, JSON schemas, class definitions), supporting canonical IE tasks:

  • Named Entity Recognition (NER): Generation of lists of entities/types (e.g., JSON objects containing text, start/end indices, and type).
  • Relation Extraction (RE): Generation of relation triples or tuples in textual or code-serialized formats.
  • Event Extraction (EE): Generation of events with triggers and arguments, often as nested JSON or template sequences.
  • Document and Multimodal IE: Template/key-value sequences or complex hierarchical forms.
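As an illustration, serialized generation targets for these tasks might look like the following; the field names are assumptions for illustration, not drawn from any particular system.

```python
import json

# Hypothetical linearized targets for NER, RE, and EE. Each is a flat
# string the model must generate token by token; field names here are
# illustrative, and real systems define their own schemas.
ner_target = json.dumps([
    {"text": "aspirin", "start": 12, "end": 19, "type": "Drug"},
])
re_target = json.dumps([
    {"head": "aspirin", "relation": "treats", "tail": "headache"},
])
ee_target = json.dumps({
    "trigger": "prescribed", "type": "Treatment",
    "arguments": [{"role": "Drug", "text": "aspirin"}],
})
```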

This formulation applies equally to sentence-level, document-level, or multimodal inputs (including OCR outputs with spatial embeddings) (Xu et al., 2023, Hsu et al., 18 Nov 2024, Townsend et al., 2021, Ni et al., 2022, Cao et al., 2022, Cao et al., 2023, Josifoski et al., 2021).

2. Modeling Approaches and Architectures

GIE leverages advances in pre-trained sequence-to-sequence and decoder-only language models (T5, BART, UniLM; GPT, Llama) and can be categorized as follows:

2.1 Generative Encoders/Decoders

  • Sequence-to-sequence Transformer architectures model conditional generation from input text (or multimodal input) to structured output.
  • Constrained Decoding: Constrained beam search or prefix-tokens force valid outputs (e.g., restricting to valid schema fields, entity names, or relation types) (Josifoski et al., 2021).
  • Pointer-generator mechanisms: In tasks involving extraction from images or OCR text, generative models are designed to assemble outputs by “pointing” into the source input based on prompt-aware matching weights (Yang et al., 21 Mar 2025).
  • Latent structure induction: Structural and syntactic biases (constituency, dependency) are incorporated in model post-training or decoding (Fei et al., 2023).
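Constrained decoding as described above can be sketched with a toy greedy decoder over a prefix trie of schema-valid outputs; this is a greedy simplification of constrained beam search, and the tokenization and scores are illustrative.

```python
# At each step the "model" scores the vocabulary, but choices are masked
# to continuations allowed by a prefix trie of valid outputs.

def build_trie(sequences):
    trie = {}
    for seq in sequences:
        node = trie
        for tok in seq:
            node = node.setdefault(tok, {})
    return trie

def constrained_decode(score_fn, trie):
    """score_fn(prefix) -> {token: score}; the trie limits legal tokens."""
    out, node = [], trie
    while node:  # empty dict = leaf = a complete valid output
        best = max(node, key=lambda t: score_fn(out).get(t, float("-inf")))
        out.append(best)
        node = node[best]
    return out

# Two schema-valid relation labels, pre-tokenized.
trie = build_trie([["born", "_in"], ["works", "_for"]])

# Stand-in model scores: unconstrained greedy decoding would emit the
# invalid mix ["works", "_in"], which the trie rules out.
score = lambda prefix: {"born": 0.1, "works": 0.9, "_in": 0.5, "_for": 0.4}
print(constrained_decode(score, trie))  # ['works', '_for']
```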

2.2 Prompt Engineering and Template Strategies

  • Schema-guided prompt templates specify task instructions, schema definitions, and expected output format (usually JSON or code-class) (Hsu et al., 18 Nov 2024, Xu et al., 2023).
  • Composable and hierarchical prompts enable modular and data-efficient generalization in data-scarce or zero-shot settings (Kan et al., 2022).
  • Multistage prompting: Complex extractions may proceed by multi-turn LLM coordination or two-stage generation (e.g., generate entities, then generate attributes/statuses) (Hu et al., 2023).
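A schema-guided prompt of the kind described above can be sketched as an instruction, a schema definition, and an output-format specification; the wording and schema fields below are illustrative assumptions, not quoted from any cited system.

```python
# Minimal schema-guided prompt construction for NER-as-generation.
SCHEMA = {"entity_types": ["Drug", "Disease"]}

def build_prompt(text):
    return (
        "Extract all entities from the input text.\n"
        f"Allowed entity types: {', '.join(SCHEMA['entity_types'])}\n"
        'Return a JSON list of objects with keys "text" and "type".\n'
        f"Input: {text}\n"
        "Output:"
    )

prompt = build_prompt("Aspirin relieves headache.")
```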

2.3 Training and Inference

  • Training uses teacher-forced cross-entropy on serialized targets; parameter-efficient fine-tuning (PEFT) and retrieval-based in-context learning provide data-efficient alternatives when annotation is scarce (Choi et al., 20 Apr 2025, Choudhury et al., 31 Jul 2024).
  • At inference, outputs are decoded greedily or with (constrained) beam search, then parsed into the target structure.

2.4 Multimodal Input

  • Document and visual IE models fuse OCR text with layout and visual features (e.g., spatial embeddings), generating key-value structures directly from visually rich inputs (Cao et al., 2022, Cao et al., 2023, Yang et al., 21 Mar 2025).

3. Extraction Methods, Output Serialization, and Pipeline Design

Generative IE systems operationalize extraction via pipeline modules that can be mapped to four canonical components (Hsu et al., 18 Nov 2024):

Component | Role | Example Implementation
Engine | LLM inference, uniform API to backend models | InferenceEngine.chat for prompt + input
Extractor | Frame and relation extraction (NER, EA, RE) | FrameExtractor, RelationExtractor; prompt + post-processing
Data type | Schema-constrained result structures | JSON, class, hierarchical templates
Prompt editor | LLM-based schema/prompt design | Interactive prompt REPL agent

Extraction proceeds via prompt construction, application to input units (sentences, paragraphs), LLM generation of serialized outputs, and rigorous post-processing (often including JSON validation, merging of overlapping records, error inspection).
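The post-processing step can be sketched as follows; the fence-stripping and required-keys check form a minimal illustration under assumed field names, not the pipeline of any cited system.

```python
import json

REQUIRED_KEYS = {"text", "type"}  # illustrative schema for NER records

def parse_and_validate(raw):
    """Strip code fences an LLM may add, parse JSON, drop malformed records."""
    cleaned = raw.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.strip("`")
        # drop a possible leading language tag such as "json"
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else cleaned
    try:
        records = json.loads(cleaned)
    except json.JSONDecodeError:
        return []  # a real pipeline would log the failure or retry here
    return [r for r in records
            if isinstance(r, dict) and REQUIRED_KEYS <= r.keys()]

raw = '```json\n[{"text": "aspirin", "type": "Drug"}, {"oops": 1}]\n```'
```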

Task Templates

  • NER: Generate entity JSONs with text, span, and type.
  • Entity Attribute Extraction: JSON including entity, attributes, span.
  • RE: Binary/multiclass, prompt-based pair classification, or direct triple outputs.
  • Event and Argument Extraction: Generation of trigger, type, and argument set.

Outputs are consolidated, validated (including overlapping spans and entity linking), and optionally visualized via viz_render and viz_serve (Hsu et al., 18 Nov 2024).
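The consolidation of overlapping records can be sketched with a simple keep-the-longest-span merge; this is one common policy, and real systems may instead prefer confidence scores or type priorities.

```python
# Merge overlapping entity records from different prompt passes,
# keeping the earlier-starting (and, on ties, longer) span.

def merge_overlaps(records):
    records = sorted(records,
                     key=lambda r: (r["start"], -(r["end"] - r["start"])))
    merged = []
    for r in records:
        if merged and r["start"] < merged[-1]["end"]:
            continue  # overlaps a span already kept; drop it
        merged.append(r)
    return merged

ents = [
    {"text": "acute pain", "start": 0, "end": 10, "type": "Symptom"},
    {"text": "pain", "start": 6, "end": 10, "type": "Symptom"},
    {"text": "aspirin", "start": 24, "end": 31, "type": "Drug"},
]
print(merge_overlaps(ents))  # keeps "acute pain" and "aspirin"
```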

4. Empirical Results and Benchmarks

Empirical performance is documented across diverse domains and tasks:

Biomedical and Clinical

  • LLM-IE demonstrates strong performance in biomedical NER, attribute extraction, and relation extraction, with F1 up to 0.78 on span-sensitive tasks and consistent gains for sentence-level prompting (Hsu et al., 18 Nov 2024).
  • GENIE (Llama-3.1-8B) processes entire paragraphs for EHR structuring, achieving phrase-level F1 = 0.837 and outperforming cTAKES/MetaMap by large relative margins (Ying et al., 30 Jan 2025).

Financial

  • Generative JSON-output models with PEFT fine-tuning on CCKS-2019 substantially outperform classification baselines (F1 = 0.945 vs. 0.934), especially in complex, high-entity instances (Choi et al., 20 Apr 2025).

Multimodal and Document Information Extraction

  • GMN and GenKIE (generative multimodal transformers) handle multimodal (text, layout, vision) DIE tasks, achieving F1 up to 98.2% (SROIE) and demonstrating robustness to OCR errors and layout permutation (Cao et al., 2022, Cao et al., 2023).
  • Generative Compositor delivers SOTA few-shot VIE (e.g., 62.2 F1 at 1-shot CORD; +22 points over baselines) using prompt-aware matching and pointer mechanisms (Yang et al., 21 Mar 2025).

Event and Relation Extraction

  • GREC and Contrastive Triple Extraction models achieve/exceed SOTA on ACE05, SemEval, NYT, WebNLG, and MIE, including low-resource and multitask settings (Ni et al., 2022, Ye et al., 2020).
  • GenIE (cIE with bi-level constraints) achieves micro F1 = 91.5 (small schema) and 68.9 (large schema), scaling gracefully to 5.9M entities/857 relations (Josifoski et al., 2021).
  • Template-based approaches with advanced copy mechanisms excel at document-level, cross-entity dependency extraction (Huang et al., 2021).

Unified and Universal IE

  • LasUIE and compositional prompt frameworks unify NER, RE, EE, and SRL into a single generative framework, with latent syntax induction/structural broadcast providing up to +2.8 F1 gains averaged over 12 benchmarks (Fei et al., 2023, Kan et al., 2022).

In-Context and Data-Efficient Learning

  • GPT-3 in-context generation, with retrieval-based prompt composition and logit biasing, can perform NER and RE at competitive F1 relative to fine-tuned BERT models, with the advantage of minimal annotation (100 training examples per task) (Choudhury et al., 31 Jul 2024).

5. Evaluation, Robustness, and Limitations

Evaluation protocols use micro- and macro-F1, entity/slot matching, and specialized schema-aware metrics. Several caveats and challenges are consistently reported:

  • Output misalignment: Generated sequences may violate schema constraints, necessitating robust postprocessing and validation (Xu et al., 2023).
  • Inference trade-offs: Sentence-level or fine-grained prompt granularity improves recall but increases LLM invocation count and cost (Hsu et al., 18 Nov 2024).
  • JSON reliability: Strict schema adherence in outputs is not guaranteed, and models may produce malformed or hallucinated records (Choi et al., 20 Apr 2025, Hsu et al., 18 Nov 2024).
  • Evaluation metric inadequacy: String-match F1 systematically undercredits models for valid alternate outputs or unannotated correct answers. Learned, entailment-augmented metrics such as SQC-Score better capture the true semantic accuracy of generative models, aligning more closely with human judgment (Fan et al., 4 Apr 2024).
  • Few-shot/zero-shot: Generative models exhibit stronger data efficiency and transfer, but still fall short of fine-tuned baselines in strict F1 at very low resources in some domains (Choudhury et al., 31 Jul 2024).
  • Complex structure and long documents: Limitations include input length ceilings (for BART/T5), entity ordering sensitivity, and noisy post-merging (Townsend et al., 2021, Huang et al., 2021).
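The strict string-match micro-F1 that the metric caveat above concerns can be made concrete: a surface variant a human would accept scores zero under exact tuple matching.

```python
# Micro-F1 over (span text, type) tuples, i.e., strict string matching.

def micro_f1(gold, pred):
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = [("aspirin", "Drug"), ("headache", "Symptom")]
pred = [("aspirin", "Drug"), ("head ache", "Symptom")]  # surface mismatch
print(micro_f1(gold, pred))  # 0.5: the valid variant gets no credit
```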

6. Practical Systems and Software

Modern generative IE research enables practical, end-to-end systems deployable in production pipelines:

  • LLM-IE: Integrates LLM backends, prompt-editing agents, modular extractors, and web-based visualization, providing complete pipeline construction for biomedical and clinical IE (Hsu et al., 18 Nov 2024).
  • Doc2Dict: Demonstrates document-level IE as direct text-to-JSON generation up to 32,000 tokens using FiD chunking and gradient checkpointing (Townsend et al., 2021).
  • GENIE: Shows practical, on-premise, paragraph-wise structuring of clinical notes with open-access models, scaling to hundreds of paragraphs/hour on consumer GPUs (Ying et al., 30 Jan 2025).
  • Multimodal VIE systems: Enable robust extraction from complex, visually rich documents (e.g., receipts, business forms, scene images), matching or surpassing traditional layout-aware token classifiers (Cao et al., 2023, Yang et al., 21 Mar 2025).
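Long-document systems such as Doc2Dict rely on chunking the input to fit model length limits; the overlapping word-window sketch below is a simplification of token-based chunking, not the cited implementation.

```python
# Split a long document into overlapping windows so adjacent chunks
# share context; real systems chunk over model tokens, not words.

def chunk(words, size, overlap):
    step = size - overlap
    return [words[i:i + size]
            for i in range(0, max(len(words) - overlap, 1), step)]

doc = "a b c d e f g".split()
print(chunk(doc, size=4, overlap=1))
```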

7. Open Challenges and Research Directions

GIE research continues to address persistent challenges, with multiple fronts of active development:

  • Universal IE designs: Further integration of syntactic and semantic structure as inductive bias (e.g., latent adaptive structures, broadcast forests) to improve modeling of long-range dependencies and span boundaries (Fei et al., 2023).
  • Automated prompt optimization: Systematic prompt learning, schema adaptation, and chain-of-thought refinements to minimize manual engineering (Kan et al., 2022, Xu et al., 2023).
  • Evaluation advances: Development of robust, entailment-based, schema-aware, and human-aligned metrics for real-world model auditing (Fan et al., 4 Apr 2024).
  • Weak supervision and data augmentation: Use of LLM-generated annotations, inverse generation, and bidirectional GIE training for scalable IE in low-resource and open domains (Yang et al., 21 Mar 2025, Xu et al., 2023).
  • Multimodal and long-sequence architectures: Extending efficient attention mechanisms, pre-training schemas, and pointer-based decoding for full-document, table-rich, and visually structured data (Townsend et al., 2021, Cao et al., 2023, Yang et al., 21 Mar 2025).

Generative Information Extraction continues to bring uniformity, flexibility, and robustness to the extraction of structured information from complex, unstructured text and documents, offering a compelling path toward general-purpose, domain-adaptive NLP systems. As models, training objectives, and evaluation frameworks evolve, GIE offers a scalable, efficient, and unified alternative to legacy extraction paradigms across scientific, biomedical, financial, and general domains.
