
Agent-Event-Coder Paradigm

Updated 8 June 2026
  • AEC is a modular framework that uses multi-agent coordination and executable code templates for zero-shot event extraction.
  • AEC divides the task into retrieval, planning, coding, and verification steps, which together provide schema alignment, iterative correction, and error diagnostics.
  • Empirical evaluations show improvements in trigger and argument extraction metrics over zero-shot baselines, at the cost of increased latency.

The Agent-Event-Coder (AEC) paradigm is a multi-agent LLM framework for zero-shot event extraction that conceptualizes the extraction process as structured, iterative code generation. Leveraging principles from software engineering, AEC employs explicitly coordinated LLM agents to decompose event extraction into retrieval, planning, coding, and verification subtasks, with rigorous schema enforcement via executable code templates. This modular, collaborative approach addresses core challenges in zero-shot event extraction, notably incomplete outputs and schema violations, by integrating programmatic validation and systematic iterative refinement (Guo et al., 17 Nov 2025).

1. Framework Architecture

AEC employs four dedicated LLM agents, each responsible for a specialized subtask, coordinated via dual nested loops. The overall process ensures coverage, precision, and schema consistency:

  • Retrieval Agent ($A_{ret}$): Receives an unseen event schema $S_e = \langle e, R_e \rangle$ (where $e$ is the event type and $R_e = \{(r_j, \tau_j)\}$ is the set of argument roles and value types) and outputs $k$ exemplar sentences $D_{ex} = \{s_1, \ldots, s_k\}$ illustrating event realizations in text.
  • Planning Agent ($A_{plan}$): Accepts the input text $T = (w_1 \ldots w_n)$, schema $S_e$, and exemplars $D_{ex}$, and generates a ranked list of trigger-type hypotheses, each consisting of a trigger span, a confidence score, and a rationale.
  • Coding Agent: For the top hypothesis, instantiates a Python (Pydantic) BaseModel class corresponding to the event schema and emits code to populate event arguments.
  • Verification Agent: Receives the emitted code object and applies deterministic semantic, type, and structural tests, returning a pass/fail verdict together with diagnostic feedback.

Orchestration proceeds with an outer loop over trigger hypotheses and an inner loop over code-refinement attempts, using verification feedback to patch and resubmit code until a valid event instance is produced or hypotheses are exhausted.
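The objects passed between the agents can be pictured as small typed records. The following is a minimal sketch, not the authors' implementation: the class names `EventSchema` and `Hypothesis`, the helper `rank`, and the example "Attack" schema are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventSchema:
    """S_e = <e, R_e>: an event type plus (role, value type) pairs."""
    event_type: str
    roles: tuple  # tuple of (role_name, python_type) pairs


@dataclass(frozen=True)
class Hypothesis:
    """One trigger-type hypothesis emitted by the planning step."""
    trigger: str       # trigger span taken from the input text
    confidence: float  # planner's confidence score
    rationale: str     # short justification for the hypothesis


def rank(hypotheses):
    """Order hypotheses by descending confidence, as the outer loop consumes them."""
    return sorted(hypotheses, key=lambda h: h.confidence, reverse=True)


# Hypothetical schema and planner output, for illustration only.
schema = EventSchema("Attack", (("attacker", str), ("target", str)))
ranked = rank([
    Hypothesis("fired", 0.4, "weapon-use verb"),
    Hypothesis("attacked", 0.9, "direct attack verb"),
])
print([h.trigger for h in ranked])
```

Keeping the rationale alongside each hypothesis means a rejected hypothesis can still be inspected after the run.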

2. Schema Representation and Code Templates

Each event schema $S_e = \langle e, R_e \rangle$ is rendered into an executable Python/Pydantic BaseModel class template, with one typed field for each argument role $(r_j, \tau_j)$ in $R_e$.

This executable approach ensures that any instantiation with missing or ill-typed arguments triggers runtime or Pydantic validation failures, enabling deterministic validation of output structure and type compliance.
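The effect of schema-as-code validation can be illustrated with a small sketch. To keep it dependency-free, it uses a standard-library dataclass with a hand-written type check as a stand-in for Pydantic's BaseModel validation; the `AttackEvent` class, its fields, and the `try_build` helper are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, fields
from typing import List, get_args, get_origin


@dataclass
class AttackEvent:
    """Hypothetical schema-as-code template: one typed field per argument role."""
    trigger: str
    attacker: str
    target: str
    tool: List[str]

    def __post_init__(self):
        # Stand-in for Pydantic validation: check each field's runtime type.
        for f in fields(self):
            value = getattr(self, f.name)
            if get_origin(f.type) is list:
                (item_type,) = get_args(f.type)
                ok = isinstance(value, list) and all(
                    isinstance(v, item_type) for v in value
                )
            else:
                ok = isinstance(value, f.type)
            if not ok:
                raise TypeError(f"field '{f.name}' expected {f.type}, got {value!r}")


def try_build(**kwargs):
    """Return (event, None) on success or (None, diagnostic) on failure."""
    try:
        return AttackEvent(**kwargs), None
    except TypeError as exc:
        return None, str(exc)


ok_event, _ = try_build(
    trigger="attacked", attacker="rebels", target="convoy", tool=["rockets"]
)
_, missing = try_build(trigger="attacked", attacker="rebels", tool=["rockets"])
_, bad_type = try_build(
    trigger="attacked", attacker="rebels", target="convoy", tool="cash"
)
print(missing)
print(bad_type)
```

Both a missing role and an ill-typed value surface as exceptions at construction time, so no invalid event object is ever created.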

3. Iterative Extraction Pipeline

AEC’s code-generation workflow is structured as follows (adapted from Algorithm 1 of Guo et al., 17 Nov 2025):

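  1. Exemplar Retrieval: the Retrieval Agent $A_{ret}$ takes the schema $S_e$ and returns the exemplar set $D_{ex}$.
  2. Hypothesis Planning: the Planning Agent $A_{plan}$ takes $(T, S_e, D_{ex})$ and returns a ranked list of trigger-type hypotheses.
  3. Hypothesis Processing:
    • For each hypothesis in descending confidence order:
      • For up to a fixed maximum number of refinement attempts:
        • The Coding Agent generates (or patches) code for the current hypothesis.
        • The Verification Agent checks the code and returns a pass/fail verdict with diagnostic feedback.
        • If the verdict is pass, deserialize the code and output the event.
        • Else, patch the code according to the diagnostic feedback.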

Failures in validation yield compiler-style diagnostics (e.g., missing arguments, type mismatches, syntactic errors), enabling the coding agent to reprompt the LLM for incremental correction until the event extraction passes all checks.
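The control flow of the two nested loops can be sketched as follows. This is an illustrative skeleton under stated assumptions, not the paper's Algorithm 1 verbatim: `run_aec`, `stub_generate`, and `stub_verify` are hypothetical names, the default of three refinement attempts is arbitrary, and the stubs stand in for the LLM-backed Coding and Verification Agents.

```python
from typing import Callable, List, Optional, Tuple


def run_aec(
    hypotheses: List[Tuple[str, float]],
    generate: Callable[[str, Optional[str], str], str],
    verify: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,  # assumed value, for illustration only
) -> Optional[str]:
    """Outer loop over hypotheses, inner loop over refinement attempts."""
    for trigger, _conf in sorted(hypotheses, key=lambda h: h[1], reverse=True):
        code, feedback = None, ""
        for _ in range(max_attempts):
            code = generate(trigger, code, feedback)  # first draft, then patches
            ok, feedback = verify(code)
            if ok:
                return code  # first verified event wins
    return None  # every hypothesis exhausted without a valid event


def stub_generate(trigger, previous_code, feedback):
    # Stand-in for the LLM coding step: adds the missing role once told about it.
    if "target" in feedback:
        return f"Attack(trigger={trigger!r}, target='convoy')"
    return f"Attack(trigger={trigger!r})"


def stub_verify(code):
    # Stand-in for the verifier: only checks that a target argument is present.
    if "target=" not in code:
        return False, "Argument 'target' not provided"
    return True, ""


hyps = [("fired", 0.4), ("attacked", 0.9)]
result = run_aec(hyps, stub_generate, stub_verify)
no_retry = run_aec(hyps, stub_generate, stub_verify, max_attempts=1)
print(result)
print(no_retry)
```

With a single attempt per hypothesis the stub coder never sees its diagnostic, so no event is produced; allowing a second attempt lets the feedback repair the draft.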

4. Deterministic Verification and Iterative Refinement

Verification in AEC is performed in three sequential, deterministic stages:

  • Semantic Check: Verifies that the candidate trigger actually appears in input $T$ and is contextually compatible with event type $e$, typically via string match and embedding similarity.
  • Type Check: Applies Pydantic validation to confirm each argument $r_j$ matches schema type $\tau_j$, and enforces multiplicity (list cardinality, requiredness).
  • Structural Check: Ensures that the generated code compiles, the field inventory matches the schema, and serialization (e.g., to JSON) is possible.

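The formal validation criterion is the conjunction of the three checks: a candidate is accepted only if the semantic, type, and structural checks all pass.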

Failures generate diagnostic messages (e.g., "Argument ‘target’ not provided," "Expected List[str] for ‘tool’ but got 'cash' as str," "SyntaxError on line 3"), which the Coding Agent uses for iterative code correction.
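A simplified version of the three-stage check might look like the sketch below. It makes several assumptions: the candidate is represented as a plain dictionary of arguments rather than generated code (so nothing is compiled), the semantic stage uses substring matching only (no embedding similarity), and the `SCHEMA` roles and the `verify` function are hypothetical.

```python
import json

# Hypothetical role inventory for an "Attack" event: role name -> expected type.
SCHEMA = {"attacker": str, "target": str, "tool": list}


def verify(text, trigger, args):
    """Run semantic, type, and structural checks in order; stop at the first failure."""
    # Semantic check: plain substring match (embedding similarity omitted here).
    if trigger not in text:
        return False, f"Trigger '{trigger}' not found in input text"
    # Type check: every role must be present and carry the expected type.
    for role, expected in SCHEMA.items():
        if role not in args:
            return False, f"Argument '{role}' not provided"
        if not isinstance(args[role], expected):
            return False, (
                f"Expected {expected.__name__} for '{role}' "
                f"but got {args[role]!r} as {type(args[role]).__name__}"
            )
    # Structural check: no extra fields, and the result must serialize to JSON.
    extra = sorted(set(args) - set(SCHEMA))
    if extra:
        return False, f"Unexpected fields: {extra}"
    try:
        json.dumps(args)
    except TypeError as exc:
        return False, f"Not JSON-serializable: {exc}"
    return True, ""


text = "Rebels attacked the convoy with rockets."
good = verify(text, "attacked",
              {"attacker": "Rebels", "target": "convoy", "tool": ["rockets"]})
no_target = verify(text, "attacked", {"attacker": "Rebels", "tool": ["rockets"]})
wrong_type = verify(text, "attacked",
                    {"attacker": "Rebels", "target": "convoy", "tool": "cash"})
no_trigger = verify(text, "bombed",
                    {"attacker": "Rebels", "target": "convoy", "tool": ["rockets"]})
print(good, no_target, wrong_type, no_trigger, sep="\n")
```

Because each stage returns as soon as it fails, the coder receives one targeted diagnostic per attempt rather than a list of cascading errors.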

5. Empirical Evaluation

Evaluation on five benchmarks (FewEvent, ACE2005, GENIA, SPEED, CASIE) and six LLM architectures shows consistent improvements over zero-shot baselines across trigger and argument extraction metrics. Results for selected models:

Model            FewEvent TI / TC    ACE2005 TI / TC
Llama3-8B        27.0 / 27.6         40.5 / 48.8
Best baseline    25.2 / 24.8         36.1 / 44.2
Llama3-70B       42.1 / 40.5         57.0 / 54.6
Next best        35.3 / 40.7         51.5 / 51.4

Other LLMs (Qwen2.5-72B, GPT-3.5-turbo, GPT-4o) exhibit analogous improvements, with absolute gains of +3–6% in trigger identification/classification (TI/TC) and +2–4% in argument identification/classification (AI/AC). Ablation studies show that removing any agent, especially Retrieval or Verification, causes large performance drops (up to 10 TI points). Performance plateaus beyond a certain number of hypotheses and refinement attempts.
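The differences implied by the table above can be computed directly. The snippet below assumes that each model row reports AEC with that backbone and that the row immediately beneath it is the strongest competing method for the same backbone; that pairing is an interpretation of the table layout, and the helper `gains` is hypothetical.

```python
# Scores copied from the table above as (TI, TC) pairs.
scores = {
    "Llama3-8B": {"FewEvent": (27.0, 27.6), "ACE2005": (40.5, 48.8)},
    "Best baseline": {"FewEvent": (25.2, 24.8), "ACE2005": (36.1, 44.2)},
    "Llama3-70B": {"FewEvent": (42.1, 40.5), "ACE2005": (57.0, 54.6)},
    "Next best": {"FewEvent": (35.3, 40.7), "ACE2005": (51.5, 51.4)},
}


def gains(model, reference):
    """Absolute (TI, TC) differences per benchmark, rounded to one decimal."""
    out = {}
    for bench, (ti, tc) in scores[model].items():
        ref_ti, ref_tc = scores[reference][bench]
        out[bench] = (round(ti - ref_ti, 1), round(tc - ref_tc, 1))
    return out


# Assumed pairing: each model row against the comparison row beneath it.
small = gains("Llama3-8B", "Best baseline")
large = gains("Llama3-70B", "Next best")
print(small)
print(large)
```

Under this reading, every difference is positive except FewEvent TC for Llama3-70B, where the comparison row is 0.2 points higher.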

6. Principles, Limitations, and Extensions

AEC frames event extraction as code generation and validation, using schema-as-code and deterministic tests to minimize hallucinations and schema violations that affect direct prompting. Explicit modular decomposition into Retrieval, Planning, Coding, and Verification improves stepwise interpretability and allows targeted diagnosis of error points. The iterative refinement loop with compiler-like diagnostics yields robust zero-shot extraction, even absent in-domain training data.

Noted limitations include increased latency due to multiple agent invocations and code compilation, and scalability challenges with highly complex or deeply nested schemas. Proposed extensions involve augmenting the Retrieval Agent with external knowledge bases for exemplar generation, developing a Postmortem Agent to address persistent extraction failures, and leveraging verified event objects as pseudo-labeled data for few-shot or semi-supervised learning scenarios.

The AEC paradigm operationalizes a software-engineering perspective for zero-shot event extraction, tightly integrating LLM reasoning, modular workflow decomposition, and runtime schema validation for precise and schema-compliant outputs (Guo et al., 17 Nov 2025).
