Agent-Event-Coder Paradigm

Updated 8 June 2026

Agent-Event-Coder (AEC) is a modular framework that uses multi-agent coordination and executable code templates for zero-shot event extraction.
AEC divides the task into retrieval, planning, coding, and verification steps, ensuring schema alignment, iterative correction, and effective error diagnostics.
Empirical evaluations show improvements in trigger and argument extraction metrics over zero-shot baselines, at the cost of increased latency.

The Agent-Event-Coder (AEC) paradigm is a multi-agent LLM framework for zero-shot event extraction that conceptualizes the extraction process as structured, iterative code generation. Leveraging principles from software engineering, AEC employs explicitly coordinated LLM agents to decompose event extraction into retrieval, planning, coding, and verification subtasks, with rigorous schema enforcement via executable code templates. This modular, collaborative approach addresses core challenges in zero-shot event extraction, notably incomplete outputs and schema violations, by integrating programmatic validation and systematic iterative refinement (Guo et al., 17 Nov 2025).

1. Framework Architecture

AEC employs four dedicated LLM agents, each responsible for a specialized subtask and coordinated via two nested loops. The overall process targets coverage, precision, and schema consistency:

Retrieval Agent ($A_{ret}$): Receives an unseen event schema $S_e = \langle e, R_e \rangle$ (where $e$ is the event type and $R_e = \{(r_j, \tau_j)\}$ is the set of argument roles and their value types) and outputs $k$ exemplar sentences $D_{ex} = \{s_1, ..., s_k\}$ illustrating event realizations in text.
Planning Agent ($A_{plan}$): Accepts the input text $T = (w_1, ..., w_n)$, schema $S_e$, and exemplars $D_{ex}$, and generates a ranked list of $m$ trigger-type hypotheses $H = \{(t_i, c_i, \rho_i)\}_{i=1}^{m}$, where $t_i$ is a trigger span, $c_i$ is a confidence score, and $\rho_i$ is a rationale.
Coding Agent ($A_{code}$): For the top hypothesis $h_i = (t_i, c_i, \rho_i) \in H$, instantiates a Python (Pydantic) BaseModel class corresponding to $S_e$ and emits code to populate event arguments.
Verification Agent ($A_{ver}$): Receives the emitted code object $C$ and applies deterministic semantic, type, and structural tests, returning $(v, f)$, where $v$ is pass/fail and $f$ is diagnostic feedback.

Orchestration proceeds with an outer loop over trigger hypotheses and an inner loop over code-refinement attempts, using verification feedback to patch and resubmit code until a valid event instance is produced or hypotheses are exhausted.
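
The inputs and outputs exchanged by the agents can be pictured as small typed records. The following is a minimal sketch using only the standard library; the class names `EventSchema` and `Hypothesis`, the helper `rank`, and the sample "Attack" roles are illustrative assumptions, not identifiers taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class EventSchema:
    """Hypothetical container for S_e = <e, R_e>."""
    event_type: str                      # e
    roles: Tuple[Tuple[str, str], ...]   # R_e as (role name, type name) pairs


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical container for one planner output (t_i, c_i, rho_i)."""
    trigger: str       # t_i, a span of the input text
    confidence: float  # c_i
    rationale: str     # rho_i


def rank(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Sort hypotheses by descending confidence, the order the outer loop uses."""
    return sorted(hypotheses, key=lambda h: h.confidence, reverse=True)


# Illustrative schema and planner output (made up for this sketch).
schema = EventSchema(
    "Attack",
    (("attacker", "str"), ("target", "str"), ("tool", "List[str]")),
)
ranked = rank([
    Hypothesis("bank", 0.20, "location noun, unlikely trigger"),
    Hypothesis("robbed", 0.90, "verb denoting an attack"),
])
print([h.trigger for h in ranked])  # ['robbed', 'bank']
```

Keeping the records immutable (`frozen=True`) is a design choice of this sketch: it makes it obvious that refinement changes the generated code, not the hypothesis being tested.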

2. Schema Representation and Code Templates

Each event schema

$S_e = \langle e, R_e \rangle, \quad R_e = \{(r_j, \tau_j)\}$

is rendered into an executable Python/Pydantic BaseModel class template, with one typed field per argument role $r_j$.

This executable approach ensures that any instantiation with missing or ill-typed arguments triggers runtime or Pydantic validation failures, enabling deterministic validation of output structure and type compliance.
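
As an illustration of this behavior, the sketch below defines a hypothetical "Attack" template and shows Pydantic rejecting an instance with a missing required role and an ill-typed list role. The class name, field names, and sample values are assumptions made for this example and are not the schema used in the paper; only `BaseModel`, `ValidationError`, and `ValidationError.errors()` are real Pydantic API.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class Attack(BaseModel):
    """Hypothetical template for an 'Attack' event schema."""
    trigger: str                    # trigger span copied from the input text
    target: str                     # required role
    attacker: Optional[str] = None  # optional role
    tool: List[str] = []            # multi-valued role


# A well-formed instance passes validation.
ok = Attack(trigger="robbed", attacker="two men", target="the bank", tool=["pistol"])

# A malformed instance (no target, tool given as a bare string) is rejected.
try:
    Attack(trigger="robbed", tool="cash")
    errors = []
except ValidationError as exc:
    errors = exc.errors()

# Each error record names the offending field in its "loc" tuple.
bad_fields = sorted(str(err["loc"][0]) for err in errors)
print(bad_fields)  # ['target', 'tool']
```

The error records returned by `errors()` are the kind of structured signal that can be turned into feedback for a later correction attempt.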

3. Iterative Extraction Pipeline

AEC’s code-generation workflow is structured as follows (adapted from Algorithm 1 of Guo et al., 17 Nov 2025):

Exemplar Retrieval: $D_{ex} \leftarrow A_{ret}(S_e)$
Hypothesis Planning: $H \leftarrow A_{plan}(T, S_e, D_{ex})$
Hypothesis Processing:
- For each hypothesis $h_i \in H$ in descending confidence order:
  - For up to $n$ refinement attempts:
    - $C \leftarrow A_{code}(T, S_e, h_i)$
    - $(v, f) \leftarrow A_{ver}(C)$
    - If $v$ is true, deserialize $C$ and output event $E$
    - Else, patch code in $C$ according to $f$

Failures in validation yield compiler-style diagnostics (e.g., missing arguments, type mismatches, syntactic errors), enabling the coding agent to reprompt the LLM for incremental correction until the event extraction passes all checks.
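
The control flow of the two loops can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's implementation: the function `extract`, the four stub agents, the dictionary-based "code object", and the default of three attempts are all hypothetical stand-ins for LLM calls and generated Pydantic code.

```python
from typing import Callable, Dict, List, Optional, Tuple

Hypothesis = Tuple[str, float, str]  # (trigger, confidence, rationale)
Verdict = Tuple[bool, str]           # (passed, diagnostic feedback)


def extract(text: str, schema: Dict, retrieve: Callable, plan: Callable,
            code: Callable, verify: Callable, max_attempts: int = 3) -> Optional[Dict]:
    """Outer loop over ranked hypotheses, inner loop over refinement attempts."""
    exemplars = retrieve(schema)
    hypotheses: List[Hypothesis] = sorted(
        plan(text, schema, exemplars), key=lambda h: h[1], reverse=True)
    for hyp in hypotheses:
        feedback = ""                      # no diagnostics before the first attempt
        for _ in range(max_attempts):
            candidate = code(text, schema, hyp, feedback)
            passed, feedback = verify(candidate, text, schema)
            if passed:
                return candidate           # first verified event wins
    return None                            # all hypotheses exhausted


# Stub agents standing in for LLM calls (purely illustrative).
def stub_retrieve(schema: Dict) -> List[str]:
    return ["Rebels attacked the convoy."]

def stub_plan(text: str, schema: Dict, exemplars: List[str]) -> List[Hypothesis]:
    return [("bank", 0.2, "location noun"), ("robbed", 0.9, "attack verb")]

def stub_code(text: str, schema: Dict, hyp: Hypothesis, feedback: str) -> Dict:
    event = {"trigger": hyp[0]}
    if "target" in feedback:               # patch only after a diagnostic mentions it
        event["target"] = "the bank"
    return event

def stub_verify(candidate: Dict, text: str, schema: Dict) -> Verdict:
    if candidate["trigger"] not in text:
        return False, "trigger not found in text"
    missing = [r for r in schema["required"] if r not in candidate]
    if missing:
        return False, f"Argument '{missing[0]}' not provided"
    return True, ""


event = extract("Two men robbed the bank.",
                {"event_type": "Attack", "required": ["target"]},
                stub_retrieve, stub_plan, stub_code, stub_verify)
print(event)  # {'trigger': 'robbed', 'target': 'the bank'}
```

In this run the first attempt for "robbed" fails with a missing-argument diagnostic, and the second attempt succeeds once the stub coder reacts to that feedback.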

4. Deterministic Verification and Iterative Refinement

Verification in AEC is performed in three sequential, deterministic stages:

Semantic Check ($V_{sem}$): Verifies that the candidate trigger $t_i$ actually appears in input $T$ and is contextually compatible with event type $e$, typically via string match and embedding similarity.
Type Check ($V_{type}$): Applies Pydantic validation to confirm each argument $a_j$ matches schema type $\tau_j$, and enforces multiplicity (list cardinality, requiredness).
Structural Check ($V_{struct}$): Ensures that the generated code compiles, the field inventory matches the schema, and serialization (e.g., to JSON) is possible.

The formal validation criterion for generated code $C$ is:

$v = V_{sem}(C) \land V_{type}(C) \land V_{struct}(C)$

Failures generate diagnostic messages $f$ (e.g., "Argument ‘target’ not provided," "Expected List[str] for ‘tool’ but got 'cash' as str," "SyntaxError on line 3"), which the Coding Agent uses for iterative code correction.
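
A standard-library sketch of the three checks and their conjunction is given below. It is an approximation under several assumptions: the function names and the `SCHEMA` mapping are hypothetical, the semantic check uses string matching only (no embedding similarity), plain `isinstance` tests stand in for Pydantic validation, and the extracted arguments are passed in already parsed so that no generated code is executed. All three checks are run so that every diagnostic is collected in a single pass.

```python
import ast
from typing import Dict, List, Tuple

# Hypothetical schema: role name -> expected Python type.
SCHEMA: Dict[str, type] = {"target": str, "tool": list}


def semantic_check(trigger: str, text: str) -> List[str]:
    """String-match part of the semantic check."""
    return [] if trigger in text else [f"Trigger '{trigger}' not found in input"]


def type_check(args: Dict, schema: Dict[str, type]) -> List[str]:
    """Requiredness and type tests for each role in the schema."""
    msgs = []
    for role, expected in schema.items():
        if role not in args:
            msgs.append(f"Argument '{role}' not provided")
        elif not isinstance(args[role], expected):
            got = type(args[role]).__name__
            msgs.append(f"Expected {expected.__name__} for '{role}' but got {got}")
    return msgs


def structural_check(source: str, args: Dict, schema: Dict[str, type]) -> List[str]:
    """The code must parse, and no field outside the schema may appear."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return [f"SyntaxError on line {exc.lineno}"]
    return [f"Unknown field '{name}'" for name in sorted(set(args) - set(schema))]


def verify(source: str, trigger: str, args: Dict, text: str,
           schema: Dict[str, type]) -> Tuple[bool, List[str]]:
    """Return (v, f): v is True only if all three checks report no problems."""
    feedback = (semantic_check(trigger, text)
                + type_check(args, schema)
                + structural_check(source, args, schema))
    return (not feedback, feedback)


text = "Two men robbed the bank."
passed, feedback = verify("Attack(trigger='robbed', tool='cash')",
                          "robbed", {"tool": "cash"}, text, SCHEMA)
print(passed)    # False
print(feedback)  # ["Argument 'target' not provided", "Expected list for 'tool' but got str"]
```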

5. Empirical Evaluation

Evaluation on five benchmarks (FewEvent, ACE2005, GENIA, SPEED, CASIE) and six LLM architectures shows consistent improvements over zero-shot baselines across trigger and argument extraction metrics. Results for selected models:

Method	FewEvent TI / TC	ACE2005 TI / TC
AEC (Llama3-8B)	27.0 / 27.6	40.5 / 48.8
Best baseline (Llama3-8B)	25.2 / 24.8	36.1 / 44.2
AEC (Llama3-70B)	42.1 / 40.5	57.0 / 54.6
Next-best method (Llama3-70B)	35.3 / 40.7	51.5 / 51.4

Other LLMs (Qwen2.5-72B, GPT-3.5-turbo, GPT-4o) exhibit analogous improvements, with absolute gains of +3–6 points in trigger identification/classification (TI/TC) and +2–4 points in argument identification/classification (AI/AC). Ablation studies show that removing any agent, especially Retrieval or Verification, causes large performance drops (up to 10 TI points). Performance plateaus as the number of trigger hypotheses and the number of refinement attempts increase.
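
The per-cell gains implied by the table above can be recomputed directly. The snippet assumes that, within each model pair, the first row is AEC and the second row is the strongest competing method; the scores are copied from the table, and the variable names are illustrative.

```python
# (TI, TC) scores copied from the table: (AEC row, comparison row).
scores = {
    ("Llama3-8B", "FewEvent"): ((27.0, 27.6), (25.2, 24.8)),
    ("Llama3-8B", "ACE2005"): ((40.5, 48.8), (36.1, 44.2)),
    ("Llama3-70B", "FewEvent"): ((42.1, 40.5), (35.3, 40.7)),
    ("Llama3-70B", "ACE2005"): ((57.0, 54.6), (51.5, 51.4)),
}

# Absolute difference per metric, rounded to one decimal to hide float noise.
gains = {
    key: tuple(round(a - b, 1) for a, b in zip(aec, other))
    for key, (aec, other) in scores.items()
}

for (model, dataset), (ti, tc) in gains.items():
    print(f"{model:10s} {dataset:8s} TI {ti:+.1f}  TC {tc:+.1f}")
```

Under that reading, seven of the eight cells show a gain (from +1.8 to +6.8 points), while FewEvent TC for Llama3-70B is 0.2 points below the comparison row.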

6. Principles, Limitations, and Extensions

AEC frames event extraction as code generation and validation, using schema-as-code and deterministic tests to minimize hallucinations and schema violations that affect direct prompting. Explicit modular decomposition into Retrieval, Planning, Coding, and Verification improves stepwise interpretability and allows targeted diagnosis of error points. The iterative refinement loop with compiler-like diagnostics yields robust zero-shot extraction, even absent in-domain training data.

Noted limitations include increased latency due to multiple agent invocations and code compilation, and scalability challenges with highly complex or deeply nested schemas. Proposed extensions involve augmenting the Retrieval Agent with external knowledge bases for exemplar generation, developing a Postmortem Agent to address persistent extraction failures, and leveraging verified event objects as pseudo-labeled data for few-shot or semi-supervised learning scenarios.

The AEC paradigm operationalizes a software-engineering perspective for zero-shot event extraction, tightly integrating LLM reasoning, modular workflow decomposition, and runtime schema validation for precise and schema-compliant outputs (Guo et al., 17 Nov 2025).

References

Guo et al. (2025). Extracting Events Like Code: A Multi-Agent Programming Framework for Zero-Shot Event Extraction.
