Agent-Event-Coder Paradigm
- The Agent-Event-Coder (AEC) paradigm is a modular framework that uses multi-agent coordination and executable code templates for zero-shot event extraction.
- AEC divides the task into retrieval, planning, coding, and verification steps, which enforce schema alignment and support iterative correction and error diagnostics.
- Empirical evaluations report improvements in trigger and argument extraction metrics over baselines, at the cost of increased latency.
The Agent-Event-Coder (AEC) paradigm is a multi-agent LLM framework for zero-shot event extraction that conceptualizes the extraction process as structured, iterative code generation. Leveraging principles from software engineering, AEC employs explicitly coordinated LLM agents to decompose event extraction into retrieval, planning, coding, and verification subtasks, with rigorous schema enforcement via executable code templates. This modular, collaborative approach addresses core challenges in zero-shot event extraction, notably incomplete outputs and schema violations, by integrating programmatic validation and systematic iterative refinement (Guo et al., 17 Nov 2025).
1. Framework Architecture
AEC employs four dedicated LLM agents, each responsible for a specialized subtask, coordinated via dual nested loops. The overall process ensures coverage, precision, and schema consistency:
- Retrieval Agent: Receives an unseen event schema (an event type together with its set of argument roles and value types) and outputs exemplar sentences illustrating how the event is realized in text.
- Planning Agent: Accepts the input text, the schema, and the exemplars, and generates a ranked list of trigger-type hypotheses, each consisting of a trigger span, a confidence score, and a rationale.
- Coding Agent: For the top hypothesis, instantiates a Python (Pydantic) BaseModel class corresponding to the hypothesized event type and emits code to populate the event arguments.
- Verification Agent: Receives the emitted code object and applies deterministic semantic, type, and structural tests, returning a pass/fail status together with diagnostic feedback.
Orchestration proceeds with an outer loop over trigger hypotheses and an inner loop over code-refinement attempts, using verification feedback to patch and resubmit code until a valid event instance is produced or hypotheses are exhausted.
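The messages passed between the agents can be pictured as small typed records. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `Hypothesis`, `Verdict`, and `rank`, the explicit `event_type` field, and the example event types are all hypothetical, chosen only to mirror the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hypothesis:
    """One Planning Agent output (hypothetical layout)."""
    trigger: str        # trigger span found in the input text
    event_type: str     # assumed field: the event type proposed for the span
    confidence: float   # score used to rank hypotheses
    rationale: str      # free-text justification


@dataclass(frozen=True)
class Verdict:
    """One Verification Agent output: pass/fail status plus diagnostics."""
    passed: bool
    feedback: str


def rank(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Order hypotheses so the outer loop visits the most confident first."""
    return sorted(hypotheses, key=lambda h: h.confidence, reverse=True)


# Illustrative values only.
candidates = [
    Hypothesis("meeting", "Contact.Meet", 0.35, "noun may denote a meeting"),
    Hypothesis("attacked", "Conflict.Attack", 0.90, "verb denotes a violent act"),
]
ordered = rank(candidates)
print([h.trigger for h in ordered])
```

Keeping the records immutable (`frozen=True`) is a design choice of this sketch: each refinement attempt then produces a new verdict rather than mutating an earlier one.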
2. Schema Representation and Code Templates
Each event schema, consisting of an event type and its typed argument roles, is rendered into an executable Python/Pydantic BaseModel class template.
This executable approach ensures that any instantiation with missing or ill-typed arguments triggers runtime or Pydantic validation failures, enabling deterministic validation of output structure and type compliance.
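To make the schema-as-code idea concrete, here is a minimal sketch using Pydantic. The `Attack` class, its roles (`attacker`, `target`, `tool`), and the sample values are hypothetical illustrations, not the schema or template used in the paper.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Attack(BaseModel):
    """Hypothetical schema-as-code template: one typed field per argument role."""
    trigger: str
    attacker: List[str]
    target: List[str]
    tool: List[str]


# A complete, well-typed instantiation validates.
event = Attack(
    trigger="stormed",
    attacker=["rebels"],
    target=["the depot"],
    tool=["rifles"],
)

# A missing role (target) and an ill-typed role (tool given as a bare string)
# are both rejected when the object is constructed.
try:
    Attack(trigger="stormed", attacker=["rebels"], tool="rifles")
    errors = []
except ValidationError as exc:
    errors = exc.errors()

failed_fields = sorted(err["loc"][0] for err in errors)
print(failed_fields)
```

Because validation happens at construction time, a verifier only needs to attempt the instantiation and collect the reported field errors.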
3. Iterative Extraction Pipeline
AEC’s code-generation workflow is structured as follows (adapted from Algorithm 1 of Guo et al., 17 Nov 2025):
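- Exemplar Retrieval: The Retrieval Agent produces exemplar sentences for the target schema.
- Hypothesis Planning: The Planning Agent produces a ranked list of trigger hypotheses from the input text, the schema, and the exemplars.
- Hypothesis Processing:
  - For each hypothesis in descending confidence order:
    - For up to a fixed maximum number of refinement attempts:
      - The Coding Agent generates code for the current hypothesis.
      - The Verification Agent checks the code and returns a pass/fail status with feedback.
      - If the status is pass, deserialize the code and output the event.
      - Otherwise, patch the code according to the feedback.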
Failures in validation yield compiler-style diagnostics (e.g., missing arguments, type mismatches, syntactic errors), enabling the coding agent to reprompt the LLM for incremental correction until the event extraction passes all checks.
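The two nested loops can be sketched as follows. This is an illustrative skeleton rather than the paper's Algorithm 1: the function `extract`, the callable signatures, the default of three attempts, and the toy coder and verifier are all assumptions made so the example runs without an LLM.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical agent signatures; in AEC each would wrap an LLM call or a checker.
# Coder: (trigger, previous code, feedback) -> new code
Coder = Callable[[str, Optional[str], Optional[str]], str]
# Verifier: code -> (passed, feedback)
Verifier = Callable[[str], Tuple[bool, str]]


def extract(hypotheses: List[Tuple[str, float]], coder: Coder,
            verifier: Verifier, max_attempts: int = 3) -> Optional[str]:
    """Return the first code string that passes verification, or None."""
    # Outer loop: hypotheses in descending confidence order.
    for trigger, _conf in sorted(hypotheses, key=lambda h: h[1], reverse=True):
        code, feedback = None, None
        # Inner loop: bounded number of refinement attempts.
        for _ in range(max_attempts):
            code = coder(trigger, code, feedback)
            passed, feedback = verifier(code)
            if passed:
                return code
    return None  # all hypotheses exhausted


def toy_coder(trigger, previous, feedback):
    if feedback is None:
        return f"Attack(trigger={trigger!r})"  # first draft omits a role
    return f"Attack(trigger={trigger!r}, target=['depot'])"  # patched draft


def toy_verifier(code):
    if "target=" in code:
        return True, ""
    return False, "Argument 'target' not provided"


result = extract([("meeting", 0.2), ("stormed", 0.9)], toy_coder, toy_verifier)
print(result)
```

In this toy run the first draft for the highest-confidence hypothesis fails, the feedback is passed back to the coder, and the second draft is accepted, so lower-ranked hypotheses are never visited.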
4. Deterministic Verification and Iterative Refinement
Verification in AEC is performed in three sequential, deterministic stages:
- Semantic Check: Verifies that the candidate trigger actually appears in the input text and is contextually compatible with the event type, typically via string match and embedding similarity.
- Type Check: Applies Pydantic validation to confirm that each argument value matches its schema type, and enforces multiplicity (list cardinality, requiredness).
- Structural Check: Ensures that the generated code compiles, the field inventory matches the schema, and serialization (e.g., to JSON) is possible.
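The formal validation criterion is the conjunction of the three checks: a candidate event is accepted only if the semantic, type, and structural checks all pass.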
Failures generate diagnostic messages (e.g., "Argument ‘target’ not provided," "Expected List[str] for ‘tool’ but got 'cash' as str," "SyntaxError on line 3"), which the Coding Agent uses for iterative code correction.
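A simplified version of the three checks can be written with the standard library alone. The sketch below is an assumption-laden illustration: `SCHEMA`, the function names, and the message wording are hypothetical; the semantic check uses string match only (no embedding similarity); and all diagnostics are collected in one pass rather than stopping at the first failing stage.

```python
import ast
from typing import Any, Dict, List, Tuple

# Hypothetical schema: role name -> expected Python type of its value.
SCHEMA: Dict[str, type] = {"trigger": str, "target": list, "tool": list}


def semantic_check(trigger: str, text: str) -> List[str]:
    # String match only; the embedding-similarity test is omitted here.
    return [] if trigger in text else [f"Trigger '{trigger}' not found in input"]


def type_check(args: Dict[str, Any]) -> List[str]:
    problems = []
    for role, expected in SCHEMA.items():
        if role not in args:
            problems.append(f"Argument '{role}' not provided")
        elif not isinstance(args[role], expected):
            problems.append(f"Expected {expected.__name__} for '{role}' "
                            f"but got {type(args[role]).__name__}")
    return problems


def structural_check(code: str) -> List[str]:
    try:
        ast.parse(code)  # syntax only; nothing is executed
        return []
    except SyntaxError as exc:
        return [f"SyntaxError on line {exc.lineno}"]


def verify(code: str, args: Dict[str, Any], text: str) -> Tuple[bool, List[str]]:
    """Pass only if all three checks report no problems."""
    feedback = (semantic_check(str(args.get("trigger", "")), text)
                + type_check(args)
                + structural_check(code))
    return (not feedback, feedback)


text = "Rebels stormed the depot with rifles."
ok, feedback = verify("Attack(trigger='stormed', tool='rifles')",
                      {"trigger": "stormed", "tool": "rifles"}, text)
print(ok, feedback)
```

Returning every diagnostic at once lets a coder patch several problems in a single refinement attempt; a strictly sequential variant would return after the first non-empty check.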
5. Empirical Evaluation
Evaluation on five benchmarks (FewEvent, ACE2005, GENIA, SPEED, CASIE) and six LLM architectures shows improvements over zero-shot baselines across trigger and argument extraction metrics. Results for selected models, reported as trigger identification (TI) and trigger classification (TC) scores:
| Model | FewEvent TI / TC | ACE2005 TI / TC |
|---|---|---|
| Llama3-8B | 27.0 / 27.6 | 40.5 / 48.8 |
| Best baseline | 25.2 / 24.8 | 36.1 / 44.2 |
| Llama3-70B | 42.1 / 40.5 | 57.0 / 54.6 |
| Next best | 35.3 / 40.7 | 51.5 / 51.4 |
Other LLMs (Qwen2.5-72B, GPT-3.5-turbo, GPT-4o) exhibit analogous improvements, with absolute gains of +3–6% in trigger identification/classification (TI/TC) and +2–4% in argument identification/classification (AI/AC). Ablation studies show that eliminating any agent, especially Retrieval or Verification, causes large performance drops (up to 10 TI points). Performance plateaus as the number of trigger hypotheses and the number of refinement attempts increase.
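The per-metric differences in the table above can be computed directly. The snippet assumes that each "Best baseline" or "Next best" row is the strongest competing method for the model row immediately above it; the table itself does not label the rows that explicitly.

```python
# Scores copied from the table above, ordered as
# (FewEvent TI, FewEvent TC, ACE2005 TI, ACE2005 TC).
# Assumption: the second tuple is the comparison row for the model named in the key.
rows = {
    "Llama3-8B": ((27.0, 27.6, 40.5, 48.8), (25.2, 24.8, 36.1, 44.2)),
    "Llama3-70B": ((42.1, 40.5, 57.0, 54.6), (35.3, 40.7, 51.5, 51.4)),
}

# Absolute difference per metric, rounded to one decimal like the table.
gains = {
    model: tuple(round(a - b, 1) for a, b in zip(aec, base))
    for model, (aec, base) in rows.items()
}

for model, delta in gains.items():
    print(model, delta)
```

Under this reading, seven of the eight differences are positive; the exception is FewEvent TC for Llama3-70B, where the comparison row is 0.2 points higher.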
6. Principles, Limitations, and Extensions
AEC frames event extraction as code generation and validation, using schema-as-code and deterministic tests to minimize hallucinations and schema violations that affect direct prompting. Explicit modular decomposition into Retrieval, Planning, Coding, and Verification improves stepwise interpretability and allows targeted diagnosis of error points. The iterative refinement loop with compiler-like diagnostics yields robust zero-shot extraction, even absent in-domain training data.
Noted limitations include increased latency due to multiple agent invocations and code compilation, and scalability challenges with highly complex or deeply nested schemas. Proposed extensions involve augmenting the Retrieval Agent with external knowledge bases for exemplar generation, developing a Postmortem Agent to address persistent extraction failures, and leveraging verified event objects as pseudo-labeled data for few-shot or semi-supervised learning scenarios.
The AEC paradigm operationalizes a software-engineering perspective for zero-shot event extraction, tightly integrating LLM reasoning, modular workflow decomposition, and runtime schema validation for precise and schema-compliant outputs (Guo et al., 17 Nov 2025).