Exploring Zero-Shot Event Argument Extraction Through QA and TI Techniques
Introduction
Recent endeavors in the field of Event Argument Extraction (EAE) have pivoted towards enabling zero-shot ontology transfer, leveraging reformulations of the EAE task through Question Answering (QA) and Template Infilling (TI). This paper conducts a comprehensive study across six major EAE datasets, spanning both sentence and document levels, to benchmark the zero-shot transfer capabilities of QA and TI methods. Contrary to the prevailing use of LLMs such as GPT-3.5 or GPT-4 for such tasks, the paper presents evidence that smaller models, trained on well-suited source ontologies, can achieve superior zero-shot performance.
Approaches
The paper considers two methodological approaches to EAE:
- Question Answering (QA) transforms role labels into questions about event participants, with questions varying widely in form and specificity.
- Template Infilling (TI) treats EAE as an infilling task, where predefined templates for each event type are populated with extracted arguments. This approach has the dual benefits of requiring only a single encoder pass for argument extraction and considering role arguments jointly.
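The two reformulations above can be sketched as follows. The role names, event type, and template wording here are hypothetical illustrations, not the paper's actual prompts:

```python
# Sketch of the two EAE reformulations. Role names, event types, and
# template text are made-up examples, not the paper's actual prompts.

def qa_question(role: str, event_type: str) -> str:
    """QA reformulation: turn a role label into a question about
    the event's participants."""
    return f"Who or what is the {role} in the {event_type} event?"

def fill_template(template: str, args: dict) -> str:
    """TI reformulation: populate a per-event-type template with
    extracted arguments; slots without arguments stay as placeholders."""
    out = template
    for slot, value in args.items():
        out = out.replace(f"<{slot}>", value)
    return out

question = qa_question("attacker", "Conflict.Attack")
filled = fill_template(
    "<attacker> attacked <target> using <instrument>.",
    {"attacker": "the rebels", "target": "the convoy"},
)
```

Note that the TI formulation fills all roles of an event in one template, which is what lets a single encoder pass extract every argument jointly, whereas QA issues one question per role.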
Templates and questions for all six ontologies under study were meticulously crafted, ensuring applicability across different event types without relying on specific role names within the questions or template text.
Experiments
The empirical analysis utilized Flan-T5-based models for both QA and TI tasks, assessing their zero-shot transfer performance across a matrix of source and target datasets. The evaluation also included the zero-shot performance of GPT-3.5 and GPT-4 for comparison. The datasets encompassed a diverse range of domains, from news articles to Wikipedia passages, each with its own ontology.
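The source-by-target evaluation matrix can be sketched as below. The dataset names shown are an illustrative subset, and `models` stands in for the trained Flan-T5 extractors; the F1 scorer is a generic argument-level sketch, not necessarily the paper's exact metric:

```python
# Hypothetical sketch of a source x target zero-shot evaluation grid.
# Dataset names are illustrative; `models[src]` stands in for an
# extractor trained on source ontology `src`.

def span_f1(predicted: set, gold: set) -> float:
    """Argument-level F1 over (role, span) pairs."""
    if not predicted and not gold:
        return 1.0
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

DATASETS = ["ACE", "RAMS", "WikiEvents"]  # illustrative subset

def evaluate_grid(models: dict, test_sets: dict) -> dict:
    """Score every (source model, target dataset) pair; the diagonal
    is in-domain, off-diagonal cells are zero-shot transfer."""
    return {
        (src, tgt): span_f1(models[src](test_sets[tgt]["docs"]),
                            test_sets[tgt]["gold"])
        for src in DATASETS for tgt in DATASETS
    }
```

The diagonal of this grid gives the in-domain scores that off-diagonal (zero-shot) cells are later compared against.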
Results and Analysis
The findings indicate a pronounced advantage of custom-trained Flan-T5 models over considerably larger counterparts (GPT-3.5 and GPT-4) in zero-shot transfer tasks, except for the FAMuS dataset, where GPT-4 had a marginal edge. This suggests that smaller models can capture a nuanced understanding of the EAE task when trained on a well-aligned source ontology.
Further observations include:
- Varying zero-shot success between QA and TI methods across different target datasets, suggesting neither approach uniformly outperforms the other.
- High performance correlation between in-domain and zero-shot setups within similar ontologies, indicating a transferable skill set across closely related domains.
- Augmentation of training data with paraphrases showed mixed results, improving performance in some TI setups but yielding inconsistent gains across QA setups.
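The in-domain/zero-shot correlation noted above can be quantified with a standard Pearson coefficient. The scores below are invented placeholders purely to show the computation, not the paper's numbers:

```python
# Illustrative Pearson correlation between in-domain and zero-shot F1.
# The score lists are hypothetical, not results from the paper.
import math

def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

in_domain = [62.0, 55.0, 48.0, 70.0]  # hypothetical F1 scores
zero_shot = [41.0, 33.0, 30.0, 47.0]  # hypothetical F1 scores
r = pearson(in_domain, zero_shot)     # near 1.0 when setups track each other
```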
Conclusion
The paper underscored the efficacy of smaller, specifically trained models over LLMs like GPT-3.5 and GPT-4 in accomplishing zero-shot EAE tasks. It highlighted the potential of QA and TI methodologies in enabling ontology transfer, suggesting their complementary utility depending on the target domain or ontology. The exploration also pointed to the value of generating domain-specific questions and templates as a scalable approach to adapting EAE models for new event types and ontologies.
Limitations
The research emphasized its focus on scenarios where gold event triggers were provided, noting that full event extraction scenarios might present different challenges. Moreover, while the term "zero-shot" was employed to describe cross-ontology evaluations, the existing overlap between event and role types in the studied ontologies might not fully encapsulate the broader spectrum of zero-shot applications. Lastly, the absolute performance on most ontology pairs was acknowledged as modest, potentially limiting the immediate applicability of these methods in real-world scenarios without further domain-specific tuning or enhancements.