
Small Models Are (Still) Effective Cross-Domain Argument Extractors (2404.08579v1)

Published 12 Apr 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Effective ontology transfer has been a major goal of recent work on event argument extraction (EAE). Two methods in particular -- question answering (QA) and template infilling (TI) -- have emerged as promising approaches to this problem. However, detailed explorations of these techniques' ability to actually enable this transfer are lacking. In this work, we provide such a study, exploring zero-shot transfer using both techniques on six major EAE datasets at both the sentence and document levels. Further, we challenge the growing reliance on LLMs for zero-shot extraction, showing that vastly smaller models trained on an appropriate source ontology can yield zero-shot performance superior to that of GPT-3.5 or GPT-4.

Authors (2)
  1. William Gantt
  2. Aaron Steven White

Summary

Exploring Zero-Shot Event Argument Extraction Through QA and TI Techniques

Introduction

Recent endeavors in the field of Event Argument Extraction (EAE) have pivoted towards enabling zero-shot ontology transfer, leveraging reformulations of the EAE task as Question Answering (QA) and Template Infilling (TI). This paper conducts a comprehensive study across six major EAE datasets, spanning both sentence and document levels, to benchmark the zero-shot transfer capabilities of QA and TI methods. Contrary to the prevailing use of LLMs like GPT-3.5 or GPT-4 for such tasks, the paper presents evidence that smaller models, specifically trained on well-suited source ontologies, can achieve superior zero-shot performance.

Approaches

The paper bifurcates EAE into two methodological approaches:

  • Question Answering (QA) revolves around transforming role labels into questions focused on event participants, with questions varying widely in form and specificity.
  • Template Infilling (TI) treats EAE as an infilling task, where predefined templates for each event type are populated with extracted arguments. This approach has the dual benefits of requiring only a single encoder pass for argument extraction and considering role arguments jointly.

Templates and questions for all six ontologies under study were meticulously crafted, ensuring applicability across different event types without relying on specific role names within the questions or template text.
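The two reformulations can be sketched as follows. This is an illustrative toy example, not the authors' code: the ontology entry, the question wordings, and the template string are all hypothetical stand-ins for the kind of resources the paper describes (role-specific questions for QA; one slot-bearing template per event type for TI).

```python
# Hypothetical sketch of the two EAE reformulations for one event type.

def make_qa_query(event_type: str, role: str, trigger: str) -> str:
    """QA reformulation: turn a role label into a natural-language question.

    Questions ask about the participant's function in the event rather than
    exposing the raw role name."""
    patterns = {
        ("Conflict.Attack", "Attacker"): f"Who carried out the {trigger}?",
        ("Conflict.Attack", "Target"): f"Who or what was harmed in the {trigger}?",
    }
    return patterns[(event_type, role)]


def make_ti_prompt(event_type: str) -> str:
    """TI reformulation: a single template per event type with slots for all
    roles, so arguments are predicted jointly in one decoder pass."""
    templates = {
        "Conflict.Attack": "<arg1> attacked <arg2> using <arg3> at <arg4>.",
    }
    return templates[event_type]


print(make_qa_query("Conflict.Attack", "Attacker", "bombing"))
print(make_ti_prompt("Conflict.Attack"))
```

Note the structural difference: QA issues one query per role, while TI fills every role slot of an event in a single pass, which is what lets TI consider role arguments jointly.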

Experiments

The empirical analysis utilized Flan-T5-based models for both QA and TI tasks, assessing their zero-shot transfer performance across a matrix of source and target datasets. The evaluation also included zero-shot performances of GPT-3.5 and GPT-4 for comparison. Datasets encompassed a diverse range of domains, from news articles to Wikipedia passages, each with its unique ontology.
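The evaluation design amounts to a cross-ontology grid: a model fine-tuned on each source dataset is scored zero-shot on every other (target) dataset. The sketch below assumes a placeholder scoring function and dataset names; it only illustrates the shape of the experiment matrix, not the paper's actual pipeline.

```python
# Hypothetical sketch of the cross-ontology zero-shot evaluation grid.

def zero_shot_grid(datasets, score):
    """Return {(source, target): score} for all cross-ontology pairs.

    In-domain pairs (src == tgt) are excluded: those form the separate
    in-domain baseline, not the zero-shot condition."""
    return {
        (src, tgt): score(src, tgt)
        for src in datasets
        for tgt in datasets
        if src != tgt
    }


# Toy scorer standing in for "fine-tune on src, evaluate F1 on tgt".
toy_f1 = lambda src, tgt: 0.0

grid = zero_shot_grid(["ACE", "ERE", "FAMuS"], toy_f1)
print(len(grid))  # 3 sources x 2 targets each = 6 pairs
```

The same grid is then compared against GPT-3.5 and GPT-4 scored zero-shot on each target dataset directly.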

Results and Analysis

The findings indicate a pronounced advantage of custom-trained Flan-T5 models over considerably larger counterparts (GPT-3.5 and GPT-4) in zero-shot transfer tasks, except on the FAMuS dataset, where GPT-4 held a marginal edge. This suggests that smaller models can capture a nuanced understanding of the EAE task when trained on a well-aligned source ontology.

Further observations include:

  • Varying zero-shot success between QA and TI methods across different target datasets, suggesting neither approach uniformly outperforms the other.
  • High performance correlation between in-domain and zero-shot setups within similar ontologies, indicating a transferable skill set across closely related domains.
  • Augmentation of training data with paraphrases showed mixed results, improving performance in some TI setups but yielding inconsistent gains across QA setups.

Conclusion

The paper underscored the efficacy of smaller, specifically trained models over LLMs like GPT-3.5 and GPT-4 in accomplishing zero-shot EAE tasks. It highlighted the potential of QA and TI methodologies in enabling ontology transfer, suggesting their complementary utility depending on the target domain or ontology. The exploration also pointed to the value of generating domain-specific questions and templates as a scalable approach to adapting EAE models for new event types and ontologies.

Limitations

The research emphasized its focus on scenarios where gold event triggers were provided, noting that full event extraction scenarios might present different challenges. Moreover, while the term "zero-shot" was employed to describe cross-ontology evaluations, the existing overlap between event and role types in the studied ontologies might not fully encapsulate the broader spectrum of zero-shot applications. Lastly, the absolute performance on most ontology pairs was acknowledged as modest, potentially limiting the immediate applicability of these methods in real-world scenarios without further domain-specific tuning or enhancements.