
Small Models Are (Still) Effective Cross-Domain Argument Extractors (2404.08579v1)

Published 12 Apr 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Effective ontology transfer has been a major goal of recent work on event argument extraction (EAE). Two methods in particular -- question answering (QA) and template infilling (TI) -- have emerged as promising approaches to this problem. However, detailed explorations of these techniques' ability to actually enable this transfer are lacking. In this work, we provide such a study, exploring zero-shot transfer using both techniques on six major EAE datasets at both the sentence and document levels. Further, we challenge the growing reliance on LLMs for zero-shot extraction, showing that vastly smaller models trained on an appropriate source ontology can yield zero-shot performance superior to that of GPT-3.5 or GPT-4.

Authors (2)
  1. William Gantt
  2. Aaron Steven White

Summary

Exploring Zero-Shot Event Argument Extraction Through QA and TI Techniques

Introduction

Recent endeavors in the field of Event Argument Extraction (EAE) have pivoted towards enabling zero-shot ontology transfer, leveraging reformulations of the EAE task as Question Answering (QA) and Template Infilling (TI). This paper conducts a comprehensive study across six major EAE datasets, spanning both sentence and document levels, to benchmark the zero-shot transfer capabilities of QA and TI methods. Contrary to the prevailing use of LLMs like GPT-3.5 or GPT-4 for such tasks, the paper presents evidence that smaller models, specifically trained on well-suited source ontologies, can achieve superior zero-shot performance.

Approaches

The paper bifurcates EAE into two methodological approaches:

  • Question Answering (QA) revolves around transforming role labels into questions focused on event participants, with questions varying widely in form and specificity.
  • Template Infilling (TI) treats EAE as an infilling task, where predefined templates for each event type are populated with extracted arguments. This approach has the dual benefits of requiring only a single encoder pass for argument extraction and considering role arguments jointly.

Templates and questions for all six ontologies under study were meticulously crafted, ensuring applicability across different event types without relying on specific role names within the questions or template text.
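The two reformulations can be sketched as follows. This is an illustrative toy example, not the authors' code: the ontology entry, the question wordings, and the template string are all hypothetical stand-ins for the kind of resources the paper describes (role-specific questions for QA; one slot-bearing template per event type for TI).

```python
# Hypothetical sketch of the two EAE reformulations for one event type.

def make_qa_query(event_type: str, role: str, trigger: str) -> str:
    """QA reformulation: turn a role label into a natural-language question.

    Questions ask about the participant's function in the event rather than
    exposing the raw role name."""
    patterns = {
        ("Conflict.Attack", "Attacker"): f"Who carried out the {trigger}?",
        ("Conflict.Attack", "Target"): f"Who or what was harmed in the {trigger}?",
    }
    return patterns[(event_type, role)]


def make_ti_prompt(event_type: str) -> str:
    """TI reformulation: a single template per event type with slots for all
    roles, so arguments are predicted jointly in one decoder pass."""
    templates = {
        "Conflict.Attack": "<arg1> attacked <arg2> using <arg3> at <arg4>.",
    }
    return templates[event_type]


print(make_qa_query("Conflict.Attack", "Attacker", "bombing"))
print(make_ti_prompt("Conflict.Attack"))
```

Note the structural difference: QA issues one query per role, while TI fills every role slot of an event in a single pass, which is what lets TI consider role arguments jointly.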

Experiments

The empirical analysis utilized Flan-T5-based models for both QA and TI tasks, assessing their zero-shot transfer performance across a matrix of source and target datasets. The evaluation also included zero-shot performances of GPT-3.5 and GPT-4 for comparison. Datasets encompassed a diverse range of domains, from news articles to Wikipedia passages, each with its unique ontology.
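The evaluation design amounts to a cross-ontology grid: a model fine-tuned on each source dataset is scored zero-shot on every other (target) dataset. The sketch below assumes a placeholder scoring function and dataset names; it only illustrates the shape of the experiment matrix, not the paper's actual pipeline.

```python
# Hypothetical sketch of the cross-ontology zero-shot evaluation grid.

def zero_shot_grid(datasets, score):
    """Return {(source, target): score} for all cross-ontology pairs.

    In-domain pairs (src == tgt) are excluded: those form the separate
    in-domain baseline, not the zero-shot condition."""
    return {
        (src, tgt): score(src, tgt)
        for src in datasets
        for tgt in datasets
        if src != tgt
    }


# Toy scorer standing in for "fine-tune on src, evaluate F1 on tgt".
toy_f1 = lambda src, tgt: 0.0

grid = zero_shot_grid(["ACE", "ERE", "FAMuS"], toy_f1)
print(len(grid))  # 3 sources x 2 targets each = 6 pairs
```

The same grid is then compared against GPT-3.5 and GPT-4 scored zero-shot on each target dataset directly.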

Results and Analysis

The findings indicate a pronounced advantage of custom-trained Flan-T5 models over considerably larger counterparts (GPT-3.5 and GPT-4) in zero-shot transfer tasks, except on the FAMuS dataset, where GPT-4 held a marginal edge. This suggests that smaller models can capture a nuanced understanding of the EAE task when trained on a well-aligned source ontology.

Further observations include:

  • Varying zero-shot success between QA and TI methods across different target datasets, suggesting neither approach uniformly outperforms the other.
  • High performance correlation between in-domain and zero-shot setups within similar ontologies, indicating a transferable skill set across closely related domains.
  • Augmentation of training data with paraphrases showed mixed results, improving performance in some TI setups but yielding inconsistent gains across QA setups.

Conclusion

The paper underscored the efficacy of smaller, specifically trained models over LLMs like GPT-3.5 and GPT-4 in accomplishing zero-shot EAE tasks. It highlighted the potential of QA and TI methodologies in enabling ontology transfer, suggesting their complementary utility depending on the target domain or ontology. The exploration also pointed to the value of generating domain-specific questions and templates as a scalable approach to adapting EAE models for new event types and ontologies.

Limitations

The research emphasized its focus on scenarios where gold event triggers were provided, noting that full event extraction scenarios might present different challenges. Moreover, while the term "zero-shot" was employed to describe cross-ontology evaluations, the existing overlap between event and role types in the studied ontologies might not fully encapsulate the broader spectrum of zero-shot applications. Lastly, the absolute performance on most ontology pairs was acknowledged as modest, potentially limiting the immediate applicability of these methods in real-world scenarios without further domain-specific tuning or enhancements.