Large Language Models for Document-Level Event-Argument Data Augmentation for Challenging Role Types

Published 5 Mar 2024 in cs.CL and cs.LG | (2403.03304v2)

Abstract: Event Argument Extraction (EAE) is an extremely difficult information extraction problem, with significant limitations in few-shot cross-domain (FSCD) settings. A common solution to FSCD modeling is data augmentation. Unfortunately, existing augmentation methods are not well suited to a variety of real-world EAE contexts, including (i) the need to model long documents (10+ sentences) and (ii) the need to model zero- and few-shot roles (i.e., event roles with little to no training representation). In this work, we introduce two novel LLM-powered data augmentation frameworks for synthesizing extractive document-level EAE samples using zero in-domain training data. Our highest-performing methods provide a 16-point increase in F1 score on the extraction of zero-shot role types. To better facilitate analysis of cross-domain EAE, we additionally introduce a new metric, Role-Depth F1 (RDF1), which uses statistical depth to identify roles in the target domain that are semantic outliers with respect to roles observed in the source domain. Our experiments show that LLM-based augmentation can boost RDF1 performance by up to 11 F1 points compared to baseline methods.
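The RDF1 idea in the abstract can be illustrated in two steps: rank target-domain roles by their statistical depth relative to source-domain role embeddings, then compute F1 only over the lowest-depth (most outlying) roles. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. Role embeddings are taken as given plain vectors (the embedding model is not specified here). Spatial (Vardi-Zhang) depth stands in for whatever depth function the paper actually uses. The helper names `spatial_depth`, `outlier_roles`, and `f1_on_roles`, the top-`k` cutoff, and the toy vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]
Triple = Tuple[str, str, str]  # hypothetical format: (doc_id, role, argument span)


def spatial_depth(x: Vector, sample: List[Vector]) -> float:
    """Spatial depth of x w.r.t. a sample: 1 - ||mean unit vector from sample points to x||.

    Values near 1 mean x is central to the sample; values near 0 mean it is outlying.
    (Assumption: spatial depth is a stand-in for the paper's depth function.)
    """
    if not sample:
        raise ValueError("sample must be non-empty")
    acc = [0.0] * len(x)
    for y in sample:
        diff = [a - b for a, b in zip(x, y)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm == 0.0:
            continue  # the unit vector of a zero difference is defined as 0
        acc = [a + d / norm for a, d in zip(acc, diff)]
    mean_norm = math.sqrt(sum(a * a for a in acc)) / len(sample)
    return 1.0 - mean_norm


def outlier_roles(target: Dict[str, Vector], source: List[Vector], k: int) -> List[str]:
    """Return the k target roles with the lowest depth w.r.t. source-role embeddings."""
    depths = {role: spatial_depth(vec, source) for role, vec in target.items()}
    return sorted(depths, key=depths.get)[:k]


def f1_on_roles(gold: Set[Triple], pred: Set[Triple], roles: Set[str]) -> float:
    """Micro-F1 over exact-match argument triples, restricted to the given roles."""
    g = {t for t in gold if t[1] in roles}
    p = {t for t in pred if t[1] in roles}
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


# Toy 2-D "embeddings" (illustrative only, not data from the paper).
source = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.2], [0.8, -0.1]]
target = {"attacker": [0.95, 0.05], "disease": [-1.0, 3.0]}

selected = outlier_roles(target, source, k=1)
gold = {("d1", "disease", "flu"), ("d1", "attacker", "rebels")}
pred = {("d1", "disease", "flu"), ("d1", "attacker", "army")}
print(selected, f1_on_roles(gold, pred, set(selected)))
```

Here "disease" lies far from every source-role vector, so it receives the lowest depth and is the only role scored. Restricting F1 this way isolates performance on semantically novel roles, which is the kind of analysis the abstract attributes to RDF1.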
