MAVEN: A Massive General Domain Event Detection Dataset (2004.13590v2)

Published 28 Apr 2020 in cs.CL

Abstract: Event detection (ED), which means identifying event trigger words and classifying event types, is the first and most fundamental step for extracting event knowledge from plain text. Most existing datasets exhibit the following issues that limit further development of ED: (1) Data scarcity. Existing small-scale datasets are not sufficient for training and stably benchmarking increasingly sophisticated modern neural methods. (2) Low coverage. Limited event types of existing datasets cannot well cover general-domain events, which restricts the applications of ED models. To alleviate these problems, we present a MAssive eVENt detection dataset (MAVEN), which contains 4,480 Wikipedia documents, 118,732 event mention instances, and 168 event types. MAVEN alleviates the data scarcity problem and covers much more general event types. We reproduce the recent state-of-the-art ED models and conduct a thorough evaluation on MAVEN. The experimental results show that existing ED methods cannot achieve promising results on MAVEN as on the small datasets, which suggests that ED in the real world remains a challenging task and requires further research efforts. We also discuss further directions for general domain ED with empirical analyses. The source code and dataset can be obtained from https://github.com/THU-KEG/MAVEN-dataset.

Authors (10)

Xiaozhi Wang
Ziqi Wang
Xu Han
Wangyi Jiang
Rong Han
Zhiyuan Liu
Juanzi Li
Peng Li
Yankai Lin
Jie Zhou

Citations (172)

Summary

The paper "MAVEN: A Massive General Domain Event Detection Dataset" introduces MAVEN, a large human-annotated dataset built to address two limitations of existing event detection (ED) datasets such as ACE 2005 and Rich ERE: data scarcity and limited event type coverage. MAVEN contains 4,480 Wikipedia documents with 118,732 event mentions across 168 event types, giving a broader basis for developing and benchmarking ED models.

Key Contributions

Dataset Size and Coverage: MAVEN is substantially larger than previous ED datasets, which allows more stable training and benchmarking of neural models. Its event types are derived from FrameNet, giving broader coverage of event semantics and supporting general domain ED.
Hierarchical Schema: The dataset uses a hierarchical event type schema that organizes event types into a tree. The hierarchy may help models cope with data imbalance and the long-tail distribution of event types, since related types can share knowledge.
Evaluation on State-of-the-Art Models: The paper reproduces recent state-of-the-art neural ED models and evaluates them on MAVEN, finding markedly lower performance than on the smaller existing datasets. This suggests that general domain ED remains a challenging task.
Dataset Split and Standardization: MAVEN is split into training, validation, and test sets, and negative instances are officially provided. This keeps evaluation consistent across models and supports fair, reproducible comparison.
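To make the role of officially provided negative instances concrete, the following is a minimal sketch of turning one MAVEN-style document into per-token labels. The field names (`content`, `tokens`, `events`, `mention`, `negative_triggers`, `sent_id`, `offset`), the sample record, and the `NA` label are assumptions for illustration; the actual file layout should be checked against the repository README.

```python
import json

NEGATIVE = "NA"  # assumed label for officially provided negative candidates

# Hypothetical record; the field names are assumptions, not a confirmed schema.
SAMPLE_LINE = json.dumps({
    "id": "doc-0",
    "title": "Example",
    "content": [{"tokens": ["The", "storm", "destroyed", "the", "bridge", "."]}],
    "events": [
        {"type": "Destroying",
         "mention": [{"trigger_word": "destroyed", "sent_id": 0, "offset": [2, 3]}]}
    ],
    "negative_triggers": [
        {"trigger_word": "storm", "sent_id": 0, "offset": [1, 2]}
    ],
})


def label_document(line):
    """Return one (tokens, labels) pair per sentence.

    A label is an event type for annotated triggers, NEGATIVE for provided
    negative candidates, and None for tokens that are not candidates at all.
    """
    doc = json.loads(line)
    sentences = [(s["tokens"], [None] * len(s["tokens"])) for s in doc["content"]]

    # Positive instances: every mention of every annotated event.
    for event in doc["events"]:
        for mention in event["mention"]:
            start, end = mention["offset"]  # assumed half-open token span
            labels = sentences[mention["sent_id"]][1]
            for i in range(start, end):
                labels[i] = event["type"]

    # Negative instances: candidates that annotators judged not to be triggers.
    for neg in doc["negative_triggers"]:
        start, end = neg["offset"]
        labels = sentences[neg["sent_id"]][1]
        for i in range(start, end):
            labels[i] = NEGATIVE

    return sentences


for tokens, labels in label_document(SAMPLE_LINE):
    print(list(zip(tokens, labels)))
```

Keeping unlabeled tokens as `None` instead of folding them into the negative class lets every model be trained and scored on the same fixed candidate set, which is the point of shipping the negatives with the data.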

Experimental Insights

The paper's experiments show how difficult MAVEN is for current methods. Models including DMCNN, BiLSTM, and BERT variants were adapted and evaluated. The results indicate that although these models perform well on smaller benchmarks, MAVEN's wider range of event types exposes their limitations.
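The summary above does not say how performance is scored. A common choice for ED is micro-averaged precision, recall, and F1 over trigger candidates, and the sketch below assumes that setup. The candidate ids, event type names, and the `NA` negative label are illustrative assumptions.

```python
def micro_prf(gold, pred, negative="NA"):
    """Micro-averaged precision, recall, and F1 for trigger classification.

    gold, pred: dicts mapping a candidate id to an event type or `negative`.
    A prediction counts as correct only if its type matches the gold type.
    """
    true_pos = sum(
        1 for key, label in pred.items()
        if label != negative and gold.get(key) == label
    )
    n_pred = sum(1 for label in pred.values() if label != negative)
    n_gold = sum(1 for label in gold.values() if label != negative)

    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative candidates: three gold triggers, two predicted, one correct.
gold = {"c1": "Attack", "c2": "NA", "c3": "Motion", "c4": "Attack"}
pred = {"c1": "Attack", "c2": "Motion", "c3": "NA", "c4": "NA"}

precision, recall, f1 = micro_prf(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

With many event types and a long-tailed distribution, micro averaging is dominated by frequent types, so a macro average over types is a natural companion figure.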

Notably, sequence models with a CRF layer, such as BiLSTM+CRF, handle correlated events within a sentence better, which matters because many MAVEN sentences contain multiple event triggers. Even so, the relatively low performance across all models points to a need for architectures that capture deeper semantic correlations and finer distinctions between event types.
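One way to see why a CRF layer helps with correlated events is to compare per-token greedy labeling with Viterbi decoding, where a learned transition score can favor one event type following another. The sketch below covers decoding only, not CRF training, and is not the paper's implementation; the labels and all scores are made-up values for illustration.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions: list (one entry per token) of dicts mapping label -> score.
    transitions: dict mapping (previous_label, label) -> score; missing pairs score 0.
    """
    best = [{y: emissions[0][y] for y in labels}]
    backpointers = []
    for emission in emissions[1:]:
        scores, pointers = {}, {}
        for cur in labels:
            # Best previous label for reaching `cur` at this position.
            prev = max(
                labels,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + emission[cur]
            pointers[cur] = prev
        best.append(scores)
        backpointers.append(pointers)

    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


LABELS = ["O", "Attack", "Death"]  # illustrative label set
# Two tokens; the second is ambiguous between "O" and "Death" on its own.
EMISSIONS = [
    {"O": 0.0, "Attack": 2.0, "Death": 0.0},
    {"O": 1.0, "Attack": 0.0, "Death": 0.8},
]
# Assumed transition score: a Death trigger tends to follow an Attack trigger.
TRANSITIONS = {("Attack", "Death"): 0.5}

greedy = [max(LABELS, key=lambda y: e[y]) for e in EMISSIONS]
joint = viterbi(EMISSIONS, TRANSITIONS, LABELS)
print("greedy:", greedy)
print("viterbi:", joint)
```

In this toy case the greedy labeling misses the second trigger, while the transition score tips the joint decoding toward labeling both tokens as events.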

Implications and Future Directions

MAVEN offers a larger and broader benchmark for ED and may benefit downstream applications such as information extraction, question answering, and knowledge base population. The dataset encourages work on models that can handle both a wide range of event semantics and the interaction of multiple events in the same context.

Moreover, the transfer learning results in Section 6.4 suggest that MAVEN can be used to supplement low-resource settings. Knowledge transfer methods such as intermediate pre-training show promise for improving ED models in domains with limited data.

MAVEN is likely to shape future general domain ED research. Future work might explore better integration of hierarchical information into model architectures, improved handling of multi-event sentences, and new transfer learning methods that exploit MAVEN's scale. The dataset thus serves both as a benchmark for existing ED methods and as a starting point for new ones.

GitHub

GitHub - THU-KEG/MAVEN-dataset: Source code and dataset for EMNLP 2020 paper "MAVEN: A Massive General Domain Event Detection Dataset". (151 stars)