Doc2EDAG: An End-to-End Document-level Framework for Chinese Financial Event Extraction (1904.07535v2)

Published 16 Apr 2019 in cs.CL and cs.LG

Abstract: Most existing event extraction (EE) methods merely extract event arguments within the sentence scope. However, such sentence-level EE methods struggle to handle soaring amounts of documents from emerging applications, such as finance, legislation, health, etc., where event arguments always scatter across different sentences, and even multiple such event mentions frequently co-exist in the same document. To address these challenges, we propose a novel end-to-end model, Doc2EDAG, which can generate an entity-based directed acyclic graph to fulfill the document-level EE (DEE) effectively. Moreover, we reformalize a DEE task with the no-trigger-words design to ease the document-level event labeling. To demonstrate the effectiveness of Doc2EDAG, we build a large-scale real-world dataset consisting of Chinese financial announcements with the challenges mentioned above. Extensive experiments with comprehensive analyses illustrate the superiority of Doc2EDAG over state-of-the-art methods. Data and codes can be found at https://github.com/dolphin-zs/Doc2EDAG.

This paper presents Doc2EDAG, a framework for document-level event extraction (DEE) in domains such as finance, where event arguments are often dispersed across sentences and multiple events coexist in the same document. Rather than extracting events one sentence at a time, the model generates an entity-based directed acyclic graph (EDAG) to perform DEE end to end.

Key Contributions

Entity-Based Directed Acyclic Graph (EDAG): The model converts the tabular event records of a document into an EDAG, in which each path corresponds to one event record. This turns the hard table-filling task into a series of sequential path-expanding sub-tasks, each deciding which entity (if any) fills the next event role, and it lets the model collect arguments that are dispersed across the document.
No-Trigger-Words Design: By eliminating the dependence on trigger words for event detection, the authors reformalize the DEE task to focus directly on filling event tables. This design facilitates easier document-level event labeling through distant supervision (DS), without relying on predefined trigger word sets.
Document-Level Entity Encoding: To address the arguments-scattering challenge, Doc2EDAG encodes entities with document-level context, ensuring that the model considers the full context of the document in which an entity appears, rather than limiting the scope to individual sentences.
Memory Mechanism: A memory mechanism supports path expansion by keeping a history of the entities already placed on the current path, which helps the model handle documents with multiple events and scattered arguments.
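The EDAG idea in the list above can be illustrated with a minimal Python sketch. This is not the authors' implementation: it only shows how a table of event records can be folded into shared paths, using a prefix tree (a simple special case of a DAG) keyed by the argument chosen for each role. The function names `build_edag` and `edag_paths`, the nested-dict layout, and the use of `None` for an empty role are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# One event record maps each role name to an entity string, or None if the
# role is empty (illustrative convention, not taken from the paper's code).
Record = Dict[str, Optional[str]]


def build_edag(records: List[Record], role_order: List[str]) -> dict:
    """Fold event records into a nested dict keyed by argument, one level per role.

    Records that agree on their first k roles share the first k nodes, so
    each root-to-leaf path corresponds to exactly one distinct record.
    """
    root: dict = {}
    for record in records:
        node = root
        for role in role_order:
            argument = record.get(role)  # None stands for an empty role
            node = node.setdefault(argument, {})
    return root


def edag_paths(edag: dict, depth: int) -> List[List[Optional[str]]]:
    """Enumerate all root-to-leaf paths, i.e. recover the event table rows."""
    if depth == 0:
        return [[]]
    paths = []
    for argument, child in edag.items():
        for tail in edag_paths(child, depth - 1):
            paths.append([argument] + tail)
    return paths
```

Calling `build_edag` on two records that share the same first argument yields a single first-level node with two branches below it, and `edag_paths(edag, len(role_order))` returns the original rows. In the model itself, the choice of which branches to open at each role is predicted from the document rather than read off gold records; the sketch covers only the data structure.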

Experimental Results

The authors evaluate Doc2EDAG on a large-scale dataset of Chinese financial announcements that they built for this work. They report that Doc2EDAG outperforms the state-of-the-art baselines they compare against, with improvements in precision, recall, and F1 scores across several event types. The model performs well on both single-event and multi-event documents, which matters because real-world announcements often describe more than one event.
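For readers unfamiliar with how precision, recall, and F1 apply to event tables, the following is a minimal sketch of role-level scoring. It assumes that predicted and gold records have already been paired one to one, which sidesteps the record-matching step a full evaluation would need; the function name `role_level_prf` and the `None`-for-empty convention are illustrative assumptions, not the paper's evaluation script.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def role_level_prf(
    predicted: List[Record], gold: List[Record], roles: List[str]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over role slots.

    Assumes predicted[i] is already aligned with gold[i].
    """
    tp = fp = fn = 0
    for pred_record, gold_record in zip(predicted, gold):
        for role in roles:
            pred_arg = pred_record.get(role)
            gold_arg = gold_record.get(role)
            if pred_arg is not None and pred_arg == gold_arg:
                tp += 1  # correct argument for this role
            else:
                if pred_arg is not None:
                    fp += 1  # predicted an argument that is wrong or spurious
                if gold_arg is not None:
                    fn += 1  # missed (or mislabeled) a gold argument
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Counting a wrong argument as both a false positive and a false negative is one common convention; other scoring schemes are possible and would give different numbers.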

Practical and Theoretical Implications

Doc2EDAG is directly relevant to domains such as finance, legislation, and healthcare, where extracting structured information from long documents is critical. On the research side, the approach suggests that similar methods could be applied to other languages or domains with few domain-specific modifications.

Future Directions

The authors suggest that future research might extend the input beyond plain text to richly formatted documents, which would make the model useful in a wider range of practical settings.

In conclusion, Doc2EDAG advances document-level event extraction by directly addressing its two main challenges, multiple events per document and arguments scattered across sentences, in large-scale document corpora.

Authors (4)

Shun Zheng (23 papers)
Wei Cao (71 papers)
Wei Xu (536 papers)
Jiang Bian (229 papers)

Citations (151)


GitHub

GitHub - dolphin-zs/Doc2EDAG (346 stars)