LEVEN: A Large-Scale Chinese Legal Event Detection Dataset (2203.08556v1)

Published 16 Mar 2022 in cs.CL and cs.AI

Abstract: Recognizing facts is the most fundamental step in making judgments, hence detecting events in the legal documents is important to legal case analysis tasks. However, existing Legal Event Detection (LED) datasets only concern incomprehensive event types and have limited annotated data, which restricts the development of LED methods and their downstream applications. To alleviate these issues, we present LEVEN, a large-scale Chinese LEgal eVENt detection dataset, with 8,116 legal documents and 150,977 human-annotated event mentions in 108 event types. Not only charge-related events, LEVEN also covers general events, which are critical for legal case understanding but neglected in existing LED datasets. To our knowledge, LEVEN is the largest LED dataset and has dozens of times the data scale of others, which shall significantly promote the training and evaluation of LED methods. The results of extensive experiments indicate that LED is challenging and needs further effort. Moreover, we simply utilize legal events as side information to promote downstream applications. The method achieves average improvements of 2.2 points in precision in low-resource judgment prediction, and 1.5 points in mean average precision in unsupervised case retrieval, which suggests the fundamentality of LED. The source code and dataset can be obtained from https://github.com/thunlp/LEVEN.

Authors (10)

Feng Yao
Chaojun Xiao
Xiaozhi Wang
Zhiyuan Liu
Lei Hou
Cunchao Tu
Juanzi Li
Yun Liu
Weixing Shen
Maosong Sun

Summary

An Overview of LEVEN: A Large-Scale Chinese Legal Event Detection Dataset

The paper "LEVEN: A Large-Scale Chinese Legal Event Detection Dataset" presents LEVEN, a dataset for legal event detection (LED) in Chinese legal documents. It targets two gaps in earlier LED resources: limited annotated data and event schemas that cover only a narrow set of event types.

Dataset Design and Composition

LEVEN comprises 8,116 legal documents and 150,977 human-annotated event mentions spanning 108 event types. The schema is not limited to charge-related events: it also covers general events, which are critical for understanding a case but were neglected in prior LED datasets. The authors state that, to their knowledge, LEVEN is the largest LED dataset, with dozens of times the data scale of earlier ones, which makes it a useful resource for both training and evaluating LED methods.
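To make the scale figures concrete, the sketch below derives two averages from the counts reported in the abstract and shows one way event mentions could be tallied by type. The `EventMention` record, its field names, and the toy event types are hypothetical illustrations; they are not the dataset's actual file format or schema.

```python
from collections import Counter
from dataclasses import dataclass

# Counts reported in the paper's abstract.
NUM_DOCUMENTS = 8_116
NUM_MENTIONS = 150_977
NUM_EVENT_TYPES = 108


def scale_summary() -> dict:
    """Derive simple averages from the reported dataset counts."""
    return {
        "mentions_per_document": round(NUM_MENTIONS / NUM_DOCUMENTS, 2),
        "mentions_per_event_type": round(NUM_MENTIONS / NUM_EVENT_TYPES, 2),
    }


@dataclass(frozen=True)
class EventMention:
    """Hypothetical record layout, NOT the real LEVEN file format."""
    doc_id: str      # document the mention belongs to
    trigger: str     # word or phrase that evokes the event
    event_type: str  # label from the event schema


def type_distribution(mentions) -> Counter:
    """Count how many mentions fall under each event type."""
    return Counter(m.event_type for m in mentions)


if __name__ == "__main__":
    print(scale_summary())
    # Toy mentions with made-up labels, for illustration only.
    toy = [
        EventMention("doc-1", "stole", "Theft"),
        EventMention("doc-1", "arrested", "Arrest"),
        EventMention("doc-2", "stole", "Theft"),
    ]
    print(type_distribution(toy))
```

On average this works out to roughly 18.6 annotated mentions per document. The per-type average is only a rough guide, since nothing here says how evenly mentions are spread across the 108 types.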

Key Findings and Implications

The paper's experiments indicate that LED remains challenging and needs further research. The authors also use legal events as side information in two downstream tasks, Legal Judgment Prediction (LJP) and Similar Case Retrieval (SCR). They report average improvements of 2.2 points in precision for low-resource judgment prediction and 1.5 points in mean average precision for unsupervised case retrieval. These gains suggest that LED is a foundational step for legal AI applications.
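For readers unfamiliar with the retrieval metric, here is a minimal sketch of mean average precision under its standard textbook definition. The function names and toy case identifiers are illustrative assumptions, and the paper's exact evaluation protocol (for example, any rank cutoff) is not described in this summary and may differ.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query.

    Precision is taken at each rank where a relevant item appears,
    then averaged over all relevant items, so relevant items that
    are never retrieved lower the score.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Toy queries with made-up case identifiers.
    queries = [
        (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
        (["x", "y"], {"y"}),                 # AP = 1/2
    ]
    print(mean_average_precision(queries))
```

A "1.5 point" gain on this metric corresponds to a 0.015 increase when the score is expressed on a 0 to 1 scale.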

Theoretical and Practical Implications

By covering both charge-related and general events, LEVEN moves legal event detection toward a more complete account of what happened in a case. The downstream results indicate that event information helps when labeled data is scarce, as in low-resource judgment prediction, and when no supervision is available, as in unsupervised case retrieval. In practice, event-aware systems could help predict case outcomes and retrieve similar cases more accurately.

Future Directions

Several areas remain open for future work. Since the experiments show that LED is far from solved, model architectures designed specifically for LED may yield better results. Extending LEVEN, or building similar datasets, to cover other languages or jurisdictions could broaden its applicability and offer insight into other legal systems. Further research on document-level context and cross-sentence relations could also improve trigger detection.

Conclusion

LEVEN is a significant step forward for Chinese legal event detection, combining large-scale human annotation with a broad event schema. It provides a stronger benchmark for LED and a basis for further research on applying NLP in legal contexts. More broadly, the paper shows how much dataset scope and quality matter for machine learning in specialized domains such as law.

GitHub

GitHub - thunlp/LEVEN: Source code and dataset for ACL2022 Findings Paper "LEVEN: A Large-Scale Chinese Legal Event Detection dataset" (104 stars)