LEEC: A Legal Element Extraction Dataset with an Extensive Domain-Specific Label System (2310.01271v2)
Abstract: As a pivotal task in natural language processing, element extraction has gained significance in the legal domain. Extracting legal elements from judicial documents strengthens the interpretation and analysis of legal cases, thereby facilitating a wide array of downstream applications in various domains of law. Yet existing element extraction datasets are limited by their restricted access to legal knowledge and insufficient coverage of labels. To address this shortfall, we introduce a more comprehensive, large-scale criminal element extraction dataset comprising 15,831 judicial documents and 159 labels. The dataset was constructed in two main steps: first, our team of legal experts designed the label system based on prior legal research that identified the critical factors driving, and the processes generating, sentencing outcomes in criminal cases; second, this legal knowledge was applied to annotate judicial documents according to the label system and annotation guidelines. The Legal Element ExtraCtion dataset (LEEC) represents the most extensive and domain-specific legal element extraction dataset for the Chinese legal system. Leveraging the annotated data, we evaluated various state-of-the-art (SOTA) models, and the results validate the applicability of LEEC to the Document Event Extraction (DEE) task. The LEEC dataset is available on GitHub: https://github.com/THUlawtech/LEEC
Authors: Xue Zongyue, Liu Huanghai, Hu Yiran, Kong Kangle, Wang Chenlu, Liu Yun, Shen Weixing