
LEEC: A Legal Element Extraction Dataset with an Extensive Domain-Specific Label System (2310.01271v2)

Published 2 Oct 2023 in cs.CL and cs.IR

Abstract: As a pivotal task in natural language processing, element extraction has gained significance in the legal domain. Extracting legal elements from judicial documents helps enhance the interpretation and analysis of legal cases, thereby facilitating a wide array of downstream applications in various domains of law. Yet existing element extraction datasets are limited by their restricted access to legal knowledge and insufficient coverage of labels. To address this shortfall, we introduce a more comprehensive, large-scale criminal element extraction dataset comprising 15,831 judicial documents and 159 labels. The dataset was constructed in two main steps: first, our team of legal experts designed the label system based on prior legal research that identified the critical factors driving, and the processes generating, sentencing outcomes in criminal cases; second, we applied this legal knowledge to annotate judicial documents according to the label system and annotation guideline. The Legal Element ExtraCtion dataset (LEEC) represents the most extensive and domain-specific legal element extraction dataset for the Chinese legal system. Leveraging the annotated data, we employed various SOTA models that validate the applicability of LEEC to the Document Event Extraction (DEE) task. The LEEC dataset is available at https://github.com/THUlawtech/LEEC .

Authors (7)
  1. Xue Zongyue
  2. Liu Huanghai
  3. Hu Yiran
  4. Kong Kangle
  5. Wang Chenlu
  6. Liu Yun
  7. Shen Weixing