
Natural Language Processing for the Legal Domain: A Survey of Tasks, Datasets, Models, and Challenges (2410.21306v1)

Published 25 Oct 2024 in cs.CL and cs.AI

Abstract: Natural Language Processing is revolutionizing the way legal professionals and laypersons operate in the legal field. The considerable potential for Natural Language Processing in the legal sector, especially in developing computational tools for various legal processes, has captured the interest of researchers for years. This survey follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses framework, reviewing 148 studies, with a final selection of 127 after manual filtering. It explores foundational concepts related to Natural Language Processing in the legal domain, illustrating the unique aspects and challenges of processing legal texts, such as extensive document length, complex language, and limited open legal datasets. We provide an overview of Natural Language Processing tasks specific to legal text, such as Legal Document Summarization, legal Named Entity Recognition, Legal Question Answering, Legal Text Classification, and Legal Judgment Prediction. In the section on legal LLMs, we analyze both developed LLMs and approaches for adapting general LLMs to the legal domain. Additionally, we identify 15 Open Research Challenges, including bias in Artificial Intelligence applications, the need for more robust and interpretable models, and improving explainability to handle the complexities of legal language and reasoning.

Natural Language Processing for the Legal Domain: A Comprehensive Survey

The paper "Natural Language Processing for the Legal Domain: A Survey of Tasks, Datasets, Models, and Challenges" offers an extensive analysis of how NLP is applied in the legal field. It describes how NLP is reshaping legal practice by supporting computational tools for legal processes, and it examines the specialized tasks within Legal NLP, including Legal Document Summarization (LDS), Legal Named Entity Recognition (NER), Legal Question Answering (LQA), Legal Text Classification (LTC), and Legal Judgment Prediction (LJP).

Tasks and Methodological Approaches in Legal NLP

The paper outlines the unique complexity of legal texts, emphasizing lengthy documents, nuanced language, and limited open-access datasets, all of which pose significant challenges to NLP systems. This complexity demands refined approaches that can handle the distinct properties of legal language. Legal NLP encompasses specific tasks such as:

  • Legal Document Summarization (LDS): The summarization task must account for the structured and formal nature of legal documents, with techniques ranging from extractive to abstractive summarization approaches.
  • Legal Named Entity Recognition (NER): This task identifies domain-specific entities in legal documents, such as legal acts, case law, and statutes. It requires methods adapted to the intricacies of legal language.
  • Legal Question Answering (LQA): LQA tasks require models to understand and interpret complex legal questions and to answer with precise legal information. The paper discusses several studies that employ Transformer-based models such as BERT for this task.
  • Legal Text Classification (LTC): Text classification involves categorizing legal documents into predefined categories, leveraging sophisticated classification algorithms to handle the substantial and complex label spaces inherent in legal databases.
  • Legal Judgment Prediction (LJP): Predicting the outcomes of legal cases from historical data is a critical area of focus. The survey outlines various models and methods applied to large-scale legal datasets.
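
To make the extractive end of the summarization spectrum concrete, here is a minimal sketch in Python. It is not a method taken from the survey: it assumes a simple word-frequency scoring heuristic, a tiny illustrative stopword list, and a naive sentence splitter that would mishandle legal abbreviations such as "v." or "No.". The names `STOPWORDS`, `split_sentences`, `tokenize`, and `extractive_summary` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "that",
             "is", "was", "for", "on", "by", "with", "it"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]


def extractive_summary(text, k=2):
    """Return the k highest-scoring sentences, in their original order.

    A sentence's score is the mean document-level frequency of its tokens.
    """
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    top = sorted(range(len(sentences)),
                 key=lambda i: score(sentences[i]),
                 reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

An extractive method like this only copies sentences that already exist in the source, so it cannot misstate the document's wording, whereas abstractive approaches generate new text and need a trained sequence-to-sequence model. Real legal summarizers would also need a splitter that is aware of citation formats and section structure.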

Datasets and LLMs

The research highlights the importance of specialized datasets and tailored LLMs for the legal domain. It provides an overview of numerous datasets used for legal NLP tasks, detailing their construction and adaptation for different legal systems and jurisdictions.

Furthermore, the development and adaptation of LLMs for legal tasks form a critical part of this research. Models such as LEGAL-BERT, Lawformer, SaulLM-7B, and Legal-LM are explored, demonstrating the need for domain-specific language models trained on specialized legal corpora. The integration of legal-specific knowledge via knowledge graphs (KGs) is also discussed as a way to improve the models' capacity to deliver accurate and contextually relevant legal insights.

Challenges and Future Directions

While NLP offers transformative capabilities to legal processes, the paper identifies 15 open research challenges, including bias in AI applications, the need for more robust and interpretable models, and the difficulty of handling complex legal language and reasoning. Fairness and transparency in AI decisions remain paramount, given the potential impact on the rights and lives of the individuals involved.

The paper concludes with proposed future directions, underscoring the necessity for more comprehensive datasets, enhanced legal text processing methods, and more nuanced approaches to integrate legal reasoning within AI systems. It suggests areas for further research, such as expanding multilingual capabilities and incorporating ethical considerations like bias mitigation and fairness to ensure the responsible deployment of AI in legal contexts.

This survey serves as a critical resource for researchers and practitioners in the legal NLP field, addressing the current capabilities, datasets, and technological challenges while paving the way for advancements in the efficient and fair application of AI in legal practices.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (142)
  1. AraLegal-BERT: A pretrained language model for Arabic Legal text. In Proceedings of the Natural Legal Language Processing Workshop 2022. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (Hybrid), 338–344. https://doi.org/10.18653/v1/2022.nllp-1.31
  2. Intisar Almuslim and Diana Inkpen. 2022. Legal Judgment Prediction for Canadian Appeal Cases. In 2022 7th International Conference on Data Science and Machine Learning Applications (CDMA). IEEE, Riyadh, Saudi Arabia, 163–168. https://doi.org/10.1109/CDMA54072.2022.00032
  3. AWS Amazon. [n. d.]. What are Transformers in Artificial Intelligence? Retrieved July 24, 2024 from https://aws.amazon.com/what-is/transformers-in-artificial-intelligence
  4. The Impact of Large Language Modeling on Natural Language Processing in Legal Texts: A Comprehensive Survey. In 2023 15th International Conference on Knowledge and Systems Engineering (KSE). IEEE, Hanoi, Vietnam, 1–7. https://doi.org/10.1109/KSE59128.2023.10299488
  5. Expert Finding in Legal Community Question Answering. In Advances in Information Retrieval: 44th European Conference on IR Research (ECIR 2022), Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Berlin, Heidelberg, 22–30.
  6. Answer Retrieval in Legal Community Question Answering. In Advances in Information Retrieval: 46th European Conference on Information Retrieval (ECIR 2024), Nazli Goharian, Nicola Tonellotto, Yulan He, Aldo Lipani, Graham McDonald, Craig Macdonald, and Iadh Ounis (Eds.). Springer Nature Switzerland, Berlin, Heidelberg, 477–485.
  7. E-NER — An Annotated Named Entity Recognition Corpus of Legal Text. In Proceedings of the Natural Legal Language Processing Workshop 2022. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (Hybrid), 246–255. https://doi.org/10.18653/v1/2022.nllp-1.22
  8. Purbid Bambroo and Aditi Awasthi. 2021. LegalDB: Long DistilBERT for Legal Document Classification. In 2021 International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT). IEEE, 1–4. https://doi.org/10.1109/ICAECT49130.2021.9392558
  9. AsyLex: A Dataset for Legal Language Processing of Refugee Claims. In Proceedings of the Natural Legal Language Processing Workshop 2023, Daniel Preo\textcommabelowtiuc-Pietro, Catalina Goanta, Ilias Chalkidis, Leslie Barrett, Gerasimos Spanakis, and Nikolaos Aletras (Eds.). Association for Computational Linguistics, Singapore, 244–257. https://doi.org/10.18653/v1/2023.nllp-1.24
  10. Longformer: The long-document transformer. arXiv:2004.05150 [cs.CL]
  11. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (Virtual Event, Canada) (FAccT ’21). Association for Computing Machinery, New York, NY, USA, 610–623. https://doi.org/10.1145/3442188.3445922
  12. A Comparative Study of Summarization Algorithms Applied to Legal Case Judgments. Advances in Information Retrieval (2019), 413–428.
  13. Language models are few-shot learners. In Proceedings of the 34th International Conference on Neural Information Processing Systems (Vancouver, BC, Canada) (NIPS ’20). Curran Associates Inc., Red Hook, NY, USA, Article 159, 25 pages.
  14. Marius Büttner and Ivan Habernal. 2024. Answering legal questions from laymen in German civil law system. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), Yvette Graham and Matthew Purver (Eds.). Association for Computational Linguistics, St. Julian’s, Malta, 2015–2027. https://aclanthology.org/2024.eacl-long.122
  15. Neural Legal Judgment Prediction in English. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Anna Korhonen, David Traum, and Lluís Màrquez (Eds.). Association for Computational Linguistics, Florence, Italy, 4317–4323. https://doi.org/10.18653/v1/P19-1424
  16. Extreme Multi-Label Legal Text Classification: A Case Study in EU Legislation. In Proceedings of the Natural Legal Language Processing Workshop 2019, Nikolaos Aletras, Elliott Ash, Leslie Barrett, Daniel Chen, Adam Meyers, Daniel Preotiuc-Pietro, David Rosenberg, and Amanda Stent (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–87. https://doi.org/10.18653/v1/W19-2209
  17. MultiEURLEX - A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (Eds.). Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, 6974–6996. https://doi.org/10.18653/v1/2021.emnlp-main.559
  18. LEGAL-BERT: The Muppets straight out of Law School. In Findings of the Association for Computational Linguistics: EMNLP 2020, Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, Online, 2898–2904. https://doi.org/10.18653/v1/2020.findings-emnlp.261
  19. LexGLUE: A Benchmark Dataset for Legal Language Understanding in English. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (Eds.). Association for Computational Linguistics, Dublin, Ireland, 4310–4330. https://doi.org/10.18653/v1/2022.acl-long.297
  20. FairLex: A Multilingual Benchmark for Evaluating Fairness in Legal Text Processing. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (Eds.). Association for Computational Linguistics, Dublin, Ireland, 4389–4406. https://doi.org/10.18653/v1/2022.acl-long.301
  21. EQUALS: A Real-world Dataset for Legal Question Answering via Reading Chinese Laws. In Proceedings of the Nineteenth International Conference on Artificial Intelligence and Law (Braga, Portugal) (ICAIL ’23). Association for Computing Machinery, New York, NY, USA, 71–80. https://doi.org/10.1145/3594536.3595159
  22. Multi-Task Learning in Natural Language Processing: An Overview. ACM Comput. Surv. 56, 12, Article 295 (jul 2024), 32 pages. https://doi.org/10.1145/3663363
  23. A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law. arXiv:2405.01769 [cs.CL]
  24. Archimedes-AUEB at SemEval-2024 Task 5: LLM explains Civil Procedure. arXiv:2405.08502 [cs.CL]
  25. SaulLM-7B: A pioneering Large Language Model for Law. arXiv:2403.03883 [cs.CL]
  26. A Survey on Legal Judgment Prediction: Datasets, Metrics, Models and Challenges. IEEE Access 11 (2023), 102050–102071. https://doi.org/10.1109/ACCESS.2023.3317083
  27. NeuroNER: an easy-to-use program for named-entity recognition based on neural networks. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Lucia Specia, Matt Post, and Michael Paul (Eds.). Association for Computational Linguistics, Copenhagen, Denmark, 97–102. https://doi.org/10.18653/v1/D17-2017
  28. Aniket Deroy and Subhankar Maity. 2023. Questioning Biases in Case Judgment Summaries: Legal Datasets or Large Language Models? arXiv:2312.00554 [cs.CL]
  29. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Jill Burstein, Christy Doran, and Thamar Solorio (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 4171–4186. https://doi.org/10.18653/v1/N19-1423
  30. State of the Art in Artificial Intelligence applied to the Legal Domain. arXiv:2204.07047 [cs.CL]
  31. Discourse-Aware Unsupervised Summarization for Long Scientific Documents. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Paola Merlo, Jorg Tiedemann, and Reut Tsarfaty (Eds.). Association for Computational Linguistics, Online, 1089–1102. https://doi.org/10.18653/v1/2021.eacl-main.93
  32. Named Entity Recognition and Resolution in Legal Text. In Semantic Processing of Legal Texts: Where the Language of Law Meets the Law of Language, Enrico Francesconi, Simonetta Montemagni, Wim Peters, and Daniela Tiscornia (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 27–43. https://doi.org/10.1007/978-3-642-12837-0_2
  33. Gary Edmond and Kristy A Martire. 2019. Just cognition: scientific research on bias and some implications for legal procedure and decision-making. The modern law review 82, 4 (2019), 633–664.
  34. Multi-Task Deep Learning for Legal Document Translation, Summarization and Multi-Label Classification. In Proceedings of the 2018 Artificial Intelligence and Cloud Computing Conference (Tokyo, Japan) (AICCC ’18). Association for Computing Machinery, New York, NY, USA, 9–15. https://doi.org/10.1145/3299819.3299844
  35. Günes Erkan and Dragomir R Radev. 2004. Lexrank: Graph-based lexical centrality as salience in text summarization. Journal of artificial intelligence research 22 (2004), 457–479.
  36. Atefeh Farzindar. 2004. Atefeh Farzindar and Guy Lapalme,’LetSum, an automatic Legal Text Summarizing system’in T. Gordon (ed.), Legal Knowledge and Information Systems. Jurix 2004: The Seventeenth Annual Conference. Amsterdam: IOS Press, 2004, pp. 11-18.. In Legal knowledge and information systems: JURIX 2004, the seventeenth annual conference, Vol. 120. IOS Press, 11.
  37. Legal Judgment Prediction via Event Extraction with Constraints. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (Eds.). Association for Computational Linguistics, Dublin, Ireland, 648–664. https://doi.org/10.18653/v1/2022.acl-long.48
  38. Context-Aware Classification of Legal Document Pages. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (Taipei, Taiwan) (SIGIR ’23). Association for Computing Machinery, New York, NY, USA, 3285–3289. https://doi.org/10.1145/3539618.3591839
  39. Legal IR and NLP: The History, Challenges, and State-of-the-Art. In European Conference on Information Retrieval (ECIR) (Advances in Information Retrieval). Springer-Verlag, Berlin, Heidelberg, 331–340. https://doi.org/10.1007/978-3-031-28241-6_34
  40. Daphne Gelbart and JC Smith. 1991a. Flexicon, a new legal information retrieval system. Can. L. Libr. 16 (1991), 9.
  41. Dephne Gelbart and J. C. Smith. 1991b. Beyond boolean search: FLEXICON, a legal tex-based intelligent system. In Proceedings of the 3rd International Conference on Artificial Intelligence and Law (Oxford, England) (ICAIL ’91). Association for Computing Machinery, New York, NY, USA, 225–234. https://doi.org/10.1145/112646.112674
  42. LLaMandement: Large Language Models for Summarization of French Legislative Proposals. arXiv:2401.16182 [cs.CL]
  43. John Gibbons and M. Teresa Turell. 2008. Dimensions of Forensic Linguistics (1 ed.). AILA Applied Linguistics Series, Vol. 5. John Benjamins Publishing Company, Netherlands. 1–317 pages.
  44. Overview and Discussion of the Competition on Legal Information, Extraction/Entailment (COLIEE) 2023. The Review of Socionetwork Strategies 18, 1 (2024), 27–47.
  45. Natural language processing for legal document review: categorising deontic modalities in contracts. Artificial Intelligence and Law (2023). https://doi.org/10.1007/s10506-023-09379-2
  46. Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset. In Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Eds.), Vol. 35. Curran Associates, Inc., Red Hook, NY, USA, 29217–29234. https://proceedings.neurips.cc/paper_files/paper/2022/file/bc218a0c656e49d4b086975a9c785f47-Paper-Datasets_and_Benchmarks.pdf
  47. Cuad: An expert-annotated nlp dataset for legal contract review. arXiv:2103.06268 [cs.CL]
  48. AILA: A Question Answering System in the Legal Domain. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20 (Yokohama, Yokohama, Japan) (IJCAI’20), Christian Bessiere (Ed.). International Joint Conferences on Artificial Intelligence Organization, Article 762, 3 pages. https://doi.org/10.24963/ijcai.2020/762
  49. A sentence is known by the company it keeps: Improving Legal Document Summarization Using Deep Clustering. Artificial Intelligence and Law 32, 1 (2024), 165–200.
  50. From Text to Structure: Using Large Language Models to Support the Development of Legal Expert Systems. arXiv:2311.04911 [cs.CL]
  51. Mistral 7B. arXiv:2310.06825 [cs.CL]
  52. Leveraging Large Language Models for Learning Complex Legal Concepts through Storytelling. arXiv:2402.17019 [cs.CL]
  53. One model to learn them all. arXiv:1706.05137 [cs.LG]
  54. Named Entity Recognition in Indian court judgments. In Proceedings of the Natural Legal Language Processing Workshop 2022. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (Hybrid), 184–193. https://doi.org/10.18653/v1/2022.nllp-1.15
  55. Text summarization from legal documents: a survey. Artificial Intelligence Review 51 (2019), 371–402.
  56. GPT-4 passes the bar exam. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 382, 2270 (2024), 20230254. https://doi.org/10.1098/rsta.2023.0254
  57. A Free Format Legal Question Answering System. In Proceedings of the Natural Legal Language Processing Workshop 2021, Nikolaos Aletras, Ion Androutsopoulos, Leslie Barrett, Catalina Goanta, and Daniel Preotiuc-Pietro (Eds.). Association for Computational Linguistics, Punta Cana, Dominican Republic, 107–113. https://doi.org/10.18653/v1/2021.nllp-1.11
  58. A Survey on Challenges and Advances in Natural Language Processing with a Focus on Legal Informatics and Low-Resource Languages. Electronics 13, 3 (2024). https://doi.org/10.3390/electronics13030648
  59. ContrastNER: Contrastive-based Prompt Tuning for Few-shot NER. In 2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC). IEEE, 241–249. https://doi.org/10.1109/COMPSAC57700.2023.00038
  60. Jihoon Lee and Hyukjoon Lee. 2019. A Comparison Study on Legal Document Classification Using Deep Neural Networks. In 2019 International Conference on Information and Communication Technology Convergence (ICTC). 926–928. https://doi.org/10.1109/ICTC46691.2019.8939926
  61. A Dataset of German Legal Documents for Named Entity Recognition. In Proceedings of the Twelfth Language Resources and Evaluation Conference, Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, and Stelios Piperidis (Eds.). European Language Resources Association, Marseille, France, 4478–4485. https://aclanthology.org/2020.lrec-1.551
  62. LexisNexis [n. d.]. International Legal Generative AI Report. Retrieved July 22, 2024 from https://www.lexisnexis.com/community/pressroom/b/news/posts/lexisnexis-international-legal-generative-ai-survey-shows-nearly-half-of-the-legal-profession-believe-generative-ai-will-transform-the-practice-of-law
  63. Parameter-Efficient Legal Domain Adaptation. In Proceedings of the Natural Legal Language Processing Workshop 2022. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (Hybrid), 119–129. https://doi.org/10.18653/v1/2022.nllp-1.10
  64. A Survey on Deep Learning for Named Entity Recognition. IEEE Transactions on Knowledge and Data Engineering 34, 1 (Jan 2022), 50–70. https://doi.org/10.1109/TKDE.2020.2981314
  65. Pre-Trained Language Models for Text Generation: A Survey. ACM Comput. Surv. 56, 9 (apr 2024), 1–39. https://doi.org/10.1145/3649449
  66. BERT-CNN based evidence retrieval and aggregation for Chinese legal multi-choice question answering. Neural Computing and Applications 36, 11 (2024), 5909–5925. https://doi.org/10.1007/s00521-023-09380-5
  67. CLAUDETTE: an automated detector of potentially unfair clauses in online terms of service. Artificial Intelligence and Law 27 (2019), 117–139.
  68. Low-resource court judgment summarization for common law systems. Information Processing and Management 61, 5 (2024), 103796. https://doi.org/10.1016/j.ipm.2024.103796
  69. Yang Liu. 2019. Fine-tune BERT for extractive summarization. arXiv:1903.10318 [cs.CL]
  70. Roberta: A robustly optimized bert pretraining approach. arXiv:1907.11692 [cs.CL]
  71. ML-LJP: Multi-Law Aware Legal Judgment Prediction. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (Taipei, Taiwan) (SIGIR ’23). Association for Computing Machinery, New York, NY, USA, 1023–1034. https://doi.org/10.1145/3539618.3591731
  72. Interpretable Long-Form Legal Question Answering with Retrieval-Augmented Large Language Models. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. AAAI Press, 22266–22275. https://doi.org/10.1609/aaai.v38i20.30232
  73. Learning to Predict Charges for Criminal Cases with Legal Basis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Martha Palmer, Rebecca Hwa, and Sebastian Riedel (Eds.). Association for Computational Linguistics, Copenhagen, Denmark, 2727–2736. https://doi.org/10.18653/v1/D17-1289
  74. Legal Judgment Prediction with Multi-Stage Case Representation Learning in the Real Court Setting. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, Canada) (SIGIR ’21). Association for Computing Machinery, New York, NY, USA, 993–1002. https://doi.org/10.1145/3404835.3462945
  75. Processing Long Legal Documents with Pre-trained Transformers: Modding LegalBERT and Longformer. In Proceedings of the Natural Legal Language Processing Workshop 2022, Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goan\textcommabelowtă, and Daniel Preo\textcommabelowtiuc-Pietro (Eds.). Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (Hybrid), 130–142. https://doi.org/10.18653/v1/2022.nllp-1.11
  76. An Efficient Active Learning Pipeline for Legal Text Classification. In Proceedings of the Natural Legal Language Processing Workshop 2022, Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goan\textcommabelowtă, and Daniel Preo\textcommabelowtiuc-Pietro (Eds.). Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (Hybrid), 345–358. https://doi.org/10.18653/v1/2022.nllp-1.32
  77. Legal-Tech Open Diaries: Lesson learned on how to develop and deploy light-weight models in the era of humongous Language Models. In Proceedings of the Natural Legal Language Processing Workshop 2022, Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goan\textcommabelowtă, and Daniel Preo\textcommabelowtiuc-Pietro (Eds.). Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (Hybrid), 88–110. https://doi.org/10.18653/v1/2022.nllp-1.8
  78. Suzanne McGee. [n. d.]. Generative AI and the Law. Retrieved July 22, 2024 from https://www.lexisnexis.com/html/lexisnexis-generative-ai-story
  79. Using machine learning to predict decisions of the European Court of Human Rights. Artificial Intelligence and Law 28, 2 (2020), 237–266. https://doi.org/10.1007/s10506-019-09255-y
  80. Abstracting of legal cases: the potential of clustering based on the selection of representative objects. Journal of the American Society for Information Science 50, 2 (1999), 151–161.
  81. Laurens Mommers. 2010. Ontologies in the Legal Domain. Springer Netherlands, Dordrecht, 265–276. https://doi.org/10.1007/978-90-481-8845-1_12
  82. Multi-language transfer learning for low-resource legal case summarization. Artificial Intelligence and Law (2023). https://doi.org/10.1007/s10506-023-09373-8
  83. Robust Deep Reinforcement Learning for Extractive Legal Summarization. In Neural Information Processing, Teddy Mantoro, Minho Lee, Media Anugerah Ayu, Kok Wai Wong, and Achmad Nizar Hidayanto (Eds.). Springer International Publishing, Cham, 597–604.
  84. Attentive deep neural networks for legal document retrieval. Artificial Intelligence and Law 32, 1 (2024), 57–86. https://doi.org/10.1007/s10506-022-09341-8
  85. Swiss-Judgment-Prediction: A Multilingual Legal Judgment Prediction Benchmark. In Proceedings of the Natural Legal Language Processing Workshop 2021, Nikolaos Aletras, Ion Androutsopoulos, Leslie Barrett, Catalina Goanta, and Daniel Preotiuc-Pietro (Eds.). Association for Computational Linguistics, Punta Cana, Dominican Republic, 19–35. https://doi.org/10.18653/v1/2021.nllp-1.3
  86. LEXTREME: A Multi-Lingual and Multi-Task Benchmark for the Legal Domain. In Findings of the Association for Computational Linguistics: EMNLP 2023, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, Singapore, 3016–3054. https://doi.org/10.18653/v1/2023.findings-emnlp.200
  87. MultiLegalPile: A 689GB Multilingual Legal Corpus. arXiv:2306.02069 [cs.CL]
  88. Training language models to follow instructions with human feedback. In Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Eds.), Vol. 35. Curran Associates, Inc., 27730–27744. https://proceedings.neurips.cc/paper_files/paper/2022/file/b1efde53be364a73914f58805a001731-Paper-Conference.pdf
  89. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Systematic Reviews 10, 1 (2021), 89. https://doi.org/10.1186/s13643-021-01626-4
  90. Multi-granular Legal Topic Classification on Greek Legislation. In Proceedings of the Natural Legal Language Processing Workshop 2021, Nikolaos Aletras, Ion Androutsopoulos, Leslie Barrett, Catalina Goanta, and Daniel Preotiuc-Pietro (Eds.). Association for Computational Linguistics, Punta Cana, Dominican Republic, 63–75. https://doi.org/10.18653/v1/2021.nllp-1.6
  91. Sungmi Park and Joshua I. James. 2023. Lessons learned building a legal inference dataset. Artificial Intelligence and Law (2023). https://doi.org/10.1007/s10506-023-09370-x
  92. CaseSummarizer: A System for Automated Summarization of Legal Texts. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, Hideo Watanabe (Ed.). The COLING 2016 Organizing Committee, Osaka, Japan, 258–262. https://aclanthology.org/C16-2054
  93. Legal Summarisation through LLMs: The PRODIGIT Project. arXiv:2308.04416 [cs.CL]
  94. Named Entity Recognition in the Romanian Legal Domain. In Proceedings of the Natural Legal Language Processing Workshop 2021, Nikolaos Aletras, Ion Androutsopoulos, Leslie Barrett, Catalina Goanta, and Daniel Preotiuc-Pietro (Eds.). Association for Computational Linguistics, Punta Cana, Dominican Republic, 9–18. https://doi.org/10.18653/v1/2021.nllp-1.2
  95. Overview and discussion of the competition on legal information extraction/entailment (COLIEE) 2021. The Review of Socionetwork Strategies 16, 1 (2022), 111–133.
  96. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21, 1, Article 140 (jan 2020), 67 pages.
  97. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv:1910.01108 [cs.CL]
  98. Explaining Legal Concepts with Augmented Large Language Models (GPT-4). arXiv:2306.09525 [cs.CL]
  99. Abstractive Summarization of Dutch Court Verdicts Using Sequence-to-sequence Models. In Proceedings of the Natural Legal Language Processing Workshop 2022, Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goan\textcommabelowtă, and Daniel Preo\textcommabelowtiuc-Pietro (Eds.). Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (Hybrid), 76–87. https://doi.org/10.18653/v1/2022.nllp-1.7
  100. ClassActionPrediction: A Challenging Benchmark for Legal Judgment Prediction of Class Action Cases in the US. In Proceedings of the Natural Legal Language Processing Workshop 2022, Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goan\textcommabelowtă, and Daniel Preo\textcommabelowtiuc-Pietro (Eds.). Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (Hybrid), 31–46. https://doi.org/10.18653/v1/2022.nllp-1.3
  101. Large scale legal text classification using transformer models. arXiv:2010.12871 [cs.CL]
  102. Multi-LexSum: Real-world Summaries of Civil Rights Lawsuits at Multiple Granularities. In Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Eds.), Vol. 35. Curran Associates, Inc., 13158–13173. https://proceedings.neurips.cc/paper_files/paper/2022/file/552ef803bef9368c29e53c167de34b55-Paper-Datasets_and_Benchmarks.pdf
  103. The Woman Worked as a Babysitter: On Biases in Language Generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan (Eds.). Association for Computational Linguistics, Hong Kong, China, 3407–3412. https://doi.org/10.18653/v1/D19-1339
  104. Legal-LM: Knowledge Graph Enhanced Large Language Models for Law Consulting. In Advanced Intelligent Computing Technology and Applications, De-Shuang Huang, Zhanjun Si, and Chuanlei Zhang (Eds.). Springer Nature Singapore, Singapore, 175–186.
  105. Legal Named Entity Recognition with Multi-Task Domain Adaptation. In Proceedings of the Natural Legal Language Processing Workshop 2022, Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, and Daniel Preoțiuc-Pietro (Eds.). Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (Hybrid), 305–321. https://doi.org/10.18653/v1/2022.nllp-1.29
  106. Multi-label legal document classification: A deep learning-based approach with label-attention and domain-specific pre-training. Information Systems 106 (2022), 101718. https://doi.org/10.1016/j.is.2021.101718
  107. A dataset for evaluating legal question answering on private international law. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law (São Paulo, Brazil) (ICAIL ’21). Association for Computing Machinery, New York, NY, USA, 230–234. https://doi.org/10.1145/3462757.3466094
  108. DiscoLQA: zero-shot discourse-based legal question answering on European Legislation. Artificial Intelligence and Law (2024). https://doi.org/10.1007/s10506-023-09387-2
  109. Legal knowledge extraction for knowledge graph based question-answering. In Legal knowledge and information systems. IOS Press, 143–153. https://doi.org/10.3233/FAIA200858
  110. JRC-NAMES: A Freely Available, Highly Multilingual Named Entity Resource. In Proceedings of the International Conference Recent Advances in Natural Language Processing 2011, Ruslan Mitkov and Galia Angelova (Eds.). Association for Computational Linguistics, Hissar, Bulgaria, 104–110. https://aclanthology.org/R11-1015
  111. Benjamin Strickson and Beatriz De La Iglesia. 2020. Legal Judgement Prediction for UK Courts. In Proceedings of the 3rd International Conference on Information Science and Systems (Cambridge, United Kingdom) (ICISS ’20). Association for Computing Machinery, New York, NY, USA, 204–209. https://doi.org/10.1145/3388176.3388183
  112. Zhongxiang Sun. 2023. A Short Survey of Viewing Large Language Models in Legal Aspect. arXiv:2303.09136 [cs.CL]
  113. Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models. arXiv:2102.02503 [cs.CL]
  114. Biases in legal decision-making: Comparing prosecutors, defense attorneys, law students, and laypersons. Journal of empirical legal studies 20, 4 (2023), 852–894.
  115. Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003. 142–147. https://www.aclweb.org/anthology/W03-0419
  116. Legal Judgment Prediction via graph boosting with constraints. Information Processing & Management 61, 3 (2024), 103663. https://doi.org/10.1016/j.ipm.2024.103663
  117. LEDGAR: A Large-Scale Multi-label Corpus for Text Classification of Legal Provisions in Contracts. In Proceedings of the Twelfth Language Resources and Evaluation Conference, Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, and Stelios Piperidis (Eds.). European Language Resources Association, Marseille, France, 1235–1241. https://aclanthology.org/2020.lrec-1.155
  118. Attention is All you Need. In Advances in Neural Information Processing Systems (Long Beach, California, USA) (NIPS’17, Vol. 30), I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., Red Hook, NY, USA, 6000–6010. https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
  119. Graph attention networks. arXiv:1710.10903 [stat.ML]
  120. A topic discovery approach for unsupervised organization of legal document collections. Artificial Intelligence and Law (2023). https://doi.org/10.1007/s10506-023-09371-w
  121. D2GCLF: Document-to-Graph Classifier for Legal Document Classification. In Findings of the Association for Computational Linguistics: NAACL 2022, Marine Carpuat, Marie-Catherine de Marneffe, and Ivan Vladimir Meza Ruiz (Eds.). Association for Computational Linguistics, Seattle, United States, 2208–2221. https://doi.org/10.18653/v1/2022.findings-naacl.170
  122. Empirical Study of Deep Learning for Text Classification in Legal Document Review. In 2018 IEEE International Conference on Big Data (Big Data). IEEE, 3317–3320. https://doi.org/10.1109/BigData.2018.8622157
  123. Jason Wei and Kai Zou. 2019. EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan (Eds.). Association for Computational Linguistics, Hong Kong, China, 6382–6388. https://doi.org/10.18653/v1/D19-1670
  124. Westlaw. [n. d.]. Westlaw. Retrieved May 23, 2024 from https://anzlaw.thomsonreuters.com/Browse/Home/Australia160?comp=wlau&__lrTS=20240523040153004&transitionType=Default&contextData=(sc.Default)
  125. Towards Interactivity and Interpretability: A Rationale-based Legal Judgment Prediction Framework. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang (Eds.). Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 4787–4799. https://doi.org/10.18653/v1/2022.emnlp-main.316
  126. Lawformer: A pre-trained language model for Chinese legal long documents. AI Open 2 (2021), 79–84. https://doi.org/10.1016/j.aiopen.2021.06.003
  127. CAIL2018: A Large-Scale Legal Dataset for Judgment Prediction. arXiv:1807.02478 [cs.CL] https://arxiv.org/abs/1807.02478
  128. Distinguish Confusing Law Articles for Legal Judgment Prediction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault (Eds.). Association for Computational Linguistics, Online, 3086–3095. https://doi.org/10.18653/v1/2020.acl-main.280
  129. Legal judgment prediction via multi-perspective bi-feedback network. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (Macao, China) (IJCAI’19). AAAI Press, 4085–4091.
  130. Interpretable Charge Predictions for Criminal Cases: Learning to Generate Court Views from Fact Descriptions. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), Marilyn Walker, Heng Ji, and Amanda Stent (Eds.). Association for Computational Linguistics, New Orleans, Louisiana, 1854–1864. https://doi.org/10.18653/v1/N18-1168
  131. Bringing legal knowledge to the public by constructing a legal question bank using large-scale pre-trained language model. Artificial Intelligence and Law (2023). https://doi.org/10.1007/s10506-023-09367-6
  132. Kwan Yuen Iu and Vanessa Man-Yi Wong. 2023. ChatGPT by OpenAI: The End of Litigation Lawyers. https://doi.org/10.2139/ssrn.4339839
  133. Contrastive Learning for Legal Judgment Prediction. ACM Transactions on Information Systems 41, 4, Article 113 (Apr 2023), 25 pages. https://doi.org/10.1145/3580489
  134. GLQA: A Generation-based Method for Legal Question Answering. In 2023 International Joint Conference on Neural Networks (IJCNN). 1–8. https://doi.org/10.1109/IJCNN54540.2023.10191483
  135. When does pretraining help? assessing self-supervised learning for law and the CaseHOLD dataset of 53,000+ legal holdings. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law (São Paulo, Brazil) (ICAIL ’21). Association for Computing Machinery, New York, NY, USA, 159–168. https://doi.org/10.1145/3462757.3466088
  136. Legal Judgment Prediction via Topological Learning. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun’ichi Tsujii (Eds.). Association for Computational Linguistics, Brussels, Belgium, 3540–3549. https://doi.org/10.18653/v1/D18-1390
  137. Iteratively Questioning and Answering for Interpretable Legal Judgment Prediction. Proceedings of the AAAI Conference on Artificial Intelligence 34, 01 (Apr 2020), 1250–1257. https://doi.org/10.1609/aaai.v34i01.5479
  138. How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault (Eds.). Association for Computational Linguistics, Online, 5218–5230. https://doi.org/10.18653/v1/2020.acl-main.466
  139. JEC-QA: A Legal-Domain Question Answering Dataset. In Proceedings of the AAAI conference on artificial intelligence, Vol. 34. 9701–9708. https://doi.org/10.48550/arXiv.1911.12011
  140. Automatic Summarization of Legal Decisions using Iterative Masking of Predictive Sentences. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law (Montreal, QC, Canada) (ICAIL ’19). Association for Computing Machinery, New York, NY, USA, 163–172. https://doi.org/10.1145/3322640.3326728
  141. Yang Zhong and Diane Litman. 2022. Computing and Exploiting Document Structure to Improve Unsupervised Extractive Summarization of Legal Case Decisions. In Proceedings of the Natural Legal Language Processing Workshop 2022, Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, and Daniel Preoțiuc-Pietro (Eds.). Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (Hybrid), 322–337.
  142. The Cambridge Law Corpus: a dataset for legal AI research. In Proceedings of the 37th International Conference on Neural Information Processing Systems (New Orleans, LA, USA) (NIPS ’23). Curran Associates Inc., Red Hook, NY, USA, Article 1793, 31 pages.
Authors (2)
  1. Farid Ariai (2 papers)
  2. Gianluca Demartini (34 papers)