
Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval (2403.18405v1)

Published 27 Mar 2024 in cs.AI and cs.IR

Abstract: Collecting relevance judgments for legal case retrieval is a challenging and time-consuming task. Accurately judging the relevance between two legal cases requires considerable effort to read the lengthy text and a high level of domain expertise to extract Legal Facts and make juridical judgments. With the advent of advanced LLMs, some recent studies have suggested that it is promising to use LLMs for relevance judgment. Nonetheless, the method of employing a general LLM for reliable relevance judgments in legal case retrieval is yet to be thoroughly explored. To fill this research gap, we devise a novel few-shot workflow tailored to the relevance judgment of legal cases. The proposed workflow breaks down the annotation process into a series of stages, imitating the process employed by human annotators and enabling a flexible integration of expert reasoning to enhance the accuracy of relevance judgments. By comparing the relevance judgments of LLMs and human experts, we empirically show that we can obtain reliable relevance judgments with the proposed workflow. Furthermore, we demonstrate the capacity to augment existing legal case retrieval models through the synthesis of data generated by the LLM.

Automated Annotation Workflow for Legal Case Relevance Using LLMs

Overview

Recent advances in LLMs have opened new avenues for automating complex tasks that require deep understanding and reasoning. In legal informatics, a longstanding challenge is retrieving relevant cases for legal analysis, a task that demands both meticulous reading of lengthy documents and substantial domain expertise. Shengjie Ma et al. address this challenge by leveraging LLMs for relevance judgment in legal case retrieval. The paper introduces a tailored few-shot workflow that automates the annotation of legal case relevance, achieving high consistency with human expert judgments and improving the performance of legal case retrieval models.

Methodology

The core of this paper is the automated annotation workflow it proposes, designed to harness the reasoning power of general LLMs for assessing the relevance of legal cases. The workflow comprises four stages:

  1. Preliminary Legal Analysis: Engages legal experts to prepare detailed relevance indications by dissecting legal cases into Material and Legal Facts, which serve as a guiding framework for the LLM.
  2. Adaptive Demo-Matching (ADM): Uses BM25 to retrieve the most pertinent expert demonstrations for each case, optimizing the LLM's ability to mimic human expert reasoning.
  3. Fact Extraction (FE): Sequentially extracts Material and Legal Facts from the cases using step-by-step prompts, refined with selected demonstrations.
  4. Fact Annotation (FA): Evaluates the relevance of the extracted facts between pairs of cases, again guided by expert reasoning encapsulated in the demonstrations.

This multi-stage process mirrors the complex reasoning and annotation tasks performed by human experts, enabling the LLM to generate annotations that align well with expert judgments.
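The paper does not ship reference code, so the following is a minimal sketch of how stages 2-4 could be wired together. The `llm` helper, prompt wording, demo format, and grading instruction are illustrative assumptions, not the authors' exact design; BM25 retrieval uses the rank_bm25 package, and whitespace tokenization stands in for proper Chinese segmentation.

```python
# Illustrative sketch of the ADM -> FE -> FA pipeline. The `llm` helper,
# prompts, and demo format are assumptions, not the paper's exact design.
from rank_bm25 import BM25Okapi


def llm(prompt: str) -> str:
    """Placeholder for a chat-completion call to a general-purpose LLM."""
    raise NotImplementedError


# Expert demonstrations from the Preliminary Legal Analysis stage: each pairs
# a case with expert-written fact extractions and relevance reasoning.
demos = [
    {"case": "...", "material_facts": "...", "legal_facts": "...", "reasoning": "..."},
]
# Whitespace tokenization for brevity; Chinese case text would need a
# segmenter such as jieba before indexing.
bm25 = BM25Okapi([d["case"].split() for d in demos])


def match_demo(case_text: str) -> dict:
    """Adaptive Demo-Matching: retrieve the most pertinent expert demo."""
    scores = bm25.get_scores(case_text.split())
    return demos[int(scores.argmax())]


def extract_facts(case_text: str, demo: dict) -> tuple[str, str]:
    """Fact Extraction: step-by-step prompts refined with the matched demo."""
    material = llm(
        f"Example case:\n{demo['case']}\n"
        f"Example Material Facts:\n{demo['material_facts']}\n\n"
        f"Case:\n{case_text}\nExtract the Material Facts step by step:"
    )
    legal = llm(
        f"Example Material Facts:\n{demo['material_facts']}\n"
        f"Example Legal Facts:\n{demo['legal_facts']}\n\n"
        f"Material Facts:\n{material}\nDerive the Legal Facts:"
    )
    return material, legal


def annotate_relevance(query_case: str, candidate_case: str) -> str:
    """Fact Annotation: judge relevance between the two cases' facts."""
    demo = match_demo(query_case)
    _, q_legal = extract_facts(query_case, demo)
    _, c_legal = extract_facts(candidate_case, demo)
    return llm(
        f"Expert reasoning example:\n{demo['reasoning']}\n\n"
        f"Query Legal Facts:\n{q_legal}\n"
        f"Candidate Legal Facts:\n{c_legal}\n"
        "Judge the relevance of the candidate to the query on the dataset's "
        "graded scale, explaining briefly:"
    )
```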

Experimental Results

The efficacy of the proposed annotation workflow was validated through empirical experiments on LeCaRD, a Chinese legal case retrieval dataset. The LLM-generated relevance judgments proved reliable and consistent with human annotations, as indicated by Cohen's Kappa scores across different temperature settings.
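Agreement of this kind is straightforward to compute; below is a small illustration with scikit-learn, where the labels are invented stand-ins for the paper's actual annotations.

```python
# Toy agreement check between human and LLM relevance grades
# (labels are invented; LeCaRD uses a graded relevance scale).
from sklearn.metrics import cohen_kappa_score

human_grades = [3, 2, 3, 1, 0, 2, 3, 1]   # expert judgments
llm_grades = [3, 2, 3, 1, 1, 2, 3, 1]     # LLM judgments at one temperature

kappa = cohen_kappa_score(human_grades, llm_grades)
print(f"Cohen's kappa: {kappa:.3f}")      # 1.0 = perfect, 0.0 = chance-level
```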

The experiments further demonstrated the practical utility of the synthesized annotations in augmenting legal case retrieval models. When leveraged for fine-tuning, these annotations led to significant improvements in the performance of baseline retrieval models, suggesting that the method can effectively generate valuable synthetic data for model training.
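The paper's fine-tuning recipe is tied to its specific baselines; as a generic illustration, LLM-synthesized (query, candidate, grade) pairs could be used to fine-tune a bi-encoder with sentence-transformers. The model choice and loss here are assumptions, not the authors' setup.

```python
# Generic sketch: fine-tuning a retrieval bi-encoder on LLM-synthesized
# relevance labels. Model and loss are illustrative choices.
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

model = SentenceTransformer("distiluse-base-multilingual-cased-v2")

# Synthesized annotations: relevance grades rescaled to [0, 1] similarity.
train_examples = [
    InputExample(texts=["query case text", "highly relevant candidate"], label=1.0),
    InputExample(texts=["query case text", "marginally related candidate"], label=0.3),
    InputExample(texts=["query case text", "irrelevant candidate"], label=0.0),
]
train_loader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)

model.fit(train_objectives=[(train_loader, train_loss)], epochs=1, warmup_steps=10)
```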

Implications and Future Directions

The outcomes underscore the potential of advanced general-purpose LLMs for domain-specific annotation tasks, particularly in fields such as law that demand nuanced professional knowledge. The proposed methodology facilitates the scalable generation of high-quality annotated data and promotes a deeper integration of AI into legal informatics. By automating parts of the legal analysis process, the approach stands to significantly improve the efficiency and accessibility of legal case retrieval systems.

Looking forward, the workflow's adaptability suggests broader applicability across legal domains and jurisdictions, requiring only minimal expert guidance to tailor the process to a new setting. It also opens the possibility of extending automated relevance annotation to other complex legal tasks, bringing more sophisticated AI capabilities into legal research and practice.

In conclusion, the work of Shengjie Ma and colleagues represents a critical step towards realizing the full potential of LLMs in automating and enhancing legal case retrieval, offering a scalable solution for generating annotated legal data and improving the efficacy of legal retrieval systems. Future research could explore the extension of this workflow to other complex domains, further unlocking the capabilities of LLMs in professional and academic fields.

Authors (4)
  1. Shengjie Ma (7 papers)
  2. Chong Chen (122 papers)
  3. Qi Chu (52 papers)
  4. Jiaxin Mao (47 papers)