
Unlocking Practical Applications in Legal Domain: Evaluation of GPT for Zero-Shot Semantic Annotation of Legal Texts (2305.04417v1)

Published 8 May 2023 in cs.CL and cs.AI

Abstract: We evaluated the capability of a state-of-the-art generative pre-trained transformer (GPT) model to perform semantic annotation of short text snippets (one to few sentences) coming from legal documents of various types. Discussions of potential uses (e.g., document drafting, summarization) of this emerging technology in the legal domain have intensified, but to date there has not been a rigorous analysis of these large language models' (LLMs) capacity for sentence-level semantic annotation of legal texts in zero-shot learning settings. Yet, this particular type of use could unlock many practical applications (e.g., in contract review) and research opportunities (e.g., in empirical legal studies). We fill the gap with this study. We examined if and how successfully the model can semantically annotate small batches of short text snippets (10-50) based exclusively on concise definitions of the semantic types. We found that the GPT model performs surprisingly well in zero-shot settings on diverse types of documents (F1=.73 on a task involving court opinions, .86 for contracts, and .54 for statutes and regulations). These findings can be leveraged by legal scholars and practicing lawyers alike to guide their decisions in integrating LLMs into a wide range of workflows involving semantic annotation of legal texts.

The paper "Unlocking Practical Applications in Legal Domain: Evaluation of GPT for Zero-Shot Semantic Annotation of Legal Texts" presents an evaluation of the capabilities of a state-of-the-art Generative Pre-trained Transformer (GPT) model, specifically GPT-3.5, for semantic annotation of legal texts in a zero-shot learning context. The focus is on understanding how the model performs in classifying short text snippets from different types of legal documents, without prior domain-specific training.

Context and Objective:

Semantic annotation in the legal domain involves identifying and labeling parts of legal documents with specific legal categories or roles. The potential uses of this technology in legal practice include tasks like document drafting, summarization, and contract review. Previous studies have not rigorously tested the performance of GPT models in such legal-specific semantic annotation tasks. This paper aims to address this gap by evaluating the zero-shot performance of GPT-3.5 in this context.

Methodology:

  • Data Sets: The paper utilizes three different manually annotated datasets:

    1. BVA: Sentences from decisions made by the U.S. Board of Veterans' Appeals, annotated with rhetorical roles.
    2. CUAD: The Contract Understanding Atticus Dataset, which contains annotations of different types of contractual clauses.
    3. PHASYS: Statutory and regulatory provisions related to public health emergency preparedness and response, annotated with their purposes.
  • Evaluation Framework: The model's performance is benchmarked against a traditional supervised learning model (random forest) and a fine-tuned base RoBERTa model, with performance measured using micro F1-scores across the different document types.

  • Prompting Approach: The paper uses carefully designed prompts to provide the model with semantic type definitions and text snippets, enabling zero-shot classification.
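The prompting setup described above can be sketched as follows. This is an illustrative reconstruction, not the paper's actual prompt wording; the semantic-type names and definitions below are invented placeholders in the style of the BVA rhetorical roles.

```python
def build_prompt(type_definitions: dict[str, str], snippets: list[str]) -> str:
    """Assemble a zero-shot annotation prompt: concise semantic-type
    definitions followed by a numbered batch of text snippets."""
    lines = ["Classify each snippet into exactly one of these types:", ""]
    for name, definition in type_definitions.items():
        lines.append(f"- {name}: {definition}")
    lines += ["", "Snippets:"]
    for i, snippet in enumerate(snippets, start=1):
        lines.append(f"{i}. {snippet}")
    lines += ["", "Answer with one type label per line, in order."]
    return "\n".join(lines)

# Hypothetical rhetorical-role definitions (not the paper's):
defs = {
    "Finding": "A finding of fact made by the tribunal.",
    "Evidence": "A description of evidence in the record.",
}
prompt = build_prompt(defs, ["The veteran testified that ...",
                             "The Board finds that ..."])
```

The resulting prompt string would then be sent to the GPT model as a single completion request, with the model's line-by-line answers parsed back into labels for each snippet.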

Results:

  • The GPT-3.5 model achieved surprisingly robust performance with micro F1-scores of 0.73 for BVA, 0.86 for CUAD, and 0.54 for PHASYS. This indicates the model’s capability in understanding and annotating complex legal texts to a reasonable extent without any task-specific training.
  • In comparison to random forest and fine-tuned RoBERTa models, GPT-3.5’s zero-shot performance was highly competitive, particularly with limited training data. However, as expected, the fine-tuned RoBERTa model outperformed GPT-3.5 when substantial annotated data was available for training.
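For concreteness, the micro-averaged F1 metric used above pools true positives, false positives, and false negatives across all classes before computing F1 (for single-label classification it coincides with accuracy). A minimal sketch with made-up labels, not the paper's data:

```python
def micro_f1(gold: list[str], pred: list[str]) -> float:
    """Micro F1: aggregate TP/FP/FN over all classes, then compute F1."""
    labels = set(gold) | set(pred)
    tp = fp = fn = 0
    for label in labels:
        tp += sum(1 for g, p in zip(gold, pred) if g == p == label)
        fp += sum(1 for g, p in zip(gold, pred) if p == label and g != label)
        fn += sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

gold = ["Finding", "Evidence", "Evidence", "Finding"]
pred = ["Finding", "Evidence", "Finding", "Finding"]
print(micro_f1(gold, pred))  # 0.75
```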

Discussion and Implications:

  • The findings suggest that GPT-3.5 can be utilized effectively for semantic annotation tasks in legal practice without extensive labeled datasets, making it a valuable tool for legal practitioners, educators, and researchers interested in automating or enhancing document analysis workflows.
  • While promising, the paper acknowledges that in domains where high accuracy is critical, human verification might still be necessary, and in such cases, the creation of high-quality annotated datasets for fine-tuning remains essential.
  • The variability in performance across datasets highlights the role that data characteristics and definition clarity play in zero-shot learning tasks. Variations in annotation quality and dataset distribution, such as those in the PHASYS dataset, can significantly impact model performance.

Conclusion:

This paper demonstrates the practical viability of leveraging GPT models for zero-shot semantic annotation of legal texts, providing a foundation for further research and application. The results encourage the integration of advanced LLMs into legal workflows to enhance efficiency and explore new opportunities in empirical legal studies.

Authors (1)
  1. Jaromir Savelka (47 papers)
Citations (39)