CaseEncoder: A Knowledge-enhanced Pre-trained Model for Legal Case Encoding (2305.05393v1)

Published 9 May 2023 in cs.IR and cs.CL

Abstract: Legal case retrieval is a critical process for modern legal information systems. While recent studies have utilized pre-trained language models (PLMs) based on the general-domain self-supervised pre-training paradigm to build models for legal case retrieval, there are limitations in using general-domain PLMs as backbones. Specifically, these models may not fully capture the underlying legal features in legal case documents. To address this issue, we propose CaseEncoder, a legal document encoder that leverages fine-grained legal knowledge in both the data sampling and pre-training phases. In the data sampling phase, we enhance the quality of the training data by utilizing fine-grained law article information to guide the selection of positive and negative examples. In the pre-training phase, we design legal-specific pre-training tasks that align with the judging criteria of relevant legal cases. Based on these tasks, we introduce an innovative loss function called Biased Circle Loss to enhance the model's ability to recognize case relevance at a fine-grained level. Experimental results on multiple benchmarks demonstrate that CaseEncoder significantly outperforms both existing general pre-training models and legal-specific pre-training models in zero-shot legal case retrieval.
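
The abstract does not give the exact form of Biased Circle Loss. Below is a minimal sketch of one plausible reading: standard Circle Loss (Sun et al., 2020) with a per-positive weight derived from fine-grained law-article overlap, so that more-relevant positives are pulled toward the optimum more strongly. The function name, signature, and the specific weighting scheme are assumptions for illustration, not the authors' released implementation.

```python
# Hedged sketch of a "biased" Circle Loss for legal case retrieval.
# Assumption: the bias is a per-positive relevance weight (e.g. law-article
# overlap in (0, 1]) multiplied into the positive-pair logit scaling.
import torch
import torch.nn.functional as F

def biased_circle_loss(query_emb, doc_embs, is_positive, bias,
                       margin=0.25, gamma=64.0):
    """query_emb: (d,) L2-normalized query case embedding.
    doc_embs:    (n, d) L2-normalized candidate case embeddings.
    is_positive: (n,) bool mask marking relevant candidates.
    bias:        (n,) fine-grained relevance weights for positives
                 (hypothetical; e.g. fraction of shared law articles).
    """
    sims = doc_embs @ query_emb                 # cosine similarities, (n,)
    sp, sn = sims[is_positive], sims[~is_positive]
    bp = bias[is_positive]

    # Circle Loss optima and decision margins for positive/negative pairs.
    op, on = 1.0 + margin, -margin
    dp, dn = 1.0 - margin, margin

    # Adaptive pair weights (detached, as in the original Circle Loss).
    ap = torch.clamp_min(op - sp.detach(), 0.0)
    an = torch.clamp_min(sn.detach() - on, 0.0)

    # Bias the positive logits by fine-grained relevance (assumption).
    logit_p = -gamma * ap * bp * (sp - dp)
    logit_n = gamma * an * (sn - dn)

    # softplus(logsumexp(neg) + logsumexp(pos)) is the Circle Loss form.
    return F.softplus(torch.logsumexp(logit_p, 0) + torch.logsumexp(logit_n, 0))
```

With `bias` set to all ones, this reduces to plain Circle Loss; graded weights let the loss distinguish degrees of relevance rather than treating all positives identically, which matches the abstract's stated goal of fine-grained relevance recognition.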

Authors (5)
  1. Yixiao Ma (11 papers)
  2. Yueyue Wu (18 papers)
  3. Weihang Su (27 papers)
  4. Qingyao Ai (113 papers)
  5. Yiqun Liu (131 papers)
Citations (13)