
STARD: A Chinese Statute Retrieval Dataset with Real Queries Issued by Non-professionals (2406.15313v1)

Published 21 Jun 2024 in cs.IR and cs.CL

Abstract: Statute retrieval aims to find relevant statutory articles for specific queries. This process is the basis of a wide range of legal applications, such as legal advice, automated judicial decisions, and legal document drafting. Existing statute retrieval benchmarks focus on formal and professional queries from sources like bar exams and legal case documents, thereby neglecting non-professional queries from the general public, which often lack precise legal terminology and references. To address this gap, we introduce the STAtute Retrieval Dataset (STARD), a Chinese dataset comprising 1,543 query cases collected from real-world legal consultations and 55,348 candidate statutory articles. Unlike existing statute retrieval datasets, which primarily focus on professional legal queries, STARD captures the complexity and diversity of real queries from the general public. Through a comprehensive evaluation of various retrieval baselines, we reveal that existing retrieval approaches all fall short on these real queries issued by non-professional users. The best method achieves a Recall@100 of only 0.907, suggesting the need for further research in this area. All code and datasets are available at: https://github.com/oneal2000/STARD/tree/main
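
To make the Recall@100 figure concrete, here is a minimal sketch of how Recall@k is commonly computed for a retrieval benchmark. It assumes the usual definition (the fraction of a query's relevant statutes that appear among the top k retrieved candidates, averaged over queries). The function names and toy identifiers are hypothetical and are not taken from the STARD repository, whose evaluation script may differ in detail.

```python
from typing import Dict, Sequence, Set


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the relevant statutes found in the top-k ranked candidates."""
    if not relevant_ids:
        raise ValueError("a query needs at least one relevant statute")
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def mean_recall_at_k(
    runs: Dict[str, Sequence[str]],   # query id -> ranked statute ids (assumed layout)
    qrels: Dict[str, Set[str]],       # query id -> relevant statute ids (assumed layout)
    k: int,
) -> float:
    """Average Recall@k over all queries that have relevance labels."""
    scores = [recall_at_k(runs.get(qid, []), rel, k) for qid, rel in qrels.items()]
    return sum(scores) / len(scores)


# Toy data, purely illustrative.
toy_runs = {"q1": ["s1", "s2", "s3"], "q2": ["s4", "s5"]}
toy_qrels = {"q1": {"s1", "s9"}, "q2": {"s5"}}
toy_score = mean_recall_at_k(toy_runs, toy_qrels, k=2)
```

Under this definition, a Recall@100 of 0.907 would mean that, on average, roughly 9% of the relevant statutes are still missing from the top 100 candidates.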

Authors (9)
  1. Weihang Su (27 papers)
  2. Yiran Hu (16 papers)
  3. Anzhe Xie (1 paper)
  4. Qingyao Ai (113 papers)
  5. Zibing Que (1 paper)
  6. Ning Zheng (16 papers)
  7. Yun Liu (213 papers)
  8. Weixing Shen (7 papers)
  9. Yiqun Liu (131 papers)
Citations (5)
