
Boosting legal case retrieval by query content selection with large language models (2312.03494v1)

Published 6 Dec 2023 in cs.IR

Abstract: Legal case retrieval, which aims to retrieve cases relevant to a given query case, promotes judicial fairness and is attracting increasing attention. Unlike generic retrieval queries, legal case queries are typically long, and the definition of relevance is closely tied to legal-specific elements. As a result, legal case queries may suffer from noise and from sparsity of salient content, which hinders retrieval models from perceiving the correct information in a query. While previous studies have focused on improving retrieval models and understanding relevance judgments, we focus on enhancing legal case retrieval by utilizing the salient content in legal case queries. We first manually annotate the salient content in queries and investigate how sparse and dense retrieval models attend to that content. We then experiment with various query content selection methods that use LLMs to extract or summarize salient content and incorporate it into the retrieval models. Experimental results show that reformulating long queries using LLMs improves the performance of both sparse and dense models in legal case retrieval.
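To make the idea of query content selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy three-document corpus, a plain BM25 scorer written from scratch as the sparse model, and a hypothetical `select_salient_content` function that stands in for the LLM by keeping only sentences containing hand-picked salient terms. All names, parameters, and data here are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score every document against the query with a basic BM25 formula."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs_tokens:
        df.update(set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def select_salient_content(query, salient_terms):
    """Hypothetical stand-in for an LLM: keep sentences with salient terms."""
    sentences = [s.strip() for s in query.split(".") if s.strip()]
    kept = [s for s in sentences if any(t in s.lower() for t in salient_terms)]
    # Fall back to the full query if nothing was selected.
    return ". ".join(kept) if kept else query


# Toy corpus and a long query mixing salient and noisy content (illustrative).
docs = [
    "the defendant committed theft of a vehicle at night",
    "the weather was cold and the court adjourned early on monday",
    "contract dispute over delivery of goods",
]
long_query = (
    "The court met on Monday and the weather was cold. "
    "The defendant is accused of theft of a vehicle."
)
short_query = select_salient_content(long_query, ["theft", "defendant"])

docs_tokens = [tokenize(d) for d in docs]
scores_full = bm25_scores(tokenize(long_query), docs_tokens)
scores_short = bm25_scores(tokenize(short_query), docs_tokens)
best_short = max(range(len(docs)), key=lambda i: scores_short[i])
```

In this toy setting, the reformulated query drops the sentence about the weather, so the off-topic second document receives a lower score than it does with the full query, while the theft case ranks first. A real system would replace the keyword filter with an LLM prompt for extraction or summarization, and could feed the shortened query to a dense retriever as well.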

Authors (3)
  1. Youchao Zhou (2 papers)
  2. Heyan Huang (107 papers)
  3. Zhijing Wu (21 papers)
Citations (2)