Case Retrieval Module
- A Case Retrieval Module is a computational component that retrieves and ranks highly relevant prior cases from large multi-modal datasets.
- It employs dual-encoder systems, graph-based representations, and weighted similarity aggregation to achieve precise and transparent matching.
- Domain-driven adaptations integrate legal, medical, and technical elements, enhancing scalability, generalizability, and interpretability in retrieval tasks.
A Case Retrieval Module is a computational subsystem that, given an input “case” (text, image, tabular record, or structured legal/medical/technical file), retrieves a ranked list of relevant prior cases from a large repository. The technical implementations, evaluation protocols, and theoretical underpinnings of case retrieval modules vary across domains but are unified by the requirement for high-fidelity relevance modeling, robust representation of long and structured documents, and support for explainable matching. In recent years, retrieval modules have incorporated domain knowledge, sub-fact reasoning, multimodal signals, and structural regularities to advance retrieval accuracy and transparency, particularly in domains such as law, medicine, and real-time agent-based simulation.
1. System Architectures and Representational Principles
Case retrieval modules are generally organized as dual-stage or multi-stage pipelines. The core architectural elements are:
- Input Normalization and Knowledge Extraction: In specialized domains such as law, systems explicitly extract salient features (e.g., crimes, law-article references (Deng et al., 28 Jun 2024), legal facts/issues (Tang et al., 2023), charge descriptors (Tang et al., 26 Mar 2024), or medical codes (Yang, 4 Jul 2024)). This stage may involve LLM prompting, entity linking, or supervised information extraction.
- Representation Learning: Case representations are constructed using one or more of the following:
- Neural Encoders: Transformer-based dual-encoders using [CLS] pooling (e.g., SAILER, BERT, RoBERTa) are standard for text cases (Deng et al., 28 Jun 2024, Su et al., 2023, Ma et al., 2023). For multimodal cases, parallel encoding of each component (images, structured fields) with modality-specific networks is used (Marom, 9 Jan 2025).
- Feature and Structure Augmentation: Representations may be augmented with extracted legal/medical elements, relation graphs, or sub-fact vectors (Deng et al., 28 Jun 2024, Tang et al., 30 Oct 2025, Li et al., 2023, Marchesin, 2018).
- Graph-Based Representations: Document-level semantic graphs, global case graphs, or explicit knowledge graphs are constructed for encoding relational structure (Tang et al., 26 Mar 2024, Marchesin, 2018).
- Indexing and Retrieval: Depending on scale, cases are indexed using vector databases (e.g., FAISS HNSW for sub-linear nearest neighbor search (Yang, 4 Jul 2024, Deng et al., 28 Jun 2024)) or kept in-memory for brute-force evaluation in small corpora (Marom, 9 Jan 2025).
- Ranking and Aggregation: Similarity metrics (cosine, dot product) are used in combination with aggregation schemes (MaxSim+Sum over sub-facts (Deng et al., 28 Jun 2024), weighted component scoring (Marom, 9 Jan 2025), cross-attention fusion (Gupta et al., 4 Nov 2025)) to compute final relevance scores.
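The pipeline above can be sketched end to end. A minimal illustration, assuming pre-computed per-component embeddings (random vectors stand in for encoder outputs here) and the weighted-sum aggregation described for multi-component cases; all names are illustrative:

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def rank_cases(query_vecs, corpus_vecs, weights, k=3):
    """Weighted aggregation of per-component cosine similarities.

    query_vecs:  dict component -> (d,) vector
    corpus_vecs: dict component -> (n, d) matrix
    weights:     dict component -> float (weights sum to 1)
    """
    n = next(iter(corpus_vecs.values())).shape[0]
    scores = np.zeros(n)
    for comp, w in weights.items():
        q = l2_normalize(query_vecs[comp])
        C = l2_normalize(corpus_vecs[comp])
        scores += w * (C @ q)          # cosine similarity per candidate case
    order = np.argsort(-scores)[:k]    # top-k by aggregated relevance
    return order, scores[order]

# Toy usage with random "embeddings" standing in for real encoder outputs.
rng = np.random.default_rng(0)
d, n = 8, 5
query = {"facts": rng.normal(size=d), "issues": rng.normal(size=d)}
corpus = {"facts": rng.normal(size=(n, d)), "issues": rng.normal(size=(n, d))}
top, top_scores = rank_cases(query, corpus, {"facts": 0.7, "issues": 0.3})
```

At production scale the brute-force `C @ q` scan is replaced by an approximate nearest-neighbor index (e.g., FAISS HNSW), but the scoring logic is the same.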
2. Knowledge- and Reasoning-Guided Reformulation
Modern modules increasingly embed expert or LLM-extracted knowledge into the retrieval process.
- Sub-fact Reformulation: Legal case retrieval has shifted toward explicit reformulation of cases into sub-facts, each anchored in legal knowledge (crime title + statutory reference + distilled fact). These sub-facts are generated via LLM prompts and serve as atomic units for similarity computation (Deng et al., 28 Jun 2024).
- Prompt-based Abstraction: Systems like PromptCase extract condensed “legal facts” and “legal issues,” summarized or LLM-generated, which are then embedded independently and jointly, bypassing the input-length limits of vanilla transformers and reducing context loss (Tang et al., 2023).
- Reasoning-Aware Embeddings: LLMs can be prompted to generate explicit legal reasoning chains (fact → relation → issue → decision); this structured reasoning is then embedded alongside fact/issue content, as in ReaKase-8B (Tang et al., 30 Oct 2025).
- Element Generation: Generative retrieval systems such as LegalSearchLM use LLMs to directly enumerate relevant legal elements under corpus-aware (FM-index-constrained) decoding, ensuring that every generated element supports direct retrieval of matching cases (Kim et al., 28 May 2025).
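The corpus-constrained decoding idea can be illustrated with a toy stand-in: where LegalSearchLM uses an FM-index to restrict generation to substrings of the corpus, the sketch below uses an explicit set of corpus n-grams (the data and function names are illustrative, not taken from the paper):

```python
# Toy corpus-constrained decoding. LegalSearchLM constrains generation with an
# FM-index over the case corpus; here a plain set of corpus n-grams plays the
# same role at small scale.
def build_ngrams(corpus_tokens, max_n):
    grams = set()
    for doc in corpus_tokens:
        for n in range(1, max_n + 1):
            for i in range(len(doc) - n + 1):
                grams.add(tuple(doc[i:i + n]))
    return grams

def allowed_next(prefix, vocab, grams, max_n):
    """Tokens t such that the last (max_n - 1) prefix tokens plus t form a
    corpus n-gram -- i.e., the generated element stays grounded in the corpus."""
    ctx = tuple(prefix[-(max_n - 1):]) if max_n > 1 else ()
    return [t for t in vocab if ctx + (t,) in grams]

corpus = [["theft", "of", "property"], ["fraud", "by", "deception"]]
grams = build_ngrams(corpus, max_n=2)
vocab = sorted({t for d in corpus for t in d})
# After generating "theft", only corpus-attested continuations are allowed:
print(allowed_next(["theft"], vocab, grams, max_n=2))  # ['of']
```

Because every decoding step is filtered this way, a generated element can never be a hallucinated string absent from the corpus, which is what makes the generated elements directly usable for retrieval.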
3. Similarity Computation and Retrieval Algorithms
Retrieval modules operationalize similarity at multiple granularity levels:
- Vector Similarity: Most neural modules operate over L2-normalized vectors and compute scores via cosine similarity or dot product (Deng et al., 28 Jun 2024, Tang et al., 2023, Su et al., 2023, Ma et al., 2023, Yang, 4 Jul 2024). Multi-vector (sub-fact/component) schemes use a similarity matrix, with MaxSim per query sub-fact (Deng et al., 28 Jun 2024).
- Weighted Aggregation: For multimodal or multi-component cases, overall similarity is computed as a weighted sum over component-level similarities (with weights summing to 1), as in MCBR-RAG (Marom, 9 Jan 2025).
- Graph Structural Matching: Document-level semantic networks are compared structurally using graph edit distance, maximum common subgraph, or ontology-based node similarity (Marchesin, 2018).
- Ranking and Diversity Control: Post-retrieval, multi-factor reranking may combine base semantic scores, domain-specific signals (e.g., citation frequency, jurisdiction match), and diversity-aware metrics such as MMR (Yang, 4 Jul 2024).
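Two of these schemes fit in a few lines each. A sketch, assuming L2-normalized embedding matrices (variable names are illustrative): MaxSim+Sum over sub-fact embeddings, and MMR-style diversity-aware reranking.

```python
import numpy as np

def maxsim_sum(query_subfacts, doc_subfacts):
    """MaxSim+Sum: for each query sub-fact, take the similarity of its
    best-matching document sub-fact, then sum over query sub-facts.
    Inputs: L2-normalized (m, d) and (k, d) matrices."""
    sim = query_subfacts @ doc_subfacts.T   # (m, k) cosine-similarity matrix
    return sim.max(axis=1).sum()

def mmr_rerank(query_vec, cand_vecs, lam=0.7, k=5):
    """Maximal Marginal Relevance: greedily trade relevance to the query
    against redundancy with already-selected candidates."""
    rel = cand_vecs @ query_vec
    selected, remaining = [], list(range(len(cand_vecs)))
    while remaining and len(selected) < k:
        def mmr(i):
            red = max((cand_vecs[i] @ cand_vecs[j] for j in selected),
                      default=0.0)
            return lam * rel[i] - (1 - lam) * red
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected
```

The intermediate (m, k) similarity matrix in `maxsim_sum` is also what makes sub-fact-level scoring inspectable: each entry records how strongly one query sub-fact matched one document sub-fact.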
4. Learning Objectives and Supervision Paradigms
Training objectives for case retrieval modules are tailored to the granularity and structure of case relationships.
- Contrastive and Listwise Ranking Losses: Standard objectives use temperature-scaled cross-entropy over positive (relevant) and negative (irrelevant) candidate pairs or triples, often in dual-encoder or cross-encoder settings (Deng et al., 28 Jun 2024, Su et al., 2023, Ma et al., 2023, Tang et al., 26 Mar 2024).
- Multi-view Contrastive Learning: MVCL employs both a traditional case-view contrastive loss and an element-view contrastive loss, where positive pairs are generated by deleting non-element sentences, increasing the network’s sensitivity to legal elements (Wang, 2022).
- Fine-grained, Legal-Aware Losses: CaseEncoder introduces Biased Circle Loss, which weights the contrastive loss in proportion to the overlap and fine-grained similarity of statutory article features, enhancing discrimination between closely related cases (Ma et al., 2023).
- Self-supervised Generation: LegalSearchLM trains purely to reproduce “legal elements” from query cases, using no retrieval labels but ensuring groundability by FM-index constraints (Kim et al., 28 May 2025).
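The temperature-scaled objective can be written down directly. A numpy sketch of a single-query InfoNCE-style loss — a generic stand-in, since the cited systems use batched PyTorch implementations with their own loss variants:

```python
import numpy as np

def info_nce(q, pos, negs, tau=0.05):
    """Temperature-scaled cross-entropy for one query against one positive
    and a list of negatives; all vectors assumed L2-normalized."""
    logits = np.concatenate([[q @ pos], q @ np.asarray(negs).T]) / tau
    logits -= logits.max()                     # numerical stability
    return float(-np.log(np.exp(logits[0]) / np.exp(logits).sum()))
```

Lowering `tau` sharpens the distribution over candidates, which is what makes hard negatives (closely related but irrelevant cases) dominate the gradient; fine-grained variants such as Biased Circle Loss additionally reweight pairs by statutory-feature overlap.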
5. Benchmarks, Evaluation Protocols, and Empirical Findings
Empirical evaluation is conducted on large-scale, legally annotated retrieval benchmarks to ensure robustness and generalizability.
- Datasets: Prominent datasets include LeCaRD (Chinese, ∼10k docs) and LeCaRDv2 (800 queries, ∼55k docs, multi-aspect annotation) (Deng et al., 28 Jun 2024, Li et al., 2023), COLIEE (English, ∼60k cases) (Tang et al., 30 Oct 2025, Su et al., 2023), LEGAR BENCH (Korean, 1.2M criminal cases, 411 groups) (Kim et al., 28 May 2025), and MUSER (Chinese, 4,024 annotated cases with multi-view labels) (Li et al., 2023).
- Metrics: Standard IR metrics include MAP, MRR, Precision@K, Recall@K, and nDCG@K, alongside domain-specific performance (e.g., “controversial” queries, per-aspect relevance) (Deng et al., 28 Jun 2024, Li et al., 2023, Su et al., 2023).
- Robustness and Ablations: Ablation evidence shows that knowledge-guided reformulation, contrastive element-aware learning, and multi-view or multi-factor objectives each contribute 1–5 MAP points (or comparable margins) over flat baselines (Deng et al., 28 Jun 2024, Wang, 2022, Ma et al., 2023). Generative retrieval, when equipped with FM-index constraints and element-aware prompting, yields 6–20% improvements in P@5 on LEGAR BENCH and sustains accuracy on out-of-domain queries (Kim et al., 28 May 2025).
- Interpretability: Sub-fact-level scoring, as in KELLER, allows transparent traceability: for each query sub-fact, it is possible to inspect which specific document sub-fact it matched, and the matrix of similarity scores can be visualized for audit or explanation purposes (Deng et al., 28 Jun 2024).
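The metrics listed above are standard; for concreteness, minimal reference implementations of AP and nDCG@K (binary and graded relevance, respectively) — the benchmark-specific per-aspect metrics are not reproduced here:

```python
import numpy as np

def average_precision(ranked_rels, num_relevant=None):
    """ranked_rels: binary relevance of results in ranked order.
    num_relevant: total relevant docs in the collection; defaults to the
    number retrieved (i.e., assumes all relevant docs appear in the ranking)."""
    hits, prec_sum = 0, 0.0
    for i, r in enumerate(ranked_rels, start=1):
        if r:
            hits += 1
            prec_sum += hits / i
    denom = num_relevant if num_relevant else hits
    return prec_sum / denom if denom else 0.0

def ndcg_at_k(ranked_gains, k):
    """Graded relevance gains in ranked order; log2 positional discount."""
    gains = np.asarray(ranked_gains[:k], dtype=float)
    discounts = 1.0 / np.log2(np.arange(2, gains.size + 2))
    dcg = (gains * discounts).sum()
    ideal = np.sort(np.asarray(ranked_gains, dtype=float))[::-1][:k]
    idcg = (ideal * (1.0 / np.log2(np.arange(2, ideal.size + 2)))).sum()
    return dcg / idcg if idcg > 0 else 0.0
```

MAP is then the mean of `average_precision` over all queries, and nDCG@K is averaged the same way.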
6. Domain Adaptation, Generalizability, and Limitations
Recent modules target cross-lingual, cross-jurisdictional, and cross-domain generalizability.
- Pre-training for Domain Adaptation: Pre-training on large legal-specific corpora with language modeling, fact/provision matching, and judgment-level contrastive objectives enables robust zero-shot transfer across legal systems (e.g., Caseformer across Chinese and English; LegalSearchLM, trained on sexual-crime cases, generalizes to traffic and embezzlement offenses) (Su et al., 2023, Kim et al., 28 May 2025).
- Modular Knowledge Integration: CaseLink and ReaKase-8B explicitly model semantic and charge graph connectivity, relation triplets, and inferential reasoning in the case embedding, further increasing domain transfer and discrimination (Tang et al., 26 Mar 2024, Tang et al., 30 Oct 2025).
- Scaling and Efficiency: At large corpus scales (e.g., 1M+ cases in LEGAR BENCH/CLERC), systems employ approximate nearest neighbor libraries (e.g., FAISS HNSW), indexed passage-level retrieval (CLERC), and batch pre-encoding to preserve sub-second retrieval latencies (Deng et al., 28 Jun 2024, Yang, 4 Jul 2024, Hou et al., 24 Jun 2024).
- Limitations: Classic lexical models such as BM25 remain competitive, particularly when domain-specific training data is limited and in high-overlap or noisy legal-text scenarios. Open challenges include encoding very long documents without information loss, robustly handling multi-aspect (characterization/penalty/procedure) relevance, and accurately modeling long-tail charges and procedural divergences across legal systems (Li et al., 2023, Wang, 2022).
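BM25's continued competitiveness is easy to reproduce: the model is a few lines of term statistics. A self-contained Okapi BM25 sketch (standard formula; tokenization and the toy documents are illustrative):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Okapi BM25 over pre-tokenized documents (list of token lists)."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))          # document frequency
    idf = {t: math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5)) for t in df}
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            # Saturating term-frequency component with length normalization.
            s += idf.get(t, 0.0) * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```

Because scoring depends only on exact term overlap, BM25 needs no training data — which is exactly why it stays strong when labeled domain data is scarce, and why it degrades when relevance hinges on semantics rather than shared vocabulary.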
7. Interpretability, Explainability, and Auditing
Interpretability is a central goal in case retrieval for legal and clinical settings.
- Sub-fact-Level Traceability: In KELLER, each query sub-fact is mapped to its best-matching document sub-fact, and their similarity can be inspected and visualized (Deng et al., 28 Jun 2024).
- Score Decomposition: Aggregation schemes such as MaxSim+Sum or weighted sum per component enable explicit accounting of which knowledge elements drive ranking, aiding legal justification and audit (Marom, 9 Jan 2025, Li et al., 2023).
- Graph-Based Explanations: For systems based on semantic or legal connectivity graphs, retrieval rationales can be expressed as maximal subgraph overlaps, common paths, or high-confidence relation matches between query and candidate (Marchesin, 2018, Tang et al., 26 Mar 2024).
- Explanatory Output Generation: In RAG-enabled settings, retrieved support cases are included directly in downstream generation prompts, allowing user-facing explanations to reference precedent text explicitly (Yang, 4 Jul 2024, Hou et al., 24 Jun 2024).
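For the weighted-sum case, the decomposition is itself the explanation: each component contributes exactly weight × similarity. A sketch with hypothetical component names:

```python
def explain_score(component_sims, weights):
    """Break a weighted-sum relevance score into per-component contributions,
    so an auditor can see which elements drove the ranking.

    component_sims: dict component -> similarity in [-1, 1]
    weights:        dict component -> float (weights sum to 1)
    """
    contributions = {c: weights[c] * component_sims[c] for c in weights}
    total = sum(contributions.values())
    # Components sorted by how much they contributed to the final score.
    return total, sorted(contributions.items(), key=lambda kv: -kv[1])
```

The same accounting applies per query sub-fact under MaxSim+Sum: each query sub-fact's contribution is its single best document-side match, which can be surfaced verbatim in an explanation.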
The technical maturation of the Case Retrieval Module reflects a convergence of advances in LLMs, domain knowledge integration, graph and element-level reasoning, scale-efficient vector search, and increasing demands for interpretability and auditability. Empirical work demonstrates that careful structuring of case features, contrastive and element-aware learning objectives, and knowledge-guided reformulation all yield robust gains over both traditional lexical and flat neural baselines, particularly in legally or medically complex settings.