Paragraph Aggregation Retrieval Model (PARM)
- The paper introduces PARM, which overcomes dense retrieval limitations by decomposing documents into paragraphs and aggregating their signals using VRRF.
- PARM employs paragraph-level encoding and indexing to enable fine-grained evidence tracing and enhanced interpretability in document retrieval.
- Experimental results demonstrate PARM’s superior recall and robustness over traditional BM25 and document-level retrieval approaches, especially in legal settings.
A Paragraph Aggregation Retrieval Model (PARM) is a framework for document or passage retrieval that operates by decomposing documents into their constituent paragraphs and aggregating retrieval results at the paragraph level to yield robust document-level rankings. This approach specifically addresses the limitations of standard dense retrieval models, which are constrained by maximum input lengths and often lack interpretability at the subdocument level. PARM was devised to improve document-to-document retrieval in low-resource domains, notably legal case retrieval, where overall document relevance frequently depends on localized signals in specific paragraphs rather than global document content.
1. Motivation and Problem Definition
The motivation for PARM arises from the inadequacy of conventional Dense Passage Retrieval (DPR) models in document-to-document retrieval scenarios characterized by long texts and sparse supervision. Legal documents, for example, can contain 40 or more paragraphs, collectively far exceeding the input limits of BERT-style bi-encoders. In such cases, relevant content may be distributed across only a few paragraphs, making a single-vector approach insufficient and uninterpretable. PARM addresses:
- Input length limitations: Paragraph-level splitting overcomes the token cap of transformer architectures.
- Explainability: Relevance is linked to specific paragraphs, facilitating interpretability and evidence tracing.
- Limited supervision: Sparse annotation at either paragraph or document level can be leveraged effectively in training.
The paradigm thereby "liberates" dense retrieval methods from single-vector bottlenecks, permitting fine-grained matching and aggregation.
2. Architecture: Paragraph-Level Indexing and Aggregation Workflow
PARM’s workflow is structured as follows:
- Paragraph-level Encoding and Indexing: Each corpus document is split into paragraphs. Each paragraph is embedded into a dense vector using a DPR bi-encoder (e.g., BERT or LegalBERT), building a paragraph-level ANN index.
- Query Paragraph Retrieval: At inference, the query document is split into its paragraphs. For each query paragraph, ANN search is performed to retrieve the top-$n$ most similar corpus paragraphs.
- Paragraph-to-Document Mapping: Each retrieved paragraph is mapped to its parent document, yielding ranked candidate lists for each query paragraph.
- Aggregation to Document-Level Scores: Aggregation of scores (and vectors) across all query paragraph result lists yields a final document ranking for the query document. The aggregation method integrates signals from multiple matching paragraphs, supporting explainability and robustness.
This approach generalizes beyond legal retrieval and applies to any scenario requiring evidence from multiple subdocument units.
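The workflow above can be sketched in a few lines of Python. This is an illustrative toy, not the authors' implementation: `embed` is a deterministic stand-in for a DPR/LegalBERT bi-encoder, and brute-force cosine search replaces a real ANN index (e.g. Faiss).

```python
import numpy as np

def embed(paragraph: str, dim: int = 8) -> np.ndarray:
    """Stand-in for a DPR/LegalBERT bi-encoder: a deterministic
    pseudo-embedding, normalized to unit length."""
    rng = np.random.default_rng(abs(hash(paragraph)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def build_index(corpus: dict[str, list[str]]):
    """Encode every paragraph; remember which document it came from
    (the paragraph-to-document mapping used at aggregation time)."""
    vecs, owners = [], []
    for doc_id, paragraphs in corpus.items():
        for p in paragraphs:
            vecs.append(embed(p))
            owners.append(doc_id)
    return np.stack(vecs), owners

def parm_retrieve(query_paragraphs, index, owners, top_k=3):
    """For each query paragraph, retrieve the top-k corpus paragraphs
    (brute force in place of ANN) and map each hit to its parent doc."""
    result_lists = []
    for p in query_paragraphs:
        scores = index @ embed(p)            # cosine sim (unit vectors)
        top = np.argsort(-scores)[:top_k]
        result_lists.append([(owners[i], rank + 1, float(scores[i]))
                             for rank, i in enumerate(top)])
    return result_lists

corpus = {"doc_a": ["facts of the case", "ruling on damages"],
          "doc_b": ["procedural history", "ruling on damages"]}
index, owners = build_index(corpus)
lists = parm_retrieve(["ruling on damages"], index, owners, top_k=2)
```

Each entry of `lists` is one ranked result list per query paragraph; these per-paragraph lists are what the aggregation step (e.g. VRRF) fuses into a document ranking.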
3. VRRF: Vector-Based Reciprocal Rank Fusion Aggregation
The central aggregation mechanism of PARM is Vector-based Reciprocal Rank Fusion (VRRF), which extends classical rank-based fusion to dense vector embeddings.
Aggregation formulas:
- Query embedding: $\bar{e}_q = \sum_{p \in q} e_p$, the sum of the query document's paragraph embeddings.
- Candidate document embedding: $\bar{e}_d = \sum_{p \in q} \sum_{p' \in L_p \cap d} \frac{1}{k + r(p', p)}\, e_{p'}$,
where $L_p$ is the result list for query paragraph $p$, $r(p', p)$ is the rank of paragraph $p'$ for query paragraph $p$, and $k$ is a fixed constant (typically 60).
- Final document score: $s(q, d) = \bar{e}_q \cdot \bar{e}_d$.
Interpretation: VRRF weights each paragraph vector by the reciprocal of its retrieval rank, privileging higher-ranking matches and summing evidence over all query paragraphs. This hybridizes rank-based robustness with dense semantic matching.
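A minimal sketch of this aggregation, assuming each result list holds `(doc_id, rank, paragraph_embedding)` tuples as in the workflow above; the function name and data layout are illustrative, not the paper's code.

```python
import numpy as np

K = 60  # the fixed RRF constant k

def vrrf_scores(query_vecs, result_lists):
    """VRRF aggregation: weight each retrieved paragraph vector by the
    reciprocal of its rank, sum per document, then score against the
    aggregated query embedding.

    query_vecs:   list of query-paragraph embeddings
    result_lists: one list per query paragraph, each holding
                  (doc_id, rank, paragraph_embedding) tuples
    """
    q_bar = np.sum(query_vecs, axis=0)          # aggregated query embedding
    doc_vecs = {}                               # RRF-weighted doc embeddings
    for hits in result_lists:
        for doc_id, rank, vec in hits:
            w = 1.0 / (K + rank)                # reciprocal-rank weight
            doc_vecs[doc_id] = doc_vecs.get(doc_id, 0) + w * np.asarray(vec)
    return {d: float(q_bar @ v) for d, v in doc_vecs.items()}

# Toy example: two query paragraphs with 2-d embeddings.
q = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
lists = [
    [("doc_a", 1, [1.0, 0.0]), ("doc_b", 2, [0.9, 0.1])],
    [("doc_a", 1, [0.0, 1.0])],
]
scores = vrrf_scores(q, lists)
best = max(scores, key=scores.get)  # doc_a, matched by both query paragraphs
```

Note how doc_a wins: it accumulates evidence from both query paragraphs' result lists, exactly the multi-paragraph signal summation described above.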
4. Comparative Performance and Baseline Analysis
Experiments on legal retrieval datasets (COLIEE, CaseLaw) reveal:
- VRRF consistently outperforms both rank-based and standard vector-based aggregation variants (CombSum, RRF, VSum, VMin, VAvg), especially at high recall cutoffs (R@500, R@1000).
- PARM with VRRF provides substantial gains over classic document-level retrieval:
- LegalBERT Doc FirstP R@100: 0.388
- PARM-VRRF R@100: 0.613
- At higher cutoffs, benefits are even greater (R@1k: 0.736 vs 0.913).
- BM25 with PARM-style aggregation (RRF) also beats document-level BM25 at higher thresholds, though VRRF enables further improvements for dense models.
Dense PARM methods retrieve a broader set of relevant documents compared to document-level approaches, making them valuable for recall-oriented first-stage retrieval.
| Metric (model, dataset) | Doc-level | PARM-VRRF |
|---|---|---|
| R@100 (LegalBERT DPR, COLIEEDoc) | 0.388 | 0.613 |
| R@1k (LegalBERT DPR, COLIEEDoc) | 0.736 | 0.913 |
| # rel. docs retrieved (CaseLaw, DPR) | 199 | 545 |
5. Training Strategies and Data Annotation
PARM supports model training with either paragraph-level or document-level labels:
- Paragraph-level labels: Fewer but higher-quality annotations. Models trained solely on paragraph-level data yield higher performance for standard BERT-based bi-encoders.
- Document-level labels: Transformation into paragraph-level pairs via cross product produces noisy but more abundant training examples. LegalBERT benefits from additional document-level data at greater recall cutoffs, indicating law-specialized architectures generalize better under more data.
The model is robust to sparse annotation and is compatible with standard dense retrieval training regimes.
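The cross-product transformation of document-level labels can be sketched as follows; the function name and toy paragraph strings are illustrative.

```python
from itertools import product

def doc_labels_to_paragraph_pairs(query_doc, relevant_doc):
    """Expand one document-level relevance label into paragraph-level
    training pairs via the cross product. The result is noisy (not every
    pair is truly relevant) but multiplies the amount of training data."""
    return list(product(query_doc, relevant_doc))

query_doc = ["q-para-1", "q-para-2"]
relevant_doc = ["d-para-1", "d-para-2", "d-para-3"]
pairs = doc_labels_to_paragraph_pairs(query_doc, relevant_doc)
# 2 query paragraphs x 3 candidate paragraphs = 6 training pairs
```

This is the trade-off the section describes: a single document-level judgment yields many paragraph pairs, at the cost of label noise.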
6. Lexical vs Dense Retrieval and Interpretability
Both lexical (BM25) and dense methods benefit from paragraph-level aggregation. However, dense models (DPR/LegalBERT + PARM + VRRF) ultimately retrieve more relevant documents at scale. Additionally:
- Heatmap analysis indicates that query paragraphs spatially align with candidate paragraphs, revealing topical correspondences and structural patterns (e.g., introductions to introductions, claims to claims).
- Explainability: PARM enables evidence tracing, showing which query and candidate paragraphs underlie a document's retrieval score.
This supports transparency in retrieval, which is critical for sensitive domains like legal case search.
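Evidence tracing of this kind boils down to inspecting the query-paragraph by candidate-paragraph similarity matrix, i.e. the data behind such a heatmap. A minimal sketch with assumed unit-length embeddings and illustrative function names:

```python
import numpy as np

def contribution_matrix(query_vecs, cand_vecs):
    """Pairwise similarities between query and candidate paragraphs;
    this matrix is what a heatmap analysis visualizes."""
    Q = np.stack(query_vecs)
    C = np.stack(cand_vecs)
    return Q @ C.T  # entry [i, j]: query paragraph i vs candidate paragraph j

def top_evidence(matrix):
    """Index of the (query paragraph, candidate paragraph) pair that
    contributes most to the candidate document's score."""
    i, j = np.unravel_index(np.argmax(matrix), matrix.shape)
    return int(i), int(j)

q = [np.array([1.0, 0.0]), np.array([0.6, 0.8])]
c = [np.array([0.0, 1.0]), np.array([0.6, 0.8])]
m = contribution_matrix(q, c)
pair = top_evidence(m)  # query paragraph 1 best matches candidate paragraph 1
```

Surfacing `pair` alongside the document score is what lets a user see which paragraphs drove the match, rather than receiving an opaque document-level number.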
7. Significance, Applications, and Future Directions
PARM constitutes a substantive advance for dense document retrieval in domains with long documents and limited supervision. Its main contributions can be summarized as:
- Ability to operate on long documents unconstrained by input length.
- Empirically superior aggregation strategy (VRRF) combining semantic and rank-based factors.
- Enhanced recall, benefiting subsequent neural re-ranking or answer extraction stages.
- Increased model interpretability for downstream users and auditing.
This suggests that paragraph-level aggregation with weighted semantic fusion is broadly advantageous for document-to-document retrieval, especially in low-resource environments and domains where granular evidence forms the basis of relevance.
Code for reproduction and deployment is available at https://github.com/sophiaalthammer/parm.
In summary, Paragraph Aggregation Retrieval Models utilizing VRRF offer a principled, scalable, and interpretable solution for robust document retrieval, with demonstrated effectiveness in legal case applications and theoretical generalizability across domains.