- The paper introduces an innovative dynamic truncation method that uses LLM-generated reference pivot documents to adaptively select candidate documents for reranking.
- It leverages PSI-Rank variants and listwise reranker innovations to reduce inference cost by up to 66% while maintaining state-of-the-art MAP and nDCG performance.
- The approach is model-agnostic, scalable, and demonstrates robust performance across diverse datasets and retrieval systems.
Dynamic Ranked List Truncation for LLM-based Reranking Using Reference-Document Pivots
Introduction
This paper introduces a framework for dynamic ranked list truncation (RLT) in LLM-based reranking pipelines, built around synthetic, LLM-generated reference documents that act as semantic pivots. The work addresses major bottlenecks in modern multi-stage retrieval systems, particularly the efficiency constraints and context window limitations imposed by LLM-based rerankers. Conventional truncation methods rely on fixed cut-offs or global heuristics, which disregard query-specific relevance distributions and often result in suboptimal reranking effectiveness and efficiency. The authors propose leveraging LLMs to generate reference documents with controlled relevance levels, then exploiting these pivots for query-adaptive RLT and semantically informed listwise reranking strategies.
LLM-generated Reference-Document Pivots
Central to the approach is the generation of a moderately relevant reference document, D∗, for each query Q (Figure 1). An LLM, prompted with an explicit relevance grade (e.g., "marginally relevant" on a 0–3 scale), generates a pivot document positioned at the relevance threshold separating positive and negative examples according to TREC scales. This reference serves as a semantic anchor in the ranked list: documents ranked above D∗ are prioritized for reranking, whereas those below are deprioritized or omitted, operationalizing a dynamic, content-sensitive cutoff.
Figure 1: Workflow for generating a reference-document pivot D∗ for a query, used as a semantic anchor for reranking pipelines.
The generative prompt is carefully engineered to specify the information need and the required degree of relevance. The resulting D∗ is automatically validated using a strong LLM-based relevance estimator (UMBRELA), ensuring calibration at the targeted relevance threshold.
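The paper's exact prompt templates and validation pipeline are not reproduced here; the following is a minimal sketch of the idea, assuming a generic completion helper `llm_complete(prompt) -> str` and using illustrative placeholder prompts for both the pivot generator and the UMBRELA-style relevance judge.

```python
# Sketch: generate a "marginally relevant" pivot document D* for a query and
# validate its relevance grade with an LLM judge (UMBRELA-style grading).
# `llm_complete` is an assumed helper wrapping any chat/completion LLM API.

PIVOT_PROMPT = """Write a short passage for the query below.
The passage must be MARGINALLY relevant (grade 1 on a 0-3 TREC-style scale):
it should touch on the topic but not directly satisfy the information need.

Query: {query}
Passage:"""

JUDGE_PROMPT = """Given a query and a passage, output a single relevance grade
from 0 (irrelevant) to 3 (perfectly relevant). Output only the digit.

Query: {query}
Passage: {passage}
Grade:"""

def generate_pivot(query: str, target_grade: int = 1, max_tries: int = 3) -> str:
    """Generate D* and re-sample until the judged grade matches the target."""
    pivot = ""
    for _ in range(max_tries):
        pivot = llm_complete(PIVOT_PROMPT.format(query=query))
        judged = llm_complete(JUDGE_PROMPT.format(query=query, passage=pivot))
        grade = int(judged.strip()[0])  # naive parse; a sketch, not production code
        if grade == target_grade:
            return pivot
    return pivot  # fall back to the last sample if calibration never matches
```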
Dynamic RLT via PSI-Rank
The proposed PSI-Rank framework employs D∗ for efficient, query-adaptive truncation in reranking pipelines. Two variants are considered:
- Dynamic Cutoff (PSI-RankDyn): For each query, the first-stage retriever scores all candidates and D∗. All documents with scores exceeding that of D∗ constitute the truncated list for reranking.
- Average Cutoff (PSI-RankAvg): A calibration set is used to estimate the average pivot score, yielding a collection-wide static threshold that is less sensitive to outliers but sacrifices per-query adaptivity.
This approach is retriever- and reranker-agnostic, obviates the need for expensive training or hyperparameter tuning, and, critically, requires only a single additional inference per query for the pivot.
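A minimal sketch of the two cutoff rules follows, assuming the first-stage retriever exposes a `score(query, doc)` function and that `candidates` are already ranked by that score; function and variable names are illustrative, not the paper's.

```python
from statistics import mean

def psi_rank_dyn(query, candidates, pivot, score):
    """Dynamic cutoff (PSI-RankDyn): keep only candidates that the first-stage
    retriever scores higher than the generated pivot D* for this query."""
    pivot_score = score(query, pivot)
    return [d for d in candidates if score(query, d) > pivot_score]

def psi_rank_avg_threshold(calibration_queries, pivots, score):
    """Average cutoff (PSI-RankAvg): estimate one collection-wide threshold
    as the mean pivot score over a calibration set of query/pivot pairs."""
    return mean(score(q, p) for q, p in zip(calibration_queries, pivots))

def psi_rank_avg(query, candidates, threshold, score):
    """Apply the static, collection-wide threshold to every query."""
    return [d for d in candidates if score(query, d) > threshold]
```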
Pivot-guided Listwise Reranking
The authors extend the paradigm to listwise rerankers, which process windows of candidate documents to accommodate LLM context restrictions, and introduce three architectural innovations for integrating the pivot into this windowed reranking process.
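The specific architectural changes are not detailed in this summary. Purely as an illustrative assumption (not necessarily the paper's design), one way to use D∗ inside a sliding-window listwise reranker is to include the pivot in every window so the model has a fixed semantic anchor, then discard candidates the reranker places at or below it:

```python
def pivot_guided_windows(candidates, pivot, window_size=20, stride=10):
    """Illustrative sketch (an assumption, not the paper's exact method):
    partition the ranked candidates into overlapping windows and append the
    pivot to each window so the listwise reranker can anchor its ordering."""
    windows = []
    for start in range(0, len(candidates), stride):
        window = candidates[start:start + window_size]
        if not window:
            break
        windows.append(window + [pivot])  # pivot acts as an in-window anchor
    return windows

def keep_above_pivot(reranked_window, pivot):
    """Drop everything the reranker placed at or below the pivot document."""
    cut = reranked_window.index(pivot)
    return reranked_window[:cut]
```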
Experimental Evaluation
Comprehensive experiments are conducted on MS MARCO, TREC DL, and BEIR benchmarks, evaluating both in-domain and out-of-domain generalization. RLT and reranking pipelines are compared to recent supervised and unsupervised baselines (BiCut, AttnCut, Choppy, TD-Part) under multiple retrievers (BM25, SPLADE) and rerankers (mono-T5, duo-T5, RankZephyr, RankVicuna, RankGPT).
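Effectiveness of a truncated-then-reranked run on these benchmarks can be measured with standard TREC tooling; below is a brief sketch using the `pytrec_eval` library, with the qrels/run dictionaries assumed to follow the usual TREC format.

```python
import pytrec_eval

def evaluate_run(qrels: dict, run: dict):
    """qrels: {qid: {docid: grade}}, run: {qid: {docid: score}}.
    Returns mean MAP and nDCG@10 over all queries in the run."""
    evaluator = pytrec_eval.RelevanceEvaluator(qrels, {"map", "ndcg_cut.10"})
    results = evaluator.evaluate(run)
    n = len(results)
    mean_map = sum(r["map"] for r in results.values()) / n
    mean_ndcg10 = sum(r["ndcg_cut_10"] for r in results.values()) / n
    return mean_map, mean_ndcg10
```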
Key findings include inference-cost reductions of up to 66% relative to full reranking, with MAP and nDCG maintained at or near state-of-the-art levels, and robust performance across datasets, retrievers, and rerankers.
Theoretical and Practical Implications
Semantically controlled, synthetic pivot documents enable new types of query- and context-sensitive adaptive truncation for both classic and LLM-based retrieval. This approach unifies and simplifies reranking pipeline design by allowing truncation and window partitioning decisions to be directly informed by generative models, rather than error-prone or inflexible heuristics. The minimal dependence on model size for pivot quality highlights the accessibility and scalability of this method in practical deployments.
On the theoretical side, this work empirically demonstrates that LLMs can generate documents which serve as effective, human-equivalent semantic boundaries—enabling downstream models to “anchor” their relevance judgments. This introduces a new controllable dimension to classic retrieval and modern listwise ranking algorithms.
Future Directions
Potential avenues for extension include joint optimization frameworks integrating generative pivot production with retrieval and reranking models, pivot-based hard negative mining for contrastive training, and adaptation to multi-modal or multi-lingual retrieval scenarios. The canonical LLM-based pivot could enable the automation of label smoothing, outlier rejection, and adaptive threshold calibration in a variety of ranking-centric NLP applications.
Conclusion
The methodology presented establishes LLM-generated semantic pivots as reliable, efficient, and model-agnostic anchors for dynamic ranked list truncation and listwise reranking. The approach achieves state-of-the-art effectiveness-efficiency trade-offs in multi-stage retrieval pipelines and demonstrates strong transferability across datasets, domains, retrievers, rerankers, and LLM architectures. This paradigm is immediately beneficial for practitioners seeking to deploy LLM rerankers at scale and opens new research avenues in adaptive retrieval and generative pipelines.