
Listwise Reranking Objectives

Updated 17 February 2026
  • Listwise reranking objectives are approaches in learning-to-rank that optimize entire candidate lists to directly enhance ranking metrics like NDCG and MRR.
  • They integrate classical techniques (e.g., ListNet, ListMLE) with modern methods (e.g., ALRO, FIRST) to leverage context-rich inputs for improved ranking performance.
  • Recent innovations address challenges in score calibration, permutation invariance, and scalability, achieving significant gains in both accuracy and efficiency.

Listwise reranking objectives constitute a core paradigm in learning-to-rank and modern information retrieval, particularly in applications with context-rich candidate sets such as LLM-based reranking, document retrieval, and recommendation. Unlike pointwise or pairwise ranking schemes, listwise objectives exploit the ordering, dependencies, and structural information across entire candidate lists, aligning training signals more tightly with ranking-centric evaluation metrics such as NDCG and MRR. This article surveys the fundamental variants, technical innovations, empirical advances, and system implications of state-of-the-art listwise reranking objectives.

1. Mathematical Foundations and Classical Listwise Losses

Classical listwise objectives optimize over candidate lists as wholes, leveraging the entire structure of predicted and ground-truth permutations or distributions. Canonical approaches include:

  • ListNet: Defines a Plackett-Luce (“top-1”) distribution over permutations, minimizing cross-entropy (CE) or KL between the predicted probability for each item and the ground-truth label distribution:

L_{\mathrm{ListNet}}(\theta) = -\sum_{i=1}^{N} P^*(d_i) \log P_\theta(d_i)

where P^*(d_i) = \frac{\exp(\phi(y_i))}{\sum_j \exp(\phi(y_j))} is the target distribution and P_\theta(d_i) is predicted via model logits (Chen et al., 2021).

  • ListMLE: Maximizes the likelihood of the observed permutation under a Plackett-Luce model:

\mathcal{L}_{\mathrm{ListMLE}} = -\sum_{i} \log \frac{\exp(s_{\pi_i})}{\sum_{j \geq i} \exp(s_{\pi_j})}

  • SoftRank/SoftmaxCE: Estimates a softened CE or NDCG metric by integrating over score perturbations.
  • LambdaMART: A tree-based ensemble with updates implicitly driven by NDCG gradient surrogates, typically inapplicable to direct next-token generative LLMs (Chao et al., 2024).

Classical listwise losses are typically invariant under score translation, making them effective for relative ranking but poorly calibrated for tasks requiring meaningful score magnitudes.
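
These two classical losses, and the translation invariance just noted, can be sketched in a few lines of NumPy (illustrative code; the function and variable names are not from the cited works):

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def listnet_loss(scores, labels):
    """ListNet top-1 cross-entropy: softmax of graded labels (target)
    against softmax of model scores (prediction)."""
    p_target = softmax(np.asarray(labels, dtype=float))
    p_model = softmax(np.asarray(scores, dtype=float))
    return float(-np.sum(p_target * np.log(p_model)))

def listmle_loss(scores, true_order):
    """ListMLE: negative Plackett-Luce log-likelihood of the ground-truth
    permutation (true_order lists indices from most to least relevant).
    At each step the observed item competes against all items not yet placed."""
    s = np.asarray(scores, dtype=float)[list(true_order)]
    loss = 0.0
    for i in range(len(s)):
        tail = s[i:]
        logsumexp = tail.max() + np.log(np.exp(tail - tail.max()).sum())
        loss -= s[i] - logsumexp
    return float(loss)
```

Because only score differences enter either loss, adding a constant to every score leaves both values unchanged — the translation invariance (and hence calibration gap) noted above.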

2. Listwise Objectives Aligned with LLMs and Next-Token Prediction

Traditional generative LLM reranking settings highlighted the mismatch between sequence-generation objectives (auto-regressive CE) and true ranking utility. Recent works introduce objectives that directly couple listwise ordering to generative logits:

  • Soft Lambda Loss (ALRO):
    • Replaces hard argmax with soft-argmax over token logits, yielding differentiable ranking scores:

    s_i = \sum_{j=1}^m \operatorname{softmax}_j(\gamma y_{j,i}) \cdot j

    • Reranking loss:

    \mathcal{L}_{\text{rank}} = \sum_{i,j:\,\tau_j<\tau_i} \delta_{ij}\,|G_i-G_j|\,\log_2(1 + e^{-\sigma(s_i-s_j)})

    with \delta_{ij} NDCG-derived weights, G_i the NDCG gains, and \sigma a slope parameter.

    • Aligns the model's token-generation probabilities with NDCG-driven pairwise error, providing target-aware gradient pressure at training time. Empirically, ALRO surpasses embedding-based and LLM ranking baselines, and model scaling consistently amplifies its gains (Chao et al., 2024).
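
The soft-argmax score can be sketched as follows (illustrative NumPy; `position_logits[j, i]` is assumed to be the logit that candidate i occupies list position j, a simplification of the paper's token-level setup):

```python
import numpy as np

def soft_argmax_scores(position_logits, gamma=10.0):
    """Differentiable rank estimates: s_i = sum_j softmax_j(gamma*y_{j,i}) * j,
    i.e., each candidate's score is its expected list position."""
    z = gamma * np.asarray(position_logits, dtype=float)
    p = np.exp(z - z.max(axis=0))
    p /= p.sum(axis=0)                        # softmax over positions j
    positions = np.arange(1, z.shape[0] + 1)
    return positions @ p                      # expected position per candidate
```

With a sharp temperature (large gamma) the soft scores approach the hard argmax positions, while remaining differentiable for the reranking loss above.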

  • Permutation-Sensitive Learning: Explicit permutation-invariant consistency loss penalizes the LLM if its output ranking distribution changes when candidate order is permuted at training time:

\mathcal{L}_{\text{cont}} = -\sum_{t=1}^{|y|} P_\theta(y_t \mid x, y_{<t}) \log P_\theta(y_t' \mid x', y_{<t}')

This mitigates position bias in autoregressive models without inference-time penalty (Chao et al., 2024).
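
A toy sketch of the consistency idea (the interface `rank_fn`, mapping a candidate list to a distribution over those candidates, is hypothetical; the paper applies this at the level of LLM output tokens):

```python
import numpy as np

def consistency_loss(rank_fn, candidates, perm):
    """Cross-entropy between the ranking distribution for the original
    order and for a permuted order, mapped back to original indexing.
    The term is used only in training, so inference cost is unchanged."""
    p = np.asarray(rank_fn(candidates), dtype=float)
    q_perm = np.asarray(rank_fn([candidates[k] for k in perm]), dtype=float)
    q = np.empty_like(q_perm)
    q[list(perm)] = q_perm                    # undo the permutation
    return float(-np.sum(p * np.log(q + 1e-12)))
```

A perfectly order-invariant scorer drives this term down to the entropy of its own distribution; an order-sensitive one pays an extra penalty.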

3. Efficient Listwise Reranking and First-Token Objectives

Efficiency is a central bottleneck in listwise LLM reranking. Innovations in output-space design and loss weighting have been pivotal:

  • FIRST (Single-Token Decoding and Weighted RankNet Loss):

    • The model receives a “listwise” prompt and produces, for each candidate, the logit assigned to its identifier at the first decoding step:

    s_i = \operatorname{logit}_\theta(t_i \mid x)

    • Weighted RankNet loss over first-token logits:

    \mathcal{L}_{\text{Rank}} = \sum_{\substack{i,j=1 \\ r_i<r_j}}^{m} \frac{1}{i+j} \log\left(1 + \exp\left(-(s_i - s_j)\right)\right)

    • The 1/(i+j) weight prioritizes pairs among top-ranked candidates, focusing learning on critical distinctions.
    • The joint objective combines this with the standard sequence-level LM loss:

    \mathcal{L}_{\text{Joint}} = \mathcal{L}_{\text{LM}} + \lambda \mathcal{L}_{\text{Rank}}

    • Inference reduces to a single forward step and a score sort:

    procedure FIRST_RERANK(query, candidates):
        x ← format(query, candidates)        # "listwise" prompt
        logits ← Model.forward(x)            # one decoding step only
        for each candidate i:
            s[i] ← logits[token_index(t_i)]  # first-token identifier logit
        end for
        return sort_by_descending(s)
    • FIRST empirically reduces latency by 21–50% relative to sequence-generation methods, with no loss (and sometimes improvement) in nDCG@10 versus ListNet, LambdaRank, or conventional LM objectives. The efficiency benefits extend to TREC Deep Learning tracks, with robust generalization across domains (Reddy et al., 2024; Chen et al., 2024).
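
The weighted RankNet term can be sketched as follows (illustrative; `ranks[i]` is candidate i's ground-truth rank with 1 = best, and the pairwise term follows the standard RankNet convention of penalizing a preferred candidate whose logit falls below a dispreferred one's):

```python
import numpy as np

def first_rank_loss(first_token_logits, ranks):
    """Weighted RankNet over first-token identifier logits: the 1/(i+j)
    factor makes mistakes among top-ranked pairs cost the most."""
    s = np.asarray(first_token_logits, dtype=float)
    loss = 0.0
    for i in range(len(s)):
        for j in range(len(s)):
            if ranks[i] < ranks[j]:               # i should outrank j
                weight = 1.0 / (ranks[i] + ranks[j])
                loss += weight * np.log1p(np.exp(-(s[i] - s[j])))
    return float(loss)
```

Note how a disagreement between the top two candidates (weight 1/3) costs far more than one deep in the list.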

4. Listwise Objectives for Embeddings, Calibration, and Multi-Objective Trade-offs

  • Regression-Compatible Ranking (RCR):

    • Combines a pointwise sigmoid-CE (for regression calibration) with a new listwise-CE over normalized sigmoid scores:

    \mathcal{L}_{\text{RCR}}(\theta; q) = (1-\alpha)\,\sum_{i=1}^N \ell_{\mathrm{Sigmoid}}(s_i, y_i) + \alpha\,\ell_{\mathrm{ListCE}}\bigl(\sigma; s_{1:N}, y_{1:N}\bigr)

    where \ell_{\mathrm{ListCE}}(\sigma) defines "ListNet with sigmoid scores":

    p_i = \frac{\sigma(s_i)}{\sum_j \sigma(s_j)}

    and

    \ell_{\mathrm{ListCE}} = -\frac{1}{C}\sum_{i=1}^N y_i \log p_i

    The joint optimum satisfies calibration: \sigma(s_i) \rightarrow \mathbb{E}[y_i \mid q, x_i]. On public LETOR and YouTube datasets, RCR strictly dominates the traditional multi-objective (sigmoid+softmax) combination in both ranking and calibration, with significant downstream improvements in CTR and AUCPR (Bai et al., 2022).
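
A minimal sketch of the combined objective for binary labels (illustrative; the normalizer C is taken here as the label sum, one common choice, and names are not from the paper):

```python
import numpy as np

def rcr_loss(scores, labels, alpha=0.5):
    """(1 - alpha) * pointwise sigmoid cross-entropy (calibration)
    + alpha * ListCE over sigmoid-normalized scores (ranking)."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    sig = 1.0 / (1.0 + np.exp(-s))
    pointwise = -np.sum(y * np.log(sig) + (1 - y) * np.log(1 - sig))
    p = sig / sig.sum()                       # "ListNet with sigmoid scores"
    list_ce = -np.sum(y * np.log(p)) / max(y.sum(), 1e-12)
    return float((1 - alpha) * pointwise + alpha * list_ce)
```

Unlike softmax normalization, the sigmoid normalization keeps the pointwise and listwise terms pulling toward the same optimum, which is the paper's compatibility argument.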

  • Listwise for Embeddings (E²Rank):

    • Integrates a standard InfoNCE contrastive loss (for embedding quality) with a full listwise RankNet loss, using query-document cosine scores:

    \mathcal{L}_{\text{RankNet}} = \frac{1}{N}\sum_{i=1}^N \sum_{j,k:\, r_{i,j} < r_{i,k}} \log\left(1 + \exp\left(\frac{s(q_i, d_{i,k}) - s(q_i, d_{i,j})}{\tau_2}\right)\right)

    • The "listwise prompt" concatenates the query and the top-K docs, then computes cosine similarity as the unified ranking score. This enables a single embedding model to excel at both retrieval and reranking, with empirical gains of 1–4 points nDCG@10 over conventional listwise LLM rerankers on BEIR and TREC, at ≈5× lower latency (Liu et al., 26 Oct 2025).
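
For a single query, the RankNet term over cosine scores might look like this (illustrative NumPy; `ranks[j]` is document j's label rank with 1 = best, and `tau` stands in for the temperature τ₂):

```python
import numpy as np

def cosine(u, v):
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def e2rank_ranknet_loss(query_emb, doc_embs, ranks, tau=0.05):
    """Pairwise logistic loss over query-document cosine similarities:
    for each pair where doc j outranks doc k, penalize s_k exceeding s_j."""
    s = [cosine(query_emb, d) for d in doc_embs]
    loss = 0.0
    for j in range(len(s)):
        for k in range(len(s)):
            if ranks[j] < ranks[k]:
                loss += np.log1p(np.exp((s[k] - s[j]) / tau))
    return float(loss)
```

Because the same cosine score serves retrieval and reranking, one embedding model can be trained for both roles.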

5. Advances in Listwise Encoding, Listwise Contrast, and Hierarchical Modeling

  • Listwise Encoding and Contrastive Losses (ListConRanker):

    • Employs a bespoke ListTransformer that encodes the full candidate list jointly. Circle Loss is used to enhance training efficiency and adaptively prioritize “hard” positives and negatives:

    L = \log\left(1 + \sum_{j=1}^J e^{\gamma \alpha_j^- (s_j^- - A^-)} \cdot \sum_{i=1}^I e^{-\gamma \alpha_i^+ (s_i^+ - A^+)}\right)

    • Harder samples (low-scoring positives and high-scoring negatives) induce larger gradients. This improves convergence speed and end-to-end ranking quality, with ListConRanker + Circle Loss outperforming cross-entropy, CoSENT, and triplet losses on diverse reranking tasks (Liu et al., 13 Jan 2025).
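
A self-contained sketch of Circle Loss with the standard adaptive weights (α⁺ = [1 + m − s⁺]₊, α⁻ = [s⁻ + m]₊) and margins (A⁺ = 1 − m, A⁻ = m); these particular weight and margin choices follow the original Circle Loss formulation and are assumptions here, not taken from the ListConRanker paper:

```python
import numpy as np

def circle_loss(pos_scores, neg_scores, gamma=32.0, m=0.25):
    """Hard samples dominate: a low-scoring positive gets a large alpha+,
    a high-scoring negative a large alpha-, so their terms grow fastest."""
    sp = np.asarray(pos_scores, dtype=float)
    sn = np.asarray(neg_scores, dtype=float)
    alpha_p = np.clip(1.0 + m - sp, 0.0, None)   # weight for weak positives
    alpha_n = np.clip(sn + m, 0.0, None)         # weight for strong negatives
    delta_p, delta_n = 1.0 - m, m                # margins A+ and A-
    pos_term = np.exp(-gamma * alpha_p * (sp - delta_p)).sum()
    neg_term = np.exp(gamma * alpha_n * (sn - delta_n)).sum()
    return float(np.log1p(neg_term * pos_term))
```

The adaptive weights are what distinguish this from a plain softplus over score gaps: confidently separated pairs contribute almost nothing.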

  • ExpertRank (Mixture of Local Listwise Experts):

    • Applies multi-level coarse-graining: non-relevant candidates are partitioned into overlapping or disjoint windows, and in each, the max/min scoring negatives are pooled into smaller expert sublists. Each expert is assigned a ListNet loss, with an outer gating network mixing these losses adaptively.
    • This focus on “medium-hard” negatives improves ranking on MS MARCO and various neural architectures by 2–10% relative to ListNet/ListMLE, especially in low-data regimes (Chen et al., 2021).
  • Residual Listwise Preference Optimization (RLPO):
    • For long-context settings, RLPO employs a strong pointwise LLM scorer, with global listwise corrections applied at the representation level via a lightweight multi-head self-attention block and NDCG-weighted pairwise logistic loss:

    \mathcal{L}_{\mathrm{RLPO}} = \sum_{i,j:\, y_i > y_j} \Delta_{ij} \log(1 + \exp[-(s_i - s_j)])

    where \Delta_{ij} is the normalized expected NDCG gain from swapping i and j. RLPO achieves higher NDCG@10 and superior stability as list size increases, with only O(N^2 d) overhead and negligible extra latency (Jiang et al., 12 Jan 2026).
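
A sketch of the NDCG-weighted pairwise logistic loss (illustrative; Δ_ij is approximated here as the absolute DCG change from swapping the two items' current score-sorted positions, normalized by the ideal DCG — one common lambda-style construction, not necessarily the paper's exact definition):

```python
import numpy as np

def rlpo_loss(scores, labels):
    """Pairwise logistic loss weighted by the (normalized) NDCG impact
    of swapping items i and j in the current score-sorted order."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    order = np.argsort(-scores)                       # current ranking
    pos = {int(item): p for p, item in enumerate(order)}
    ideal = sum(g / np.log2(p + 2) for p, g in enumerate(sorted(labels, reverse=True)))
    loss = 0.0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                disc = abs(1 / np.log2(pos[i] + 2) - 1 / np.log2(pos[j] + 2))
                delta = abs(labels[i] - labels[j]) * disc / max(ideal, 1e-12)
                loss += delta * np.log1p(np.exp(-(scores[i] - scores[j])))
    return float(loss)
```

Pairs whose swap would barely move NDCG (deep in the list, or with similar labels) receive correspondingly tiny gradients.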

  • Listwide Quality Objectives (RankFormer):

    • Proposes an auxiliary BCE objective directly at the list level, predicting whether any item will be clicked/purchased, in parallel with a standard listwise ranking loss. This “absolute” listwide calibration prevents trivial solutions on all-zero lists and enhances practical utility and knowledge transfer for production systems (Buyl et al., 2023).

6. Position Bias, Self-Calibration, and Auxiliary Mechanisms

  • Permutation-Invariant and Consistency Training: Both ALRO (Chao et al., 2024) and SCaLR (Ren et al., 2024) employ specialized regularization:
    • Permutation-Sensitive Learning (ALRO): Penalizes sensitivity to candidate order at training time.
    • Self-Calibration Losses (SCaLR): Aggregates list-view and point-view losses, introducing a self-calibration term that aligns context-aware (listwise) scores with context-independent (pointwise) anchors. In-batch and adaptive calibration ensure robust scaling, and attention-masking plus parallel encoding guarantee that scores are comparable across sublists and invariant to permutations.
  • Position-Sensitive Context Modeling (RIA): Hierarchical transformer stacks and shared context-history modeling (LMH, CUHT) facilitate deep cross-item dependency modeling. RIA unites standard listwise binary CE with deep structural modeling, decoupling ranking and reranking heads but fusing knowledge via shared representations and embedding caching (Zhang et al., 26 Nov 2025).

7. Empirical Impact, Limitations, and Practical Considerations

Listwise reranking objectives have advanced both accuracy and efficiency in retrieval and recommendation. Summary findings from recent benchmarks:

| Objective/Method | Rerank Metric (Sample) | Efficiency/Notes | Reference |
|---|---|---|---|
| ALRO (Soft Lambda, PSL) | NDCG@10: 0.712 | Zero-shot/few-shot LLM, no latency hit | (Chao et al., 2024) |
| FIRST (single-token loss) | nDCG@10: up to 0.756 | 21–50% latency reduction | (Reddy et al., 2024) |
| RCR (calibrated, regression) | NDCG@10: 0.468 | Best ranking/calibration trade-off | (Bai et al., 2022) |
| ListConRanker (Circle Loss) | mAP: 73.25% (MTEB CHT) | Smoother, faster convergence | (Liu et al., 13 Jan 2025) |
| RLPO (hybrid, long-context) | NDCG@50: 0.791 | Scalable to N > 50, <2 s latency | (Jiang et al., 12 Jan 2026) |
| E²Rank (listwise pairs/RankNet) | nDCG@10: 54.35 (BEIR) | ≈5× faster than LLM-based rerankers | (Liu et al., 26 Oct 2025) |

Ablation studies and systematic analyses across these works show:

  • Prioritizing head-of-list accuracy (e.g., via RankNet weights, NDCG-\lambda gains, or explicit top-k focusing) yields significantly better ranking outcomes than uniform penalties.
  • Listwise calibration, via auxiliary heads or multi-objective trade-offs, is essential for integration into real-world pipelines, especially when ranking scores must be used for business-critical calibrations (e.g., in ad CTR prediction).
  • Efficient listwise reranking (FIRST, E²Rank, RLPO) is now practical for large-scale and long-context settings.

Notable limitations include increased implementation complexity, the need for substantial ground-truth permutation labeling, and, in some methods (e.g., RLPO), dependence on an already strong base scorer for the residual correction to be effective.
