NeuralNDCG: Differentiable NDCG Optimization

Updated 10 June 2026

NeuralNDCG is a set of methods that create differentiable relaxations of the sorting operation, enabling gradient-based optimization of the NDCG metric using techniques like NeuralSort and differentiable sorting networks.
It employs distinct loss formulations such as row-wise and transposed variants to align the training objective directly with ranking quality, yielding competitive performance on benchmarks like Web30K and Istella.
NeuralNDCG incorporates scalable stochastic and bilevel optimization strategies that extend its application to large-scale information retrieval, recommender systems, and preference alignment in language models.

NeuralNDCG denotes a class of differentiable surrogates and algorithms designed to directly optimize the Normalized Discounted Cumulative Gain (NDCG), a central ranking metric in information retrieval and recommendation systems. The core challenge addressed by these techniques is the non-differentiability of the sorting operator underlying NDCG, which hinders direct end-to-end gradient-based learning of ranking models. NeuralNDCG encompasses differentiable relaxations of permutation-based metrics, primarily through soft sorting operators such as NeuralSort or differentiable sorting networks, together with scalable stochastic optimization strategies that admit both theoretical convergence guarantees and strong empirical performance across learning-to-rank, recommender, and preference alignment tasks (Pobrotyn et al., 2021, Qiu et al., 2022, Zhou et al., 2024, Padalkar, 15 Apr 2026).

1. NDCG: Metric and Differentiability Challenges

NDCG computes the quality of a ranked list by combining item-level gains (e.g., $g(y) = 2^y - 1$) and position-based discounts (e.g., $d(j) = 1/\log_2(1 + j)$), normalized by the "ideal" DCG (IDCG) of the ground-truth ranking. For a list of length $n$ with predicted scores $\mathbf{s}$ and relevance labels $\mathbf{r}$, sorting $\mathbf{s}$ in descending order produces the permutation $\pi$, and

$\mathrm{NDCG}@k = \frac{1}{\mathrm{IDCG}@k} \sum_{j=1}^k g(r_{\pi(j)}) d(j).$

The core difficulty for deep learning is that the mapping from $\mathbf{s}$ to $\pi$ is a discrete, piecewise-constant operation, yielding gradients that are zero almost everywhere and undefined at ties, which precludes direct gradient-based optimization (Pobrotyn et al., 2021, Qiu et al., 2022, Zhou et al., 2024).
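The definition above can be sketched in a few lines of plain Python. This is only an illustration under the gain and discount choices given as examples in the text; the function names (`gain`, `discount`, `dcg_at_k`, `ndcg_at_k`) are hypothetical and not taken from any particular library.

```python
import math


def gain(label):
    # Exponential gain g(y) = 2^y - 1, as in the example above.
    return 2 ** label - 1


def discount(rank):
    # Logarithmic discount d(j) = 1 / log2(1 + j) for a 1-based rank j.
    return 1.0 / math.log2(1 + rank)


def dcg_at_k(labels_in_ranked_order, k):
    # Sum of gain * discount over the first k positions.
    top_k = labels_in_ranked_order[:k]
    return sum(gain(y) * discount(j) for j, y in enumerate(top_k, start=1))


def ndcg_at_k(scores, labels, k):
    # Hard sort: indices ordered by descending predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked_labels = [labels[i] for i in order]
    idcg = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(ranked_labels, k) / idcg if idcg > 0 else 0.0
```

Because the scores enter only through the sort order, nudging a score without changing the order (for example 0.2 to 0.21 in the list `[0.2, 0.9, 0.5]`) leaves the value unchanged, which is the zero-gradient behaviour described above.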

2. Differentiable Relaxations: NeuralSort and Sorting Networks

To circumvent non-differentiability, NeuralNDCG leverages continuous relaxations of the sorting permutation:

NeuralSort constructs a unimodal, row-stochastic soft permutation matrix $\widehat{P}$ via per-row softmaxes over affine transformations of score differences. As the temperature $\tau \to 0$, $\widehat{P}$ converges to the hard permutation matrix; at higher $\tau$, it yields smoother gradients. This allows downstream NDCG computation to proceed using "soft-sorted" gains (Pobrotyn et al., 2021, Padalkar, 15 Apr 2026).
Differentiable Sorting Networks (e.g., odd-even networks (Zhou et al., 2024)) replace hard compare-and-swap with soft min/max operations parameterized by a steepness factor. Composition across sorting layers yields a doubly stochastic relaxed permutation matrix $P_{\mathrm{soft}}$.

Both approaches enable the surrogate NDCG (e.g., NeuralNDCG, diffNDCG) to be fully differentiable with respect to the input score vector, supporting backpropagation through the entire ranking pipeline.
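As a rough illustration of the NeuralSort relaxation, the sketch below builds the soft permutation matrix row by row, using the commonly stated form in which row $i$ is a softmax over $((n + 1 - 2i)\,s_j - \sum_k |s_j - s_k|)/\tau$. It is written in plain Python for self-containment and has no automatic differentiation; a practical implementation would use a tensor library. The names `softmax` and `neural_sort` are hypothetical.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def neural_sort(scores, tau=1.0):
    # Returns an n x n row-stochastic matrix P, where P[i][j] is the soft
    # weight with which item j occupies rank i (rank 0 = highest score).
    n = len(scores)
    # Sum of absolute score differences for each item: (A_s 1)_j.
    abs_diff_sums = [sum(abs(sj - sk) for sk in scores) for sj in scores]
    P = []
    for i in range(1, n + 1):
        coeff = n + 1 - 2 * i
        logits = [(coeff * scores[j] - abs_diff_sums[j]) / tau for j in range(n)]
        P.append(softmax(logits))
    return P
```

With a small temperature such as `tau=0.01`, each row is close to one-hot and the matrix approaches the hard descending-order permutation; with `tau=1.0` the rows spread their mass over several items.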

3. Loss Formulations and Training

NeuralNDCG Variants

Two principal formulations are prevalent:

Row-wise NeuralNDCG multiplies the soft permutation matrix $\widehat{P}$ with the vector of gains, so that $(\widehat{P}\, g(\mathbf{r}))_j$ is the expected gain at rank $j$. The surrogate metric is

$\mathrm{NeuralNDCG}@k = \frac{1}{\mathrm{IDCG}@k} \sum_{j=1}^k \big(\widehat{P}\, g(\mathbf{r})\big)_j \, d(j).$

Column-wise (transposed) NeuralNDCG sums over documents, applying the soft discounts derived from $\widehat{P}^{\top}$ to each gain. Both formulations are equivalent up to matrix transposition and yield similar empirical results (Pobrotyn et al., 2021).
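The two formulations can be compared in a small sketch. It assumes the exponential gain and logarithmic discount from Section 1, uses an unscaled NeuralSort matrix (no Sinkhorn normalization), and all function names are hypothetical. Under these assumptions the row-wise and transposed sums are the same double sum in a different order, so they agree numerically.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def neural_sort(scores, tau):
    # Row i is a softmax over ((n + 1 - 2i) * s_j - sum_k |s_j - s_k|) / tau.
    n = len(scores)
    abs_diff_sums = [sum(abs(sj - sk) for sk in scores) for sj in scores]
    return [
        softmax([((n + 1 - 2 * i) * scores[j] - abs_diff_sums[j]) / tau
                 for j in range(n)])
        for i in range(1, n + 1)
    ]


def gain(label):
    return 2 ** label - 1


def discount(rank):
    return 1.0 / math.log2(1 + rank)


def idcg_at_k(labels, k):
    ideal = sorted(labels, reverse=True)[:k]
    return sum(gain(y) * discount(j) for j, y in enumerate(ideal, start=1))


def neural_ndcg_rowwise(scores, labels, k, tau=1.0):
    # Expected gain at each rank, then discounted and truncated at k.
    P = neural_sort(scores, tau)
    gains = [gain(y) for y in labels]
    soft_gains = [sum(p * g for p, g in zip(row, gains)) for row in P]
    dcg = sum(soft_gains[j] * discount(j + 1) for j in range(min(k, len(scores))))
    idcg = idcg_at_k(labels, k)
    return dcg / idcg if idcg > 0 else 0.0


def neural_ndcg_transposed(scores, labels, k, tau=1.0):
    # Expected discount received by each document, then weighted by its gain.
    P = neural_sort(scores, tau)
    n = len(scores)
    discounts = [discount(j + 1) if j < k else 0.0 for j in range(n)]
    soft_discounts = [sum(P[j][i] * discounts[j] for j in range(n)) for i in range(n)]
    dcg = sum(gain(labels[i]) * soft_discounts[i] for i in range(n))
    idcg = idcg_at_k(labels, k)
    return dcg / idcg if idcg > 0 else 0.0
```

In a training loop the loss would be the negative of either value; at a small temperature both approach the hard NDCG@k of the same list.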

diffNDCG

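The diffNDCG surrogate, as deployed in Direct Ranking Preference Optimization (DRPO), employs a differentiable sorting network to produce the permutation proxy $P_{\mathrm{soft}}$, and the surrogate metric is

$\mathrm{diffNDCG}@k = \frac{1}{\mathrm{IDCG}@k} \sum_{j=1}^{k} g\big((P_{\mathrm{soft}}\, \mathbf{r})_j\big)\, d(j).$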

The loss is simply the negative of this value (Zhou et al., 2024).
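A minimal sketch of a sorting-network relaxation follows. It assumes an odd-even transposition network with a logistic-sigmoid soft compare-and-swap, and it assumes the gain is applied to the soft-sorted labels; the exact relaxation and surrogate form used in DRPO may differ. The names `soft_sort_network`, `diff_ndcg`, and the `steepness` parameter are hypothetical.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_sort_network(scores, steepness=10.0):
    # Odd-even transposition network with soft compare-and-swap.
    # Returns a doubly stochastic matrix P (rows = ranks, columns = items).
    n = len(scores)
    values = list(scores)
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for layer in range(n):
        for i in range(layer % 2, n - 1, 2):
            # p is the soft indicator that the pair is already in descending order.
            p = sigmoid(steepness * (values[i] - values[i + 1]))
            hi = p * values[i] + (1 - p) * values[i + 1]   # soft max
            lo = (1 - p) * values[i] + p * values[i + 1]   # soft min
            row_hi = [p * a + (1 - p) * b for a, b in zip(P[i], P[i + 1])]
            row_lo = [(1 - p) * a + p * b for a, b in zip(P[i], P[i + 1])]
            values[i], values[i + 1] = hi, lo
            P[i], P[i + 1] = row_hi, row_lo
    return P


def diff_ndcg(scores, labels, k, steepness=10.0):
    # Assumed form: gain applied to soft-sorted labels, then discounted.
    P = soft_sort_network(scores, steepness)
    soft_labels = [sum(p * y for p, y in zip(row, labels)) for row in P]
    dcg = sum((2 ** soft_labels[j] - 1) / math.log2(2 + j)
              for j in range(min(k, len(scores))))
    ideal = sorted(labels, reverse=True)[:k]
    idcg = sum((2 ** y - 1) / math.log2(2 + j) for j, y in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def diff_ndcg_loss(scores, labels, k, steepness=10.0):
    # The training loss is the negative surrogate.
    return -diff_ndcg(scores, labels, k, steepness)
```

Each soft swap mixes two rows with weights `p` and `1 - p`, so row and column sums are preserved and the result stays doubly stochastic. A larger `steepness` pushes the matrix toward a hard permutation.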

Optimization

All variants are trained via mini-batch stochastic gradient descent, with per-batch construction of the soft permutation and loss. Temperature parameters may be held fixed or annealed, though excessive sharpening (small $\tau$) can introduce gradient instability. Large-scale systems typically employ Adam or similar optimizers; regularization via Sinkhorn normalization is sometimes used to maintain double stochasticity (Pobrotyn et al., 2021, Padalkar, 15 Apr 2026).
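Sinkhorn normalization can be sketched as alternating column and row rescaling. The version below is a simplified illustration that assumes a square matrix with strictly positive entries; the function name `sinkhorn_scale`, the iteration count, and the `eps` guard are hypothetical choices, not settings from the cited works.

```python
def sinkhorn_scale(P, n_iters=50, eps=1e-10):
    # Alternately normalize columns and rows so that the matrix
    # approaches one whose rows and columns all sum to 1.
    M = [row[:] for row in P]
    n = len(M)
    for _ in range(n_iters):
        col_sums = [sum(M[i][j] for i in range(n)) for j in range(n)]
        M = [[M[i][j] / (col_sums[j] + eps) for j in range(n)] for i in range(n)]
        row_sums = [sum(row) for row in M]
        M = [[v / (rs + eps) for v in row] for row, rs in zip(M, row_sums)]
    return M
```

Applied to a row-stochastic soft permutation matrix, this keeps the rows normalized while pulling the column sums toward 1, at the cost of extra passes over the matrix per list.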

4. Scalable Stochastic Optimization of NDCG Surrogates

An alternative class, exemplified by the SONG/K-SONG algorithms (Qiu et al., 2022), forgoes explicit sorting relaxations and instead formulates NDCG optimization as a compositional (and for top-$K$, bilevel compositional) stochastic optimization problem:

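The rank of item $i$ is approximated by averaging pairwise surrogates over the items in the candidate set, e.g.

$\mathrm{rank}(i) = \sum_{j=1}^{n} \mathbb{I}[s_j \ge s_i] \;\approx\; n\, \bar{\ell}_i, \qquad \bar{\ell}_i = \frac{1}{n} \sum_{j=1}^{n} \ell(s_j - s_i),$

where $\ell$ is a smooth pairwise loss.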

For NDCG@K, a bilevel relaxation introduces a smooth top-K selector via a regularized inner optimization.
The optimization is performed via momentum-based stochastic methods (Adam, momentum SGD) over mini-batches, using moving-average estimates of the inner surrogate terms and variance reduction (Qiu et al., 2022).

This approach results in per-iteration complexity that scales with mini-batch size, not list length, and enjoys provable convergence rates for non-convex deep models.
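The moving-average idea can be illustrated in isolation. The sketch below is not the SONG algorithm itself: it omits the gradient estimator, the NDCG composition, and the bilevel top-K selector, and it assumes a squared hinge with unit margin as the smooth pairwise loss. All names (`squared_hinge`, `batch_surrogate`, `update_moving_average`, `gamma`) are hypothetical.

```python
def squared_hinge(x, margin=1.0):
    # Smooth pairwise surrogate for the indicator [x >= 0] (assumed choice).
    return max(0.0, x + margin) ** 2


def batch_surrogate(i, scores, batch):
    # Mini-batch estimate of the average pairwise surrogate for item i;
    # the cost depends on len(batch), not on the full list length.
    return sum(squared_hinge(scores[j] - scores[i]) for j in batch) / len(batch)


def update_moving_average(u, i, scores, batch, gamma=0.1):
    # Exponential moving average that tracks the inner surrogate term.
    u[i] = (1.0 - gamma) * u[i] + gamma * batch_surrogate(i, scores, batch)
    return u[i]
```

With fixed scores, repeated updates move `u[i]` toward the full-list average, and the rank estimate for item `i` is then roughly `len(scores) * u[i]`. Keeping `u` across iterations is what lets each step use only a mini-batch of items.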

5. Applications and Empirical Performance

NeuralNDCG variants have seen broad adoption:

Information Retrieval and Learning to Rank: Across classic LTR benchmarks (Web30K, Istella), NeuralNDCG exceeds ApproxNDCG and is competitive with LambdaRank, e.g., achieving 51.56%/53.46% NDCG@5/10 on Web30K and 70.68% NDCG@10 on Istella (Pobrotyn et al., 2021).
Recommendation: In large-scale temporal recommender systems, integrating neuralNDCG into urgency-aware Deep Interest Network (DIN) models led to a +9% lift in nDCG@1 over strong LightGBM baselines for daily fantasy sports applications (Padalkar, 15 Apr 2026).
Preference Alignment of LLMs: DRPO directly optimizes diffNDCG over ranked lists of responses, leading to a +5% absolute GPT-4 win rate gain over previous listwise methods and substantially improved reward-model agreement. Correlations between diffNDCG and reward-model win rate reach 0.95 (Zhou et al., 2024).

A summary table of core methods is shown below:

Method	Surrogate Construction	Differentiable Sorting	Complexity (per list)	Theoretical Convergence
NeuralNDCG	NeuralSort, soft permutation	Yes (NeuralSort)	$O(n^2)$	Consistency ($\tau \to 0$)
diffNDCG (DRPO)	Differentiable Sorting Net	Yes (sorting net)	$O(n^2)$ soft compare-and-swaps (odd-even network)	Consistency as steepness $\to \infty$
SONG / K-SONG	Pairwise surrogate + bilevel	No explicit sorting	Scales with mini-batch size, not list length	Provable convergence for non-convex models (Qiu et al., 2022)

6. Practical Considerations, Extensions, and Limitations

Key practical factors include:

List length: Quadratic complexity in $n$ makes NeuralNDCG expensive for very long lists; batch length limiting or sampling is commonly required (Pobrotyn et al., 2021).
Temperature tuning: $\tau \to 0$ yields sharper approximations but unstable gradients. Grid search or gentle annealing is advised.
Extensions: The same formalism can extend to MAP, MRR, or other permutation-based metrics by substituting appropriate gain and discount definitions (Pobrotyn et al., 2021).
Distributed Training: Industrial-scale systems utilize multi-node distributed training (e.g., PyTorch DDP on Ray) for large data and model sizes (Padalkar, 15 Apr 2026).
Pitfalls: Sinkhorn scaling improves stability but induces additional overhead; absence of careful moving-average tracking can impair convergence or stability (Qiu et al., 2022).
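One simple way to realize gentle annealing is a geometric decay with a floor, so the temperature never reaches the unstable regime. The schedule below is only an assumed example; the function name `annealed_tau` and the default values are hypothetical and would need tuning per task.

```python
def annealed_tau(step, tau_start=1.0, tau_min=0.1, decay=0.999):
    # Geometric decay from tau_start, clipped below at tau_min.
    return max(tau_min, tau_start * decay ** step)
```

The returned value would be passed as the temperature of the soft sorting operator at each training step.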

7. Impact and Future Directions

NeuralNDCG and its variants have established a rigorous, effective paradigm for direct listwise metric optimization in deep models. By bridging the gap between non-differentiable evaluation criteria and end-to-end learning, these methods drive empirical gains in information retrieval, recommender systems, and preference alignment for LLMs. Future developments may improve the efficiency of sorting relaxations, construct surrogates for more complex ranking metrics, or generalize bilevel compositional frameworks to further settings. This suggests continued convergence between differentiable relaxations, scalable stochastic optimization, and direct metric-driven training in high-impact LTR and alignment tasks (Pobrotyn et al., 2021, Qiu et al., 2022, Zhou et al., 2024, Padalkar, 15 Apr 2026).