LapSum-Based Soft Top-K

Updated 16 March 2026

LapSum-based Soft Top-K is a differentiable approximation method that smooths the hard top-k operator using the Laplace CDF to produce soft weights.
It employs closed-form inversion and log-space computations to ensure efficient, stable gradient propagation and robust performance in classification tasks.
Empirical results show that this method improves accuracy and convergence in scenarios with noisy labels and limited data compared to traditional losses.

LapSum-based Soft Top-K refers to a class of smooth, differentiable relaxations of the top-$k$ selection operator, constructed by summing the cumulative distribution functions (CDFs) of Laplace distributions and inverting this sum to obtain soft top-$k$ weights or losses. The LapSum method underlies both differentiable top-$k$ selection functions and smoothed loss functions optimized for top-$k$ metrics in machine learning. These relaxations address the challenge that hard top-$k$ operators and their associated losses are non-differentiable and provide poor gradient signals for optimization with stochastic gradient descent, particularly in deep learning contexts. Recent developments have established two main LapSum-based frameworks: the Soft Top-K SVM loss for classification tasks and the more general LapSum-based soft ranking, selection, and permutation operators.

1. Mathematical Formulation and Definition

The LapSum function is defined using the Laplace CDF as follows. Let $r = (r_0, \ldots, r_{n-1}) \in \mathbb{R}^n$ represent the centers (e.g., scores), and $\alpha \ne 0$ be the temperature or smoothing parameter. The scaled CDF of the standard Laplace distribution is

$\mathrm{Lap}_\alpha(x) = \begin{cases} \frac12 \exp(x / \alpha), & x \le 0, \\ 1 - \frac12 \exp(-x / \alpha), & x > 0. \end{cases}$

The LapSum function is then

$\mathrm{LapSum}_\alpha(x; r) = \sum_{i=0}^{n-1} \mathrm{Lap}_\alpha(x - r_i).$

For a given $w \in (0, n)$ , the LapSum-based soft top- $w$ operator is defined by inverting the above sum to find $b$ such that $\mathrm{LapSum}_\alpha(b; r) = w$ , and then setting

$p_i = \mathrm{Lap}_\alpha(b - r_i), \qquad \sum_i p_i = w.$

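As $\alpha \to 0^+$ and $w=k\in\mathbb{N}$, the vector $(p_i)$ converges to a hard 0/1 selection of $k$ entries. Because $\mathrm{Lap}_\alpha$ is increasing, $p_i$ is decreasing in $r_i$ for $\alpha > 0$, so with the formulas as written the selected entries are those with the $k$ smallest centers; taking $r$ to be the negated scores gives the usual hard top-$k$ indicator. This construction provides a smooth, differentiable, and parameterizable approximation of the hard top-$k$ operator.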
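The definition above can be checked numerically with a minimal NumPy sketch. It is not the closed-form algorithm of the cited work: it finds $b$ by plain bisection, assumes $\alpha > 0$, and uses negated scores as centers so that the largest scores receive the largest weights (an assumption about the sign convention). The names `lap_cdf` and `soft_top_w` are illustrative only.

```python
import numpy as np

def lap_cdf(x, alpha):
    """Laplace CDF with scale alpha > 0; exp(-|x|/alpha) never overflows."""
    x = np.asarray(x, dtype=float)
    tail = 0.5 * np.exp(-np.abs(x) / alpha)
    return np.where(x <= 0, tail, 1.0 - tail)

def soft_top_w(r, w, alpha, iters=100):
    """Solve LapSum_alpha(b; r) = w by bisection and return (p, b).

    Assumes alpha > 0 and 0 < w < len(r). Bisection stands in for the
    closed-form inversion; it is slower but easy to verify.
    """
    r = np.asarray(r, dtype=float)
    total = lambda b: lap_cdf(b - r, alpha).sum()
    lo, hi = r.min(), r.max()
    while total(lo) > w:   # widen the bracket to the left if needed
        lo -= alpha
    while total(hi) < w:   # widen the bracket to the right if needed
        hi += alpha
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) < w:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    return lap_cdf(b - r, alpha), b

scores = np.array([2.0, -1.0, 0.5, 3.0, 0.0])
# With alpha > 0, p_i decreases in r_i, so negated scores are used as centers
# (an assumed sign convention) to put the weight on the largest scores.
p_sharp, _ = soft_top_w(-scores, w=2, alpha=0.01)   # close to [1, 0, 0, 1, 0]
p_smooth, _ = soft_top_w(-scores, w=2, alpha=5.0)   # much flatter, still sums to 2
```

Both outputs sum to $w = 2$; only the sharpness of the weights changes with $\alpha$.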

In classification loss settings (Berrada et al., 2018), the LapSum formalism is used to define a smooth surrogate loss for top- $k$ error based on log-sum-exp over $k$ -tuples, yielding the “Smooth Top-K SVM” (or LapSum-based Soft Top- $k$ loss):

$L_{k,\tau}(s, y) = \tau \log \left[ \sum_{\bar{y}\in Y^{(k)}} \exp\left(\frac{\Delta_k(\bar{y}, y)+\frac{1}{k}\sum_{j\in\bar{y}} s_j}{\tau}\right) \right] - \tau \log \left[ \sum_{\bar{y}\in Y^{(k)}_y} \exp\left(\frac{\frac{1}{k}\sum_{j\in\bar{y}} s_j}{\tau}\right) \right].$

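Here, $s \in \mathbb{R}^n$ is the model score vector, $y$ is the ground-truth class, $Y^{(k)}$ is the set of $k$-tuples of distinct classes, and $Y^{(k)}_y \subseteq Y^{(k)}$ is the subset of tuples that contain $y$. The margin term $\Delta_k(\bar{y}, y)$ is zero when $y \in \bar{y}$ and positive otherwise.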
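For small $n$ the loss can be evaluated literally from the formula, which is useful as a reference when testing faster implementations. The sketch below is a brute-force illustration under stated assumptions: it enumerates unordered $k$-subsets instead of ordered tuples (the common factor $k!$ cancels between the two logarithms), it takes $\Delta_k(\bar{y}, y)$ to be a constant `margin` when $y \notin \bar{y}$ and zero otherwise, and the function name is hypothetical.

```python
import itertools
import numpy as np

def smooth_topk_loss_bruteforce(s, y, k, tau=1.0, margin=1.0):
    """Evaluate L_{k,tau}(s, y) by enumerating all k-subsets (exponential cost)."""
    s = np.asarray(s, dtype=float)

    def logsumexp(values):
        v = np.asarray(values, dtype=float)
        m = v.max()                      # subtract the max for stability
        return m + np.log(np.exp(v - m).sum())

    all_terms, with_y_terms = [], []
    for subset in itertools.combinations(range(s.size), k):
        mean_score = s[list(subset)].mean()
        contains_y = y in subset
        delta = 0.0 if contains_y else margin   # assumed form of Delta_k
        all_terms.append((delta + mean_score) / tau)
        if contains_y:
            with_y_terms.append(mean_score / tau)
    return tau * logsumexp(all_terms) - tau * logsumexp(with_y_terms)

# A confident, correct prediction gives a small but strictly positive loss.
loss = smooth_topk_loss_bruteforce([10.0, 0.0, 0.0, 0.0], y=0, k=2)
```

In this example the three subsets containing $y$ each have mean score 5 and the three others have mean score 0 plus margin 1, so the loss equals $\log(1 + e^{-4})$.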

2. Smoothing via Log-Sum-Exp and Laplace CDF

Non-differentiability of the hard top-$k$ operator arises from the use of $\max$ and $k$-selection, which make the loss piecewise linear with very sparse subgradients that are poorly suited to deep network training. By replacing the $\max$ over $k$-tuples with the softmax or log-sum-exp (with temperature $\tau$ or $\alpha$ as a smoothness parameter), the LapSum construction produces a smooth approximation in which scores close to the highest also contribute, weighted according to their ranking. This yields dense gradient information, aiding convergence and robustness in stochastic optimization (Berrada et al., 2018).
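The effect of replacing $\max$ with a temperature-scaled log-sum-exp can be seen on a single vector. The following is a generic illustration rather than code from either cited paper; `smooth_max` and `smooth_max_grad` are hypothetical names.

```python
import numpy as np

def smooth_max(x, tau):
    """tau * log(sum(exp(x / tau))), computed with the max subtracted for stability."""
    x = np.asarray(x, dtype=float)
    m = x.max()
    return m + tau * np.log(np.exp((x - m) / tau).sum())

def smooth_max_grad(x, tau):
    """Gradient of smooth_max with respect to x: a softmax with temperature tau."""
    x = np.asarray(x, dtype=float)
    z = np.exp((x - x.max()) / tau)
    return z / z.sum()

x = np.array([1.0, 2.0, 3.0])
value = smooth_max(x, tau=0.1)      # lies in [max(x), max(x) + tau * log(n)]
grad = smooth_max_grad(x, tau=0.1)  # every entry is positive, unlike the max subgradient
```

The smoothed value always lies between $\max_i x_i$ and $\max_i x_i + \tau \log n$, so it tightens to the hard maximum as $\tau \to 0^+$, while its gradient assigns nonzero weight to every coordinate.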

In the LapSum-based soft selection setting, the smoothness is controlled by $\alpha$ . Small $\alpha$ yields a near-hard selection, while large $\alpha$ leads to highly distributed, smooth soft top- $k$ probabilities. A direct benefit is the ability to interpolate between the hard operator and a fully smooth, ranking-weighted output.

3. Efficient Algorithmic Implementation

3.1 Closed-Form Inversion and Piecewise Structure

To compute $b$ such that $\mathrm{LapSum}_\alpha(b; r)=w$, sort $r$ into $\tilde r$ and precompute auxiliary sequences $a_k, b_k, c_k$ in $O(n)$ time. On each interval $[\tilde r_k,\tilde r_{k+1}]$, $\mathrm{LapSum}_\alpha(x; r)$ admits a closed-form representation, so $b$ follows from explicit formulas for the boundary and interior segments. The interval whose LapSum values bracket $w$ is found by binary search in $O(\log n)$ time; the $O(n\log n)$ sort therefore dominates, giving $O(n\log n)$ total complexity for the inversion (Struski et al., 8 Mar 2025).
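One way to realize this piecewise inversion is sketched below. The formulas are derived directly from the definition of $\mathrm{LapSum}_\alpha$ in Section 1 and are not copied from the cited paper; the prefix and suffix arrays `A` and `B` are illustrative stand-ins, not the sequences $a_k, b_k, c_k$. The sketch assumes $\alpha > 0$ and $0 < w < n$. Between consecutive sorted centers, with $j$ centers at or below $x$ and $t = \exp(x/\alpha)$, the sum reduces to $j - A_j/(2t) + B_j t/2$, so $\mathrm{LapSum}_\alpha = w$ becomes a quadratic in $t$.

```python
import numpy as np

def lapsum(x, r, alpha):
    """Direct O(n) evaluation of LapSum_alpha(x; r), used here only as a check."""
    d = x - np.asarray(r, dtype=float)
    tail = 0.5 * np.exp(-np.abs(d) / alpha)
    return np.where(d <= 0, tail, 1.0 - tail).sum()

def invert_lapsum(r, w, alpha):
    """Return b with LapSum_alpha(b; r) = w, assuming alpha > 0 and 0 < w < n."""
    r_sorted = np.sort(np.asarray(r, dtype=float))
    n = r_sorted.size
    shift = r_sorted.mean()                 # recentre to tame the exponentials
    z = (r_sorted - shift) / alpha
    # A[j] = sum of exp(z_i) over the j smallest centers, j = 0..n
    A = np.concatenate(([0.0], np.cumsum(np.exp(z))))
    # B[j] = sum of exp(-z_i) over the n - j largest centers, j = 0..n
    B = np.concatenate((np.cumsum(np.exp(-z)[::-1])[::-1], [0.0]))
    # LapSum at each sorted breakpoint; nondecreasing, so binary search applies
    m = np.arange(1, n + 1)
    f_break = m - 0.5 * np.exp(-z) * A[1:] + 0.5 * np.exp(z) * B[1:]
    j = int(np.searchsorted(f_break, w, side="right"))  # centers at or below b
    # On this segment: B[j]*t^2 - 2*(w - j)*t - A[j] = 0 with
    # t = exp((b - shift) / alpha); take the positive root.
    d = w - j
    root = np.sqrt(d * d + A[j] * B[j])
    # Two algebraically equal forms, chosen to avoid subtractive cancellation
    t = (d + root) / B[j] if d >= 0 else A[j] / (root - d)
    return shift + alpha * np.log(t)

r = np.array([0.3, -1.2, 2.0, 0.9, 0.1])
b = invert_lapsum(r, w=2.0, alpha=0.5)
residual = lapsum(b, r, 0.5) - 2.0   # should be near machine precision
```

For very small $\alpha$ the unscaled exponentials in `A` and `B` can overflow; a production implementation would need per-segment rescaling or exponent clamping, which this sketch omits.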

3.2 Forward and Backward Algorithms

Given $b$ , the soft top- $k$ weights $p_i$ are evaluated for all $i$ in $O(n)$ . Gradients with respect to $r$ , $w$ , and $\alpha$ are obtained by defining a density vector $s_i$ and normalization $S = \sum_i s_i$ , yielding

$\frac{\partial p}{\partial w} = q, \quad \frac{\partial p}{\partial r} = s q^T - \mathrm{diag}(s)$

where $q_i=s_i/S$ . Vector-Jacobian products can be evaluated in $O(n)$ time without explicit Jacobian formation (Struski et al., 8 Mar 2025).
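The $O(n)$ vector-Jacobian product follows from the rank-one-plus-diagonal structure of the Jacobian: $v^T (s q^T - \mathrm{diag}(s)) = (v \cdot s)\, q - v \odot s$. The sketch below assumes $\alpha > 0$ and takes $s_i$ to be the Laplace density evaluated at $b - r_i$, which is what differentiating $\mathrm{Lap}_\alpha$ gives; the function names are hypothetical, and the explicit Jacobian is built only to check the shortcut.

```python
import numpy as np

def laplace_density(r, b, alpha):
    """s_i = d/dx Lap_alpha(x) at x = b - r_i, for alpha > 0."""
    r = np.asarray(r, dtype=float)
    return np.exp(-np.abs(b - r) / alpha) / (2.0 * alpha)

def soft_topk_vjp(r, b, alpha, v):
    """Return (v^T dp/dr, v^T dp/dw) in O(n) without forming the Jacobian."""
    v = np.asarray(v, dtype=float)
    s = laplace_density(r, b, alpha)
    q = s / s.sum()
    grad_r = np.dot(v, s) * q - v * s   # v^T (s q^T - diag(s))
    grad_w = np.dot(v, q)               # v^T q
    return grad_r, grad_w

def explicit_jacobian(r, b, alpha):
    """O(n^2) reference Jacobian dp/dr = s q^T - diag(s), for testing only."""
    s = laplace_density(r, b, alpha)
    q = s / s.sum()
    return np.outer(s, q) - np.diag(s)

r = np.array([0.3, -1.2, 2.0, 0.9])
v = np.array([1.0, -2.0, 0.5, 3.0])
grad_r, grad_w = soft_topk_vjp(r, b=0.4, alpha=0.7, v=v)
J = explicit_jacobian(r, b=0.4, alpha=0.7)
```

Two sanity checks are available on `J`: its columns sum to zero because $\sum_i p_i = w$ does not depend on $r$, and its rows sum to zero because shifting every center by the same amount leaves $p$ unchanged.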

For the classification loss, which has a polynomial-algebraic structure based on elementary symmetric polynomials, the key quantities $\sigma_k$ and $\sigma_{k-1}$ can be computed in $O(kn)$ time via a divide-and-conquer, degree-truncated polynomial product (Berrada et al., 2018). The backward pass uses recursions for partial symmetric sums, also in $O(kn)$. This enables efficient computation of loss and gradient for large $n$ and moderate $k$.
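The elementary symmetric polynomials $\sigma_0, \ldots, \sigma_k$ can also be obtained with a simple sequential recurrence that has the same $O(kn)$ cost. This is a swapped-in, simpler technique for illustration, not the divide-and-conquer polynomial product described above; it multiplies out $\prod_i (1 + e_i t)$ one factor at a time and keeps only coefficients up to degree $k$. The function name is hypothetical.

```python
import numpy as np

def elementary_symmetric(e, k):
    """Return [sigma_0(e), ..., sigma_k(e)] using an O(k n) recurrence."""
    e = np.asarray(e, dtype=float)
    sigma = np.zeros(k + 1)
    sigma[0] = 1.0
    for x in e:
        # Multiply the truncated polynomial by (1 + x t): the new coefficient
        # of t^j is the old sigma_j plus x times the old sigma_{j-1}.
        sigma[1:] = sigma[1:] + x * sigma[:-1]
    return sigma

sigma = elementary_symmetric([1.0, 2.0, 3.0, 4.0], k=2)
# sigma_1 = 1 + 2 + 3 + 4 = 10, sigma_2 = 2 + 3 + 4 + 6 + 8 + 12 = 35
```

The right-hand side is evaluated in full before assignment, so each update uses the coefficients from the previous step, as the recurrence requires.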

3.3 Numerical Stability

Forward computation is implemented in log-space to prevent overflow, with log-add-exp tricks for summation. Backward recursions are stabilized when $e_i$ becomes large using a $p$ -term asymptotic expansion, leading to stable gradients in single-precision arithmetic with only minor computational overhead (Berrada et al., 2018).
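The log-space idea for the forward pass can be illustrated on the same symmetric-sum recurrence by replacing each addition with a log-add-exp. This is a minimal sketch, not the cited implementation, and it does not cover the asymptotic expansion used to stabilize the backward pass; the function name is hypothetical.

```python
import numpy as np

def log_elementary_symmetric(log_e, k):
    """Return [log sigma_0, ..., log sigma_k] given log_e[i] = log(e_i)."""
    log_e = np.asarray(log_e, dtype=float)
    log_sigma = np.full(k + 1, -np.inf)   # log(0) for coefficients not yet reached
    log_sigma[0] = 0.0                    # log(sigma_0) = log(1)
    for x in log_e:
        # log(sigma_j + e * sigma_{j-1}) computed without leaving log-space
        log_sigma[1:] = np.logaddexp(log_sigma[1:], x + log_sigma[:-1])
    return log_sigma

small = np.exp(log_elementary_symmetric(np.log([1.0, 2.0, 3.0, 4.0]), k=2))
# Exponents this large would overflow float64 if e_i were formed explicitly.
large = log_elementary_symmetric([1000.0, 1001.0, 999.0], k=2)
```

Working with $\log e_i$ keeps every intermediate quantity finite even when the $e_i$ themselves are far outside the floating-point range.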

4. Empirical Performance and Comparisons

The LapSum-based Soft Top- $k$ demonstrates advantages under various regimes, particularly for $k=5$ :

On CIFAR-100 with ResNet-18 and noisy labels, top-5 Soft Top- $k$ SVM loss (LapSum with $\tau=1$ ) achieves higher robustness than cross-entropy: at $p=0.8$ label noise, Soft Top-5 SVM attains top-5 accuracy $\approx 79.3\%$ vs cross-entropy $74.8\%$ , and top-1 accuracy $\approx 55.9\%$ vs cross-entropy $35.5\%$ (Berrada et al., 2018).
For ImageNet in low-data settings (5–25% samples), LapSum soft top-5 loss slightly outperforms cross-entropy; gaps close as data increases, aligning with theory that cross-entropy is asymptotically optimal (Berrada et al., 2018).
In large-scale differentiable sorting and ranking, LapSum soft top- $k$ achieves top-5 accuracy rates on CIFAR-100 (ResNet-18) and ImageNet-1K (ResNeXt-101) that match or surpass NeuralSort, SoftSort, SinkhornSort, and OT-based approaches, with lower or comparable runtime and memory requirements, especially as $n$ and $k$ grow. On ImageNet-21K-P, LapSum soft top-5 achieves ACC@5 $\approx 70.7\%$ (Struski et al., 8 Mar 2025).

The combined forward and backward pass runs in $O(n\log n)$ time, outperforming alternative schemes for large $n$.

5. Practical Implementation Considerations

Efficient LapSum-based soft top- $k$ solutions are available in vectorized Python/PyTorch as well as in custom CUDA kernels. CPU algorithms exploit prefix scans and binary search for breakpoints, while CUDA implementations use warp-parallel prefix sums for evaluation at scale (Struski et al., 8 Mar 2025). Double precision is standard, but float32 offers similar accuracy after stabilizing exponentials.

The primary hyperparameter is $\alpha$ (or $\tau$ in the loss), with smaller values approximating hard selection and larger values offering smoother distributions; typically, $\alpha$ is tuned through grid search or end-to-end learning.

For extremely large $n$ , sorting can dominate computational cost, suggesting partial sorts or segment-tree approximations for streaming or online scenarios. Numerical stability at breakpoints is maintained by numerically stable square-root formulas and exponent clamping (Struski et al., 8 Mar 2025). Extensions to fractional $k$ ("top- $w$ " for real $w$ ) are immediate, generalizing the selection operator.

6. Limitations and Distinctive Properties

LapSum-based Soft Top-$k$ methods have the following constraints and properties:

The scale parameter $\alpha$ cannot be zero. The approximation quality between hard and soft top- $k$ is governed by $\alpha$ ; incorrect tuning may affect performance or gradient informativeness.
For full Jacobian computation, memory requirements are $O(n^2)$ , although vector-Jacobian products for backpropagation only require $O(n)$ (Struski et al., 8 Mar 2025).
The sort step is the computational bottleneck for extremely large $n$ , with plausible mitigations in streaming or coarse ranking settings.
The LapSum formalism naturally extends to "soft" relaxations for ranking, permutation, and sorting operators, maintaining differentiability, monotonicity, and computational tractability.
Mathematical convergence to hard top- $k$ is pointwise as $\alpha\to0^+$ for integer $w=k$ .

7. Relation to Other Differentiable Top-K and Ranking Methods

LapSum-based soft top-$k$ distinguishes itself from alternatives such as NeuralSort, SoftSort, SinkhornSort, and optimal transport-based ranking operators by offering both closed-form inversion and an explicit construction of probabilities that preserve rank structure and have a direct probabilistic interpretation. Empirical and runtime comparisons place LapSum among the most efficient and accurate methods for high-dimensional and large-$k$ tasks (Struski et al., 8 Mar 2025).

References:

"Smooth Loss Functions for Deep Top-k Classification" (Berrada et al., 2018)
"LapSum -- One Method to Differentiate Them All: Ranking, Sorting and Top-k Selection" (Struski et al., 8 Mar 2025)
