- The paper introduces fast, differentiable sorting and ranking operators using permutahedron projections and convex regularization to overcome non-differentiability in traditional methods.
- It reformulates sorting and ranking as isotonic optimization problems, achieving O(n log n) runtime and O(n) space efficiency.
- Empirical evaluations show gains in accuracy, run time, and memory over prior differentiable proxies in tasks such as top-k classification and label ranking via a differentiable Spearman's rank correlation.
Fast Differentiable Sorting and Ranking
The paper "Fast Differentiable Sorting and Ranking" explores the challenge of improving the computational efficiency and differentiability of sorting and ranking operations, which are integral in machine learning, especially in applications related to robust statistics, order statistics, and ranking metrics. Traditional sorting and ranking operations, while straightforward in function, are characterized by non-differentiability (in the case of sorting) and piecewise constancy with undefined derivatives (for ranking), posing significant challenges when attempting to integrate them into differentiable programming frameworks widely used in modern deep learning architectures.
Key Contributions and Methodology
The authors propose differentiable sorting and ranking operators with O(n log n) time and O(n) space complexity, matching the efficiency of their non-differentiable counterparts. Central to their approach is the projection onto the permutahedron, the convex hull of all permutations of a vector, as a means of softening these operations. Specifically, the paper obtains differentiable approximations by adding strongly convex regularization to the objective of the linear programs that define sorting and ranking, allowing both exact computation and differentiation; the construction is sketched below.
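Schematically, with ρ = (n, n−1, …, 1) and notation close to the paper's (the exact ε-scaling convention shown in the last line is one common choice, simplified here for illustration):

```latex
% Permutahedron: the convex hull of all permutations of a vector w
P(w) = \mathrm{conv}\{(w_{\sigma(1)}, \dots, w_{\sigma(n)}) : \sigma \in \Sigma_n\}

% Sorting (descending) and ranking as linear programs over permutahedra,
% with \rho = (n, n-1, \dots, 1):
s(\theta) = \operatorname*{argmax}_{y \in P(\theta)} \langle \rho, y \rangle
\qquad
r(\theta) = \operatorname*{argmax}_{y \in P(\rho)} \langle -\theta, y \rangle

% Soft operators: add a strongly convex regularizer \Psi,
% e.g. \Psi = \tfrac{1}{2}\|\cdot\|_2^2, giving a differentiable map
P_\Psi(z, w) = \operatorname*{argmax}_{y \in P(w)} \langle z, y \rangle - \Psi(y)
% so that soft sort \approx P_\Psi(\rho/\varepsilon, \theta)
% and soft rank \approx P_\Psi(-\theta/\varepsilon, \rho).
```

With the quadratic regularizer, P_Ψ(z, w) is exactly the Euclidean projection of z onto P(w), which is what enables the isotonic-regression reduction described next.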
Key methodological enhancements include:
- Permutahedron Projections: By framing sorting and ranking as linear programs over a permutahedron, the authors make projections onto this polytope the central computational primitive. This contrasts with prior work that directly approximates ranks or ranking metrics with differentiable proxies.
- Strongly Convex Regularization: Quadratic or entropic regularization turns these linear programs into smooth, differentiable operations, corresponding to Euclidean projections (quadratic case) or Kullback-Leibler projections (entropic case) onto the permutahedron.
- Reduction to Isotonic Optimization: Both regularized projections reduce to isotonic regression problems, which the pool adjacent violators (PAV) algorithm solves in O(n) time after an initial O(n log n) sort; a runnable sketch follows this list.
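To make the reduction concrete, here is a minimal NumPy/scikit-learn sketch of the quadratic (Euclidean) case. The function name `soft_rank`, the descending-rank convention, and the use of `sklearn.isotonic.isotonic_regression` as the PAV solver are choices made here for illustration; the authors provide official implementations for JAX, PyTorch, and TensorFlow.

```python
import numpy as np
from sklearn.isotonic import isotonic_regression  # PAV solver, O(n)

def soft_rank(theta, eps=1.0):
    """Quadratically regularized soft ranks (rank 1 = largest entry).

    Illustrative sketch of r(theta) ~ P_Q(-theta/eps, rho): one
    O(n log n) argsort plus an O(n) isotonic regression, following
    the paper's reduction.
    """
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    rho = np.arange(n, 0, -1, dtype=float)  # (n, n-1, ..., 1)
    z = -theta / eps                         # vector to project onto P(rho)
    sigma = np.argsort(-z)                   # permutation sorting z descending
    # Nonincreasing isotonic regression of z_sorted - rho (PAV under the hood).
    v_sorted = isotonic_regression(z[sigma] - rho, increasing=False)
    v = np.empty(n)
    v[sigma] = v_sorted                      # undo the sort
    return z - v                             # projection = z - isotonic fit

# Small eps recovers hard ranks; large eps smooths toward the mean rank.
print(soft_rank([1.0, 3.0, 2.0], eps=0.1))   # -> [3. 1. 2.]
print(soft_rank([1.0, 3.0, 2.0], eps=10.0))  # -> approx. [2.1 1.9 2. ]
```

Differentiation then comes essentially for free: the PAV solution is a blockwise average of its input, so the Jacobian is block diagonal and can be applied in O(n) without materializing an n × n matrix.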
Empirical Validation and Applications
Empirically, the authors validate their approach against existing methods on tasks such as top-k classification and label ranking, demonstrating clear improvements in run time and memory usage, especially in high-dimensional settings. The proposed operators outperform existing soft ranking approximations such as optimal transport (OT) based methods and pairwise-comparison approaches, which scale quadratically, particularly in terms of computational scalability.
The paper presents two compelling applications of their differentiable operators:
- Differentiable Spearman's Rank Correlation: Using the proposed soft ranks to compute a differentiable form of Spearman's rank correlation coefficient, the paper shows improved performance on label ranking tasks (a sketch follows this list).
- Soft Least Trimmed Squares (LTS) Regression: Applying soft sorting to robust regression yields a model that interpolates between least squares and least trimmed squares, with the regularization strength controlling how aggressively outliers are down-weighted (see the second sketch below).
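For the first application, a differentiable Spearman surrogate can correlate soft ranks of the predictions with the ground-truth ranks. The helper below reuses the hypothetical `soft_rank` sketch above; in practice one would port it to an autodiff framework (as the authors' implementations do) so gradients flow through the PAV step.

```python
def soft_spearman(theta, target_ranks, eps=1.0):
    """Differentiable surrogate of Spearman's rank correlation:
    Pearson correlation between soft ranks and ground-truth ranks.
    Reuses soft_rank (and numpy as np) from the sketch above."""
    r = soft_rank(theta, eps=eps)
    r = r - r.mean()
    t = np.asarray(target_ranks, dtype=float)
    t = t - t.mean()
    return float(r @ t / (np.linalg.norm(r) * np.linalg.norm(t)))

# Minimizing 1 - soft_spearman(scores, true_ranks) trains a model
# whose scores order items like the target ranking.
```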
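For the second application, a soft sort over squared residuals yields a trimmed loss. The sketch below is again an illustrative construction on the same PAV reduction, not the paper's verbatim code; `soft_sort` follows the P_Q(ρ/ε, θ) convention with a descending output, so the k smallest squared residuals occupy its last k coordinates.

```python
import numpy as np
from sklearn.isotonic import isotonic_regression

def soft_sort(theta, eps=1.0):
    """Quadratically regularized soft sort (descending values)."""
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    rho = np.arange(n, 0, -1, dtype=float)
    w = np.sort(theta)[::-1]                 # generating vector, sorted descending
    # rho/eps is already descending, so no argsort/unsort step is needed here.
    v = isotonic_regression(rho / eps - w, increasing=False)
    return rho / eps - v

def soft_lts_loss(residuals, k, eps=1.0):
    """Soft least trimmed squares: (softly) keep the k smallest squared
    residuals, i.e. the last k coordinates of the descending soft sort."""
    s = soft_sort(np.square(np.asarray(residuals, dtype=float)), eps=eps)
    return float(np.sum(s[-k:]))
```

As ε → 0 this recovers hard LTS; as ε grows every residual contributes roughly equally and the loss approaches (k/n times) the ordinary least-squares loss, matching the interpolation between the two methodologies described above.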
Implications and Future Directions
This work significantly enhances the toolkit for building robust, differentiable models that involve order statistics and ranking metrics. The achieved efficiency makes it feasible to embed sorting and ranking operations in high-dimensional neural networks without prohibitive computational cost, broadening the class of losses and layers that can be trained end to end.
Future work may extend these differentiation techniques to other non-smooth operations, explore further convex regularization schemes, or integrate the operators into more complex multi-task learning setups. Testing the approach in large-scale, real-world applications would also shed light on its robustness and optimization behavior under varying operational constraints.
In summary, this paper's innovations address fundamental challenges in making sorting and ranking operations efficiently differentiable, opening avenues for their seamless incorporation into modern machine learning infrastructures.