Ranking with Abstention (2307.02035v1)
Abstract: We introduce a novel framework of ranking with abstention, where the learner can abstain from making a prediction at some limited cost $c$. We present an extensive theoretical analysis of this framework, including a series of $H$-consistency bounds for both the family of linear functions and that of neural networks with one hidden layer. These are the state-of-the-art consistency guarantees in the literature: upper bounds on the target loss estimation error of a predictor in a hypothesis set $H$, expressed in terms of the surrogate loss estimation error of that predictor. We further argue that our proposed abstention methods are important when using common equicontinuous hypothesis sets in practice. We report the results of experiments illustrating the effectiveness of ranking with abstention.
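To make the form of these guarantees concrete, the following is a schematic template of an $H$-consistency bound, written directly from the description above; the concrete non-decreasing function $f$ (and any additional terms, such as minimizability gaps) depends on the surrogate loss and the hypothesis set, and is an assumption of this sketch:

```latex
% Schematic H-consistency bound: the excess target (abstention ranking) loss
% of h is controlled by its excess surrogate loss, uniformly over h in H.
% R_ell / R_Phi are the expected target / surrogate losses, R*_ell(H) and
% R*_Phi(H) their infima over H; f is non-decreasing (assumption of this sketch).
\[
  \mathcal{R}_{\ell}(h) - \mathcal{R}_{\ell}^{*}(H)
  \;\leq\;
  f\bigl( \mathcal{R}_{\Phi}(h) - \mathcal{R}_{\Phi}^{*}(H) \bigr),
  \qquad \forall\, h \in H .
\]
```

To illustrate how abstention at cost $c$ can operate in pairwise ranking, here is a minimal, runnable Python sketch. The margin-based abstention rule (abstain on a pair when the score gap is at most a threshold `gamma`, paying cost `c`; otherwise pay the 0-1 misranking loss) and all names in it are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def abstention_ranking_loss(h_x, h_xp, y, gamma=0.1, c=0.05):
    """Empirical pairwise ranking loss with abstention (illustrative sketch).

    h_x, h_xp : scores h(x), h(x') for each pair (x, x').
    y         : +1 if x should rank above x', -1 otherwise.
    The predictor abstains when |h(x) - h(x')| <= gamma, paying cost c;
    otherwise it pays the 0-1 misranking loss. This margin-based rule is
    an assumption for illustration, not the paper's exact definition.
    """
    gap = h_x - h_xp
    abstain = np.abs(gap) <= gamma            # low-confidence pairs
    misrank = (y * gap <= 0).astype(float)    # 0-1 pairwise misranking loss
    return np.where(abstain, c, misrank).mean()

# Usage on synthetic pairs with a random scorer.
rng = np.random.default_rng(0)
h_x, h_xp = rng.normal(size=100), rng.normal(size=100)
y = rng.choice([-1, 1], size=100)
print(abstention_ranking_loss(h_x, h_xp, y, gamma=0.2, c=0.1))
```

A small `c` makes abstention cheap, so the rule abstains liberally on low-margin pairs; as `c` approaches the cost of a misranking, abstaining loses its advantage.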