Which Tricks Are Important for Learning to Rank? (2204.01500v2)
Abstract: Nowadays, state-of-the-art learning-to-rank methods are based on gradient-boosted decision trees (GBDT). The most well-known algorithm is LambdaMART, which was proposed more than a decade ago. Recently, several other GBDT-based ranking algorithms have been proposed. In this paper, we thoroughly analyze these methods in a unified setup. In particular, we address the following questions. Is direct optimization of a smoothed ranking loss preferable to optimizing a convex surrogate? How should surrogate ranking losses be constructed and smoothed? To address these questions, we compare LambdaMART with the YetiRank and StochasticRank methods and their modifications. We also propose a simple improvement of the YetiRank approach that allows for optimizing specific ranking loss functions. As a result, we gain insights into learning-to-rank techniques and obtain a new state-of-the-art algorithm.
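For context on the LambdaMART approach discussed in the abstract, below is a minimal NumPy sketch (not taken from the paper) of how LambdaMART-style "lambda" gradients are typically computed for a single query: pairwise RankNet gradients are weighted by the change in NDCG that would result from swapping the two documents. The function names and the `sigma` smoothing parameter are illustrative assumptions, not part of the paper.

```python
import numpy as np

def dcg_discounts(n):
    # Position discounts 1 / log2(rank + 1) for ranks 1..n.
    return 1.0 / np.log2(np.arange(2, n + 2))

def lambdamart_lambdas(scores, relevance, sigma=1.0):
    """LambdaMART-style gradients ("lambdas") for one query.

    For every document pair (i, j) with relevance[i] > relevance[j], the
    RankNet pairwise gradient is scaled by |delta NDCG|, the change in NDCG
    obtained by swapping the positions of i and j in the current ranking.
    """
    n = len(scores)
    discounts = dcg_discounts(n)

    # Current ranking induced by the model scores (descending).
    order = np.argsort(-scores)
    rank_of = np.empty(n, dtype=int)
    rank_of[order] = np.arange(n)

    gains = 2.0 ** relevance - 1.0
    ideal_dcg = np.sum(np.sort(gains)[::-1] * discounts)
    if ideal_dcg == 0:
        return np.zeros(n)

    lambdas = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if relevance[i] <= relevance[j]:
                continue  # only pairs where i should be ranked above j
            # |delta NDCG| if documents i and j swapped positions.
            delta_ndcg = abs(gains[i] - gains[j]) * abs(
                discounts[rank_of[i]] - discounts[rank_of[j]]) / ideal_dcg
            # RankNet pairwise gradient magnitude, smoothed by sigma.
            rho = 1.0 / (1.0 + np.exp(sigma * (scores[i] - scores[j])))
            lambdas[i] += sigma * rho * delta_ndcg  # push document i up
            lambdas[j] -= sigma * rho * delta_ndcg  # push document j down
    return lambdas

# Tiny example: three documents of one query.
scores = np.array([0.2, 1.3, -0.4])
relevance = np.array([2.0, 0.0, 1.0])
print(lambdamart_lambdas(scores, relevance))
```

In GBDT frameworks, each boosting iteration fits a tree to these lambdas (optionally with second-order weights); the paper's comparison concerns how such surrogate gradients are constructed and smoothed relative to methods like YetiRank and StochasticRank.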
- Balandat, M., Karrer, B., Jiang, D. R., Daulton, S., Letham, B., Wilson, A. G., and Bakshy, E. BoTorch: A framework for efficient Monte-Carlo Bayesian optimization. In Advances in Neural Information Processing Systems 33, 2020.
- Bruch, S., Han, S., Bendersky, M., and Najork, M. A stochastic treatment of learning to rank scoring functions. In Proceedings of the 13th ACM International Conference on Web Search and Data Mining, pp. 61–69, 2020.
- Burges, C. J. C., Ragno, R., and Le, Q. V. Learning to rank with nonsmooth cost functions. In Advances in Neural Information Processing Systems 19, pp. 193–200, 2007.
- Burges, C. J. C. From RankNet to LambdaRank to LambdaMART: An overview. Technical report, Microsoft Research, 2010.
- CatBoost. Ranking: objectives and metrics. https://catboost.ai/docs/concepts/loss-functions-ranking.html, 2023.
- Chapelle, O. and Chang, Y. Yahoo! learning to rank challenge overview. In Proceedings of the Learning to Rank Challenge, pp. 1–24, 2011.
- Dato, D., Lucchese, C., Nardini, F. M., Orlando, S., Perego, R., Tonellotto, N., and Venturini, R. Fast ranking with additive ensembles of oblivious and non-oblivious regression trees. ACM Transactions on Information Systems (TOIS), 35(2):1–31, 2016.
- Gorishniy, Y., Rubachev, I., Khrulkov, V., and Babenko, A. Revisiting deep learning models for tabular data. In Advances in Neural Information Processing Systems 34, 2021.
- Gulin, A., Kuralenok, I., and Pavlov, D. Winning the transfer learning track of Yahoo!’s learning to rank challenge with YetiRank. In Proceedings of the Learning to Rank Challenge, pp. 63–76, 2011.
- Jagerman, R., Qin, Z., Wang, X., Bendersky, M., and Najork, M. On optimizing top-k metrics for neural ranking models. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2303–2307, 2022.
- Katzir, L., Elidan, G., and El-Yaniv, R. Net-DNF: Effective deep modeling of tabular data. In International Conference on Learning Representations, 2021.
- Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.-Y. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems, pp. 3146–3154, 2017.
- Liu, T.-Y. Learning to rank for information retrieval. Foundations and Trends® in Information Retrieval, 3(3):225–331, 2009.
- Nesterov, Y. and Spokoiny, V. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17:527–566, 2017.
- Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., and Gulin, A. CatBoost: unbiased boosting with categorical features. In Advances in Neural Information Processing Systems, pp. 6638–6648, 2018.
- Qin, T. and Liu, T.-Y. Introducing LETOR 4.0 datasets. CoRR, abs/1306.2597, 2013.
- Qin, Z., Yan, L., Zhuang, H., Tay, Y., Pasumarthi, R. K., Wang, X., Bendersky, M., and Najork, M. Are neural rankers still outperformed by gradient boosted decision trees? In International Conference on Learning Representations, 2021.
- Ustimenko, A. and Prokhorenkova, L. StochasticRank: Global optimization of scale-free discrete functions. In International Conference on Machine Learning, pp. 9669–9679, 2020.
- Ustimenko, A. and Prokhorenkova, L. SGLB: Stochastic Gradient Langevin Boosting. In International Conference on Machine Learning, pp. 10487–10496, 2021.
- Wang, X., Li, C., Golbandi, N., Bendersky, M., and Najork, M. The LambdaLoss framework for ranking metric optimization. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 1313–1322, 2018.
- Wu, Q., Burges, C. J. C., Svore, K. M., and Gao, J. Adapting boosting for information retrieval measures. Information Retrieval, 13(3):254–270, 2010.