Unbiased Learning to Rank Meets Reality: Lessons from Baidu's Large-Scale Search Dataset (2404.02543v3)

Published 3 Apr 2024 in cs.IR and cs.AI

Abstract: Unbiased learning-to-rank (ULTR) is a well-established framework for learning from user clicks, which are often biased by the ranker collecting the data. While theoretically justified and extensively tested in simulation, ULTR techniques lack empirical validation, especially on modern search engines. The Baidu-ULTR dataset released for the WSDM Cup 2023, collected from Baidu's search engine, offers a rare opportunity to assess the real-world performance of prominent ULTR techniques. Despite multiple submissions during the WSDM Cup 2023 and the subsequent NTCIR ULTRE-2 task, it remains unclear whether the observed improvements stem from applying ULTR or other learning techniques. In this work, we revisit and extend the available experiments on the Baidu-ULTR dataset. We find that standard unbiased learning-to-rank techniques robustly improve click predictions but struggle to consistently improve ranking performance, especially considering the stark differences obtained by choice of ranking loss and query-document features. Our experiments reveal that gains in click prediction do not necessarily translate to enhanced ranking performance on expert relevance annotations, implying that conclusions strongly depend on how success is measured in this benchmark.
