Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study (2404.03707v1)

Published 4 Apr 2024 in cs.LG, cs.AI, and cs.IR

Abstract: Counterfactual learning to rank (CLTR) has attracted extensive attention in the IR community for its ability to leverage massive logged user interaction data to train ranking models. While CLTR models can be theoretically unbiased when the user behavior assumption is correct and the propensity estimation is accurate, their effectiveness is usually evaluated empirically via simulation-based experiments due to a lack of widely available, large-scale, real click logs. However, the mainstream simulation-based experiments are somewhat limited, as they often feature a single, deterministic production ranker and simplified user simulation models to generate the synthetic click logs. As a result, the robustness of CLTR models in complex and diverse situations is largely unknown and needs further investigation. To address this problem, in this paper we investigate the robustness of existing CLTR models in a reproducibility study with extensive simulation-based experiments that (1) use both deterministic and stochastic production rankers, each with different ranking performance, and (2) leverage multiple user simulation models with different user behavior assumptions. We find that the DLA (dual learning algorithm) models and IPS-DCM (inverse propensity scoring with a dependent click model) are more robust across simulation settings than IPS-PBM (inverse propensity scoring with a position-based model) and PRS (propensity ratio scoring) with offline propensity estimation. Moreover, the existing CLTR models often fail to outperform the naive click baselines when the production ranker has relatively high ranking performance or a certain degree of randomness, which suggests an urgent need for new CLTR algorithms that work in these settings.
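To ground the setup the abstract describes, below is a minimal, hypothetical sketch (not code from the paper) of the simulation-and-correction loop used in this line of work: synthetic clicks are sampled under a position-based model (PBM), and inverse propensity scoring (IPS) reweights each observed click by its examination propensity to remove position bias. The (1/rank)^eta examination curve, the eta value, and the toy relevance vector are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def pbm_click_probs(relevance, eta=1.0):
    """Position-based model: P(click at rank k) = P(examine k) * P(click | examined).

    Examination is assumed to decay with rank as (1/k)**eta;
    `relevance` holds per-document click probabilities given examination.
    """
    ranks = np.arange(1, len(relevance) + 1)
    examination = (1.0 / ranks) ** eta
    return examination * relevance

def simulate_clicks(relevance, n_sessions=100_000, eta=1.0):
    """Sample a boolean click log (sessions x ranks) for one fixed ranking."""
    p = pbm_click_probs(relevance, eta)
    return rng.random((n_sessions, len(relevance))) < p

def ips_estimate(clicks, eta=1.0):
    """IPS-corrected relevance estimate per rank.

    Dividing each click rate by its examination propensity debiases the
    estimate in expectation, assuming the PBM holds and propensities are known.
    """
    ranks = np.arange(1, clicks.shape[1] + 1)
    propensity = (1.0 / ranks) ** eta
    return clicks.mean(axis=0) / propensity

# Toy relevance for a 5-document list produced by a production ranker.
true_relevance = np.array([0.9, 0.2, 0.7, 0.4, 0.1])
clicks = simulate_clicks(true_relevance)
print("raw CTR :", clicks.mean(axis=0).round(3))   # biased toward top ranks
print("IPS est.:", ips_estimate(clicks).round(3))  # ~ true_relevance
```

In expectation, the IPS-corrected click rates recover the true relevance, while the raw click-through rates are inflated at top ranks. The paper's point is that this guarantee hinges on the user behavior assumption and the propensity estimates being correct, which is exactly what the reproducibility study stress-tests across production rankers and click models.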

Authors (4)
  1. Zechun Niu
  2. Jiaxin Mao
  3. Qingyao Ai
  4. Ji-Rong Wen
