
Ensemble Ranking Model with Multiple Pretraining Strategies for Web Search (2302.09340v1)

Published 18 Feb 2023 in cs.IR

Abstract: An effective ranking model usually requires a large amount of training data to learn the relevance between documents and queries. User clicks are often used as training data since they can indicate relevance and are cheap to collect, but they contain substantial bias and noise. There has been some work on mitigating various types of bias in simulated user clicks to train effective learning-to-rank models based on multiple features. However, how to effectively use such methods on large-scale pre-trained models with real-world click data is unknown. To alleviate the data bias in the real world, we incorporate heuristic-based features, refine the ranking objective, add random negatives, and calibrate the propensity calculation in the pre-training stage. Then we fine-tune several pre-trained models and train an ensemble model to aggregate all the predictions from various pre-trained models with human-annotated data in the fine-tuning stage. Our approaches won 3rd place in the "Pre-training for Web Search" task in WSDM Cup 2023 and are 22.6% better than the 4th-ranked team.
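The abstract mentions debiasing click data via calibrated propensities and adding random negatives to the ranking objective during pre-training. The paper itself does not provide code here, so the following is only a minimal Python sketch of one common way these ideas are combined: an inverse-propensity-weighted softmax ranking loss in which randomly sampled negatives enlarge the softmax denominator. The function name `ipw_softmax_loss`, the tensor shapes, and the clipping threshold are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch, not the authors' released code: an inverse-propensity-weighted
# softmax ranking loss with random negatives, illustrating the debiasing ideas
# described in the abstract. All names and shapes are assumptions for illustration.
import torch
import torch.nn.functional as F


def ipw_softmax_loss(scores, clicks, propensities, neg_scores):
    """
    scores:       (batch, slate)   model scores for displayed documents
    clicks:       (batch, slate)   binary click labels (biased feedback)
    propensities: (batch, slate)   estimated examination propensities
    neg_scores:   (batch, num_neg) scores for randomly sampled negative documents
    """
    # Debias clicks with inverse propensity weights; clip small propensities
    # to keep the weights (and hence the gradients) bounded.
    weights = clicks / propensities.clamp(min=0.1)

    # Random negatives enter the softmax denominator but carry no positive weight.
    all_scores = torch.cat([scores, neg_scores], dim=1)
    log_probs = F.log_softmax(all_scores, dim=1)[:, : scores.size(1)]

    # Propensity-weighted negative log-likelihood over clicked positions.
    return -(weights * log_probs).sum(dim=1).mean()


# Toy usage with random inputs.
if __name__ == "__main__":
    b, slate, num_neg = 2, 5, 3
    scores = torch.randn(b, slate, requires_grad=True)
    clicks = torch.bernoulli(torch.full((b, slate), 0.3))
    propensities = torch.rand(b, slate) * 0.8 + 0.2
    neg_scores = torch.randn(b, num_neg)
    print(ipw_softmax_loss(scores, clicks, propensities, neg_scores))
```

In this reading, the pre-training stage would optimize such a debiased objective on click logs, while the fine-tuning stage trains an ensemble on human-annotated relevance labels to aggregate the fine-tuned models' predictions.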

Citations (1)
