
Aligning Web Query Generation with Ranking Objectives via Direct Preference Optimization (2505.19307v1)

Published 25 May 2025 in cs.IR

Abstract: Neural retrieval models excel in Web search, but their training requires substantial amounts of labeled query-document pairs, which are costly to obtain. With the widespread availability of Web document collections like ClueWeb22, synthetic queries generated by LLMs offer a scalable alternative. Still, synthetic training queries often vary in quality, which leads to suboptimal downstream retrieval performance. Existing methods typically filter out noisy query-document pairs based on signals from an external re-ranker. In contrast, we propose a framework that leverages Direct Preference Optimization (DPO) to integrate ranking signals into the query generation process, aiming to directly optimize the model towards generating high-quality queries that maximize downstream retrieval effectiveness. Experiments show higher ranker-assessed relevance between query-document pairs after DPO, leading to stronger downstream performance on the MS MARCO benchmark when compared to baseline models trained with synthetic data.


Summary

The paper "Aligning Web Query Generation with Ranking Objectives via Direct Preference Optimization" addresses the challenges inherent in training neural retrieval models for web search. Specifically, it examines how Direct Preference Optimization (DPO) can improve the quality and downstream usefulness of synthetic queries generated by LLMs.

Neural retrieval models have gained prominence due to their superior performance in web search applications. However, their training typically depends on a large corpus of labeled query-document pairs—a requirement often met through the use of synthetic queries generated by LLMs. Despite the scalability provided by synthetic data, the variability in query quality frequently results in suboptimal retrieval performance downstream. Traditional approaches to ameliorate this issue involve filtering noise from these synthetic pairs using external re-rankers. In contrast, this paper proposes a framework that integrates ranking signals directly into the query generation process, thereby optimizing the generation of high-quality queries aimed at maximizing retrieval effectiveness.

Methodology

The framework applies DPO to align query generation with ranking objectives directly, bypassing post hoc filtering methods. The proposed methodology first generates multiple queries per document using an initial query generator. These query-document pairs are then scored by a pre-trained ranking model, producing a preference dataset on which DPO fine-tunes the generator towards higher-relevance queries. The reward signal can come from either point-wise re-rankers or list-wise LLM prompting, allowing flexibility depending on application needs.
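The pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the pairing heuristic (best vs. worst scored query per document) and the `rank_score` callback are assumptions, and the DPO loss shown is the standard single-pair objective from the original DPO formulation.

```python
import math

def build_preference_pairs(doc_queries, rank_score):
    """For each document, pair its highest- and lowest-scoring generated
    queries as (chosen, rejected) examples for DPO fine-tuning.

    doc_queries: dict mapping document text -> list of generated queries
    rank_score: callable (document, query) -> relevance score from a ranker
    """
    pairs = []
    for doc, queries in doc_queries.items():
        scored = sorted(queries, key=lambda q: rank_score(doc, q))
        # Only keep a pair if the ranker actually distinguishes the two.
        if len(scored) >= 2 and rank_score(doc, scored[-1]) > rank_score(doc, scored[0]):
            pairs.append({"document": doc,
                          "chosen": scored[-1],
                          "rejected": scored[0]})
    return pairs

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO objective for one preference pair:
    -log sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r))),
    where ref_* are log-probabilities under the frozen reference model."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

In practice the log-probabilities would come from the query generator and a frozen copy of it evaluated on the same (document, query) sequences; the sketch only fixes the shape of the data flow.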

Experimental Evaluation

Extensive experiments were conducted on the MS MARCO dataset, a well-established benchmark for evaluating passage and document retrieval systems. The results demonstrated that applying DPO markedly improves the ranker-assessed relevance of query-document pairs. Compared to baseline models trained with traditional synthetic data, models trained within this framework exhibited superior performance in downstream retrieval tasks. Notably, the query retention rate rose from 62% before DPO alignment to 92% after, underscoring the increased quality of the generated training data.
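The retention-rate comparison can be made concrete with a small sketch. Here retention is assumed to mean the fraction of generated queries whose ranker score clears the relevance threshold used by the filtering step; the threshold value and score scale are illustrative, not taken from the paper.

```python
def retention_rate(scores, threshold):
    """Fraction of generated queries whose ranker-assessed relevance
    score clears the filtering threshold (i.e., is kept for training)."""
    if not scores:
        return 0.0
    kept = sum(1 for s in scores if s >= threshold)
    return kept / len(scores)
```

Under this reading, DPO alignment shifts the score distribution upward, so a fixed threshold discards far fewer queries (62% kept before alignment vs. 92% after).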

Implications and Future Directions

The implications of this research are twofold. Practically, it provides a more robust method for generating synthetic training data, improving the efficiency and effectiveness of neural retrieval models. Theoretically, it offers insights into integrating auxiliary task signals like ranking preferences directly into data generation models, potentially paving the way for applications beyond web search.

Future studies could further explore reward signal diversity beyond current ranking models and LLM prompts, integrating user-specific or application-specific feedback mechanisms. Moreover, as computational efficiency remains a concern, optimizing the query generation process in terms of resource utilization without sacrificing quality could also be a worthwhile pursuit. By enhancing the scalability and adaptability of this approach across different domains and datasets, researchers can continue to refine the methodologies underpinning synthetic data generation in AI, contributing to more effective and generalized artificial intelligence applications.