Aligning Web Query Generation with Ranking Objectives via Direct Preference Optimization
This paper addresses a central challenge in training neural retrieval models for web search: the uneven quality of synthetic training queries. Specifically, it examines how Direct Preference Optimization (DPO) can improve the quality and downstream efficacy of queries generated by large language models (LLMs).
Neural retrieval models have gained prominence due to their strong performance in web search. Their training, however, typically requires a large corpus of labeled query-document pairs, a need often met with synthetic queries generated by LLMs. While synthetic data scales well, variability in query quality frequently degrades downstream retrieval performance. Traditional approaches mitigate this by filtering noisy synthetic pairs with external re-rankers. In contrast, this paper proposes a framework that integrates ranking signals directly into the query generation process, optimizing the generator itself to produce high-quality queries that maximize retrieval effectiveness.
Methodology
The framework applies DPO to align query generation with ranking objectives directly, removing the need for post hoc filtering. The method first generates multiple candidate queries per document with an initial query generator. A pre-trained ranking model then scores these query-document pairs, and the scores induce a preference dataset: for each document, higher-ranked queries are preferred over lower-ranked ones. DPO fine-tunes the generator on these preferences, steering it toward producing higher-relevance queries. The reward signal can come from either a point-wise re-ranker or list-wise prompting of an LLM, allowing the choice of signal to be matched to the application.
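The pipeline above can be sketched as follows. This is an illustrative sketch, not the paper's implementation: `score_fn` stands in for whatever ranker supplies the reward signal, and the per-pair DPO loss is written out explicitly with log-probabilities that a real setup would obtain from the policy and reference models.

```python
import math


def build_preference_pairs(doc_queries, score_fn):
    """For each document, pair the highest- and lowest-scored synthetic
    queries as (chosen, rejected) examples for DPO fine-tuning.
    `score_fn(doc, query)` is a stand-in for the ranking model."""
    pairs = []
    for doc, queries in doc_queries.items():
        ranked = sorted(queries, key=lambda q: score_fn(doc, q), reverse=True)
        if len(ranked) >= 2:
            pairs.append({"doc": doc, "chosen": ranked[0], "rejected": ranked[-1]})
    return pairs


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin)).
    The log-probs are the (reference) model's log-likelihoods of each query."""
    margin = (logp_chosen - logp_rejected) - (ref_logp_chosen - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))


# Toy usage with a hypothetical ranker's scores.
scores = {"q_good": 0.9, "q_mid": 0.5, "q_bad": 0.1}
pairs = build_preference_pairs({"doc1": ["q_good", "q_mid", "q_bad"]},
                               lambda d, q: scores[q])
# With identical policy and reference margins the loss is log 2 (~0.693);
# it shrinks as the policy prefers the chosen query more than the reference does.
baseline = dpo_loss(0.0, 0.0, 0.0, 0.0)
```

In practice the sorting step generalizes to sampling any higher-ranked query as "chosen" and any lower-ranked one as "rejected", which yields more pairs per document at the cost of noisier preferences.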
Experimental Evaluation
Extensive experiments were conducted on MS MARCO, a well-established benchmark for passage and document retrieval. The results show that DPO alignment markedly improves the ranker-assessed relevance of generated query-document pairs. Models trained on data from this framework outperformed baselines trained on conventional synthetic data in downstream retrieval tasks. Notably, the fraction of generated queries retained after quality filtering rose from 62% before DPO alignment to 92% after it, underscoring the higher quality of the resulting training data.
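Retention here is simply the fraction of generated queries that clear the re-ranker's quality bar. A minimal sketch, where the 0.5 threshold and the score lists are made-up values for illustration:

```python
def retention_rate(scores, threshold):
    """Fraction of generated queries whose ranker score meets the quality bar."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s >= threshold) / len(scores)


# Hypothetical ranker scores for queries from a pre- and post-DPO generator.
pre_dpo = [0.9, 0.6, 0.4, 0.8, 0.2]
post_dpo = [0.9, 0.8, 0.7, 0.6, 0.3]
pre_rate = retention_rate(pre_dpo, threshold=0.5)    # 3 of 5 pass
post_rate = retention_rate(post_dpo, threshold=0.5)  # 4 of 5 pass
```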
Implications and Future Directions
The implications of this research are twofold. Practically, it provides a more robust method for generating synthetic training data, improving the efficiency and effectiveness of neural retrieval models. Theoretically, it offers insights into integrating auxiliary task signals like ranking preferences directly into data generation models, potentially paving the way for applications beyond web search.
Future studies could explore reward signals beyond current ranking models and LLM prompts, such as user-specific or application-specific feedback. Computational efficiency also remains a concern: reducing the cost of candidate generation and scoring without sacrificing query quality is a worthwhile direction. Extending the approach across different domains and datasets would further test its scalability and adaptability, helping refine the methodologies underpinning synthetic data generation for retrieval.