TongSearch-QR: Reinforced Query Reasoning for Retrieval (2506.11603v2)

Published 13 Jun 2025 in cs.IR

Abstract: Traditional information retrieval (IR) methods excel at textual and semantic matching but struggle in reasoning-intensive retrieval tasks that require multi-hop inference or complex semantic understanding between queries and documents. One promising solution is to explicitly rewrite or augment queries using LLMs to elicit reasoning-relevant content prior to retrieval. However, the widespread use of large-scale LLMs like GPT-4 or LLaMA3-70B remains impractical due to their high inference cost and limited deployability in real-world systems. In this work, we introduce TongSearch QR (previously known as "TongSearch Reasoner"), a family of small-scale LLMs for query reasoning and rewriting in reasoning-intensive retrieval. With a novel semi-rule-based reward function, we employ reinforcement learning approaches enabling smaller LLMs, e.g., Qwen2.5-7B-Instruct and Qwen2.5-1.5B-Instruct, to achieve query reasoning performance rivaling large-scale LLMs without their prohibitive inference costs. Experimental results on the BRIGHT benchmark show that, with BM25 as the retriever, both TongSearch QR-7B and TongSearch QR-1.5B models significantly outperform existing baselines, including prompt-based query reasoners and some of the latest dense retrievers trained for reasoning-intensive retrieval tasks, offering superior adaptability for real-world deployment.

Summary

The paper introduces TongSearch-QR, a model that enhances query reasoning in retrieval tasks through reinforcement learning.
It trains small LLMs to rewrite queries using Group Relative Policy Optimization (GRPO) with a novel semi-rule-based reward function.
Experimental results on the BRIGHT benchmark show cost-effective performance: the 7B model reaches an NDCG@10 of 27.9, outperforming some larger models, including GPT-4o (26.5).

TongSearch-QR: Reinforced Query Reasoning for Retrieval

Introduction

The paper "TongSearch-QR: Reinforced Query Reasoning for Retrieval" introduces an approach for improving retrieval on reasoning-intensive tasks. Traditional information retrieval (IR) systems excel at textual and semantic matching but falter on tasks that require multi-hop inference. The work proposes the TongSearch-QR model family, which uses reinforcement learning to train smaller LLMs for query reasoning and rewriting, achieving performance comparable to larger, more resource-intensive models such as GPT-4 and LLaMA3-70B.

Methodology

Query Reasoning Enhancement

The key innovation of TongSearch-QR is efficient query reasoning with smaller LLMs. Large-scale LLMs used as query rewriters bring prohibitive inference costs and security concerns, which makes them hard to deploy in real-world systems. TongSearch-QR instead builds on Qwen2.5-7B-Instruct and Qwen2.5-1.5B-Instruct: the model reasons about the query and rewrites it before retrieval, reaching strong reasoning performance without the computational demands of larger models.
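The rewrite-then-retrieve idea can be illustrated with a minimal sketch. It assumes a plain Okapi BM25 scorer written from scratch and a hypothetical `reason_about_query` function that stands in for the trained query reasoner; the canned reasoning text, the toy documents, and the parameter defaults are illustrative only and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for this sketch)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def reason_about_query(query):
    """Hypothetical stand-in for the small LLM query reasoner."""
    canned = {
        "why does my bread not rise": (
            "yeast fermentation produces carbon dioxide gas; dough "
            "temperature and proofing time affect yeast activity"
        ),
    }
    return canned.get(query, "")


def retrieve(query, docs, use_reasoning=True):
    """Return document indices ranked by BM25, best first."""
    expanded = query
    if use_reasoning:
        # Append the generated reasoning to the original query.
        expanded = f"{query} {reason_about_query(query)}"
    scores = bm25_scores(expanded, docs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


docs = [
    "bread recipes for my family dinner and why we love it",
    "yeast activity depends on temperature and fermentation produces carbon dioxide",
    "history of the steam engine",
]
query = "why does my bread not rise"
plain_ranking = retrieve(query, docs, use_reasoning=False)
reasoned_ranking = retrieve(query, docs, use_reasoning=True)
```

In this toy setup the original query only matches the first document lexically, while the reasoned query shares vocabulary with the second document, which actually explains the cause. This is the gap between surface matching and reasoning that query rewriting is meant to close.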

Reinforcement Learning with Semi-Rule-Based Rewards

A critical component of the TongSearch-QR approach is its semi-rule-based reward function, which is used to train the models with Group Relative Policy Optimization (GRPO). The reward targets the reasoning gap between queries and documents: it measures how much the relevance score of the relevant document improves when the original query is replaced by the reasoned query. Because the reward is computed by a fixed scoring rule rather than a learned reward model, it is cheap to evaluate and robust against the reward hacking that model-based reward functions are prone to.
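As a rough illustration of the two ideas in this paragraph, the sketch below computes a reward as the gain in relevance score from the original to the reasoned query, then converts a group of rewards into group-relative advantages in the GRPO style (reward minus group mean, divided by group standard deviation). The `relevance` function is a toy token-overlap scorer chosen for this example, and the function names and sample texts are hypothetical; the paper's actual scorer and reward details may differ.

```python
import statistics


def relevance(query, document):
    """Toy relevance: share of the document's distinct tokens found in the query.

    This is an assumption for illustration, not the paper's scoring function.
    """
    q_tokens = set(query.lower().split())
    d_tokens = set(document.lower().split())
    return len(q_tokens & d_tokens) / len(d_tokens)


def reward(original_query, reasoned_query, positive_doc):
    """Reward = relevance gain of the reasoned query over the original query."""
    return relevance(reasoned_query, positive_doc) - relevance(original_query, positive_doc)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group (GRPO-style advantage)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


original = "why does bread not rise"
positive_doc = "yeast fermentation produces carbon dioxide"
# A group of candidate rewrites sampled for the same query.
candidates = [
    "yeast produces carbon dioxide during fermentation",
    "bread is tasty",
    "yeast makes gas",
]
rewards = [reward(original, c, positive_doc) for c in candidates]
advantages = group_relative_advantages(rewards)
```

Here the first candidate covers every token of the relevant document, so it receives the highest reward and the largest advantage, while the uninformative rewrite ends up below the group mean.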

Experimental Results and Analysis

Performance Evaluation

TongSearch-QR models were evaluated on BRIGHT, a benchmark for reasoning-intensive retrieval, with BM25 as the retriever. The TongSearch-QR-7B variant outperformed conventional baselines as well as some large-scale models, achieving an NDCG@10 score of 27.9 and surpassing GPT-4o's 26.5. Notably, the 1.5B model also showed competitive performance, making it a viable option for resource-limited applications.
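For readers unfamiliar with the metric, the following is a minimal sketch of NDCG@10 for a single query, using the linear-gain form of DCG (for binary relevance labels this coincides with the exponential-gain form). The document IDs and labels are made up for illustration. The scores quoted above appear to be on a 0 to 100 scale, so 27.9 would correspond to 0.279 here, averaged over queries; that reading is an assumption.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k positions."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids, relevance_labels, k=10):
    """NDCG@k for one query.

    ranked_doc_ids: document IDs in the order the retriever returned them.
    relevance_labels: dict mapping relevant document IDs to their gain.
    """
    gains = [relevance_labels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal_gains = sorted(relevance_labels.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


labels = {"d1": 1, "d2": 1}
# One relevant document at rank 2; the other relevant document is missed.
partial = ndcg_at_k(["d3", "d1", "d7"], labels)
# Both relevant documents at the top of the ranking.
perfect = ndcg_at_k(["d1", "d2", "d3"], labels)
```

The metric rewards placing relevant documents early: the missed document and the rank-2 placement both pull `partial` below the perfect score of 1.0.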

Cost Efficiency

A second strength of TongSearch-QR is cost-effectiveness. The paper's cost-performance analysis shows that TongSearch-QR models deliver their retrieval quality at substantially lower inference cost than larger models, which favors them in practical deployment scenarios.

Figure 1: Cost vs. Performance comparison of different models.

Implications and Future Direction

The implications of this research are manifold. Practically, TongSearch-QR offers a viable solution for enterprises needing efficient yet powerful retrieval systems that can handle reasoning-intensive tasks without incurring high inference costs. Theoretically, it opens new avenues for leveraging small-scale models in sophisticated IR tasks traditionally dominated by large-scale setups.

Future work could explore deeper integrations with reasoning-intensive retrievers to further enhance performance. The adaptability of the TongSearch-QR models suggests potential for widespread application across varying retrieval contexts, potentially incorporating broader knowledge bases and more diverse datasets.

Conclusion

TongSearch-QR represents a significant step forward in the domain of information retrieval, bridging the gap between computational efficiency and reasoning capability. By employing novel reinforcement learning strategies and reward functions, it delivers robust performance and adaptability, highlighting its potential in both theoretical explorations and practical applications. As the landscape of AI retrieval systems evolves, TongSearch-QR sets a precedent for future innovations in the field.
