Papers

Topics

Authors

Recent

View all

Assistant

AI Research Assistant

Well-researched responses based on relevant abstracts and paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses.

Gemini 2.5 Flash

Gemini 2.5 Flash 147 tok/s

Gemini 2.5 Pro 52 tok/s Pro

GPT-5 Medium 27 tok/s Pro

GPT-5 High 30 tok/s Pro

GPT-4o 96 tok/s Pro

Kimi K2 188 tok/s Pro

GPT OSS 120B 398 tok/s Pro

Claude Sonnet 4.5 36 tok/s Pro

2000 character limit reached

Cost-Aware Retrieval-Augmentation Reasoning Models with Adaptive Retrieval Depth (2510.15719v1)

Published 17 Oct 2025 in cs.CL and cs.IR

Abstract: Reasoning models have gained significant attention due to their strong performance, particularly when enhanced with retrieval augmentation. However, these models often incur high computational costs, as both retrieval and reasoning tokens contribute substantially to the overall resource usage. In this work, we make the following contributions: (1) we propose a retrieval-augmented reasoning model that dynamically adjusts the length of the retrieved document list based on the query and retrieval results; (2) we develop a cost-aware advantage function for training of efficient retrieval-augmented reasoning models through reinforcement learning; and (3) we explore both memory- and latency-bound implementations of the proposed cost-aware framework for both proximal and group relative policy optimization algorithms. We evaluate our approach on seven public question answering datasets and demonstrate significant efficiency gains, without compromising effectiveness. In fact, we observed that the model latency decreases by ~16-20% across datasets, while its effectiveness increases by ~5% on average, in terms of exact match.