Mitigate the computational cost of reasoning models used in RAG

Establish methods to reduce the computational cost—including token consumption and inference latency—of reasoning large language models used as generators in Retrieval-Augmented Generation pipelines for multi-hop question answering.

Background

The authors observe that reasoning models substantially improve multi-hop QA performance in RAG pipelines but incur heavy computational overhead in token consumption and inference latency. They explicitly note that this cost problem remains unsolved, motivating approaches that restructure retrieved evidence into concise reasoning chains and avoid unnecessary generation.

Addressing this problem requires techniques that preserve the benefits of structured reasoning while eliminating redundant or irrelevant steps and integrating retrieved context into generation more efficiently. The proposed LIR$^3$AG framework is presented as a step toward this goal.
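The general idea of trimming retrieved evidence before it reaches the generator can be sketched as a rerank-then-truncate step. The snippet below is a minimal illustration, not the LIR$^3$AG method itself: it assumes a simple lexical-overlap scorer (`overlap_score`) and a hypothetical `rerank_and_trim` helper, whereas the actual framework uses a learned rerank reasoning strategy.

```python
import re

def overlap_score(question: str, passage: str) -> float:
    """Score a passage by lexical overlap with the question.
    A stand-in for a real reranker (cross-encoder, learned scorer, etc.)."""
    q = set(re.findall(r"\w+", question.lower()))
    p = set(re.findall(r"\w+", passage.lower()))
    return len(q & p) / (len(q) or 1)

def rerank_and_trim(question: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Keep only the top-k passages so the generator sees a shorter,
    more relevant context, reducing token consumption and latency."""
    ranked = sorted(passages, key=lambda p: overlap_score(question, p), reverse=True)
    return ranked[:top_k]

question = "Who directed the film that won Best Picture in 1995?"
passages = [
    "Forrest Gump won the Academy Award for Best Picture in 1995.",
    "Robert Zemeckis directed Forrest Gump.",
    "Miguel Indurain was a professional cyclist from Spain.",  # irrelevant distractor
]
kept = rerank_and_trim(question, passages, top_k=2)
prompt = "Context:\n" + "\n".join(kept) + f"\nQuestion: {question}"
```

In this toy example the irrelevant cycling passage is dropped, so the generator's prompt carries only the evidence needed for the two-hop answer; a production system would replace the overlap heuristic with a stronger reranker.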

References

> "However, the cost problem of reasoning models remains unsolved."

LIR$^3$AG: A Lightweight Rerank Reasoning Strategy Framework for Retrieval-Augmented Generation (2512.18329 - Chen et al., 20 Dec 2025) in Section 3.1 (Reasoning Models in RAG)