Balance reasoning performance with inference efficiency in RAG
Determine effective approaches for balancing reasoning performance against inference efficiency when integrating reasoning large language models (LLMs) into Retrieval-Augmented Generation (RAG) systems for multi-hop question answering, especially under latency-sensitive deployment constraints.
References
Balancing reasoning performance with inference efficiency remains an open challenge, particularly for real-world deployment in latency-sensitive scenarios.
— LIR$^3$AG: A Lightweight Rerank Reasoning Strategy Framework for Retrieval-Augmented Generation
(2512.18329 - Chen et al., 20 Dec 2025) in Abstract