Fixed output token limit for retrieval-augmented reasoning with variable retrieval depth
Determine what fixed output token limit to use when adapting length-controlled generation methods such as L1 and S1 to retrieval-augmented reasoning. In this setting the retrieval depth varies across queries, so a single limit must remain appropriate for differing amounts of retrieved content.
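The tension can be illustrated with a small arithmetic sketch. It is not taken from the cited paper: the limit of 1024 tokens, the example depths, and the "output tokens per retrieved document" ratio are all assumptions chosen only to show how one fixed limit behaves as retrieval depth changes.

```python
def output_tokens_per_document(fixed_limit: int, retrieval_depth: int) -> float:
    """Output tokens available per retrieved document under one fixed limit.

    Hypothetical helper: this ratio is an illustrative quantity, not a
    measure defined by L1, S1, or the cited paper.
    """
    if retrieval_depth <= 0:
        raise ValueError("retrieval_depth must be positive")
    return fixed_limit / retrieval_depth


FIXED_LIMIT = 1024      # assumed fixed output token limit
DEPTHS = [1, 4, 16]     # assumed retrieval depths for three different queries

# The same limit is applied to every query, whatever its retrieval depth.
budgets = {k: output_tokens_per_document(FIXED_LIMIT, k) for k in DEPTHS}

for depth, budget in budgets.items():
    print(f"depth={depth:>2}  output tokens per retrieved document={budget:.0f}")
```

Under these assumed numbers the per-document share of the output budget shrinks by a factor of 16 between the shallowest and deepest query, which is one way to see why a single fixed limit is hard to choose when retrieval depth varies.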
References
Also, none of these methods are studied in the context of retrieval augmentation. Although, they could be easily adopted, it is not clear how many tokens should be fixed for the limit as the retrieval depth could be different.
— Cost-Aware Retrieval-Augmentation Reasoning Models with Adaptive Retrieval Depth
(2510.15719 - Hashemi et al., 17 Oct 2025) in Section 2.2 (Text Generation with Length Penalization)