
RecaLLM: Robust Retrieval for Long Contexts

Updated 13 April 2026
  • RecaLLM is a class of post-trained large language models that alternate chain-of-thought reasoning with evidence copying to counteract the lost-in-thought phenomenon.
  • It uses constrained decoding to enforce strict alternation between reasoning and retrieval, which keeps retrieval robust even over very long contexts.
  • Empirical results show that RecaLLM achieves state-of-the-art long-context retrieval accuracy, scaling efficiently to context lengths up to 128K tokens despite limited training windows.

RecaLLM denotes a class of post-trained LLMs designed to explicitly interleave chain-of-thought (CoT) reasoning with robust, verifiable in-context retrieval. The approach targets the “lost-in-thought” phenomenon, in which multi-step reasoning degrades the model’s ability to access and retrieve evidence verbatim from long contexts. By using constrained decoding to enforce alternation between reasoning and evidence copying, RecaLLM achieves state-of-the-art long-context performance with negligible computational overhead and minimal dependence on extremely long training samples, scaling to context lengths of up to 128K tokens despite training on windows of at most 10K tokens (Whitecross et al., 10 Apr 2026).
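The source does not spell out how the decoding constraint is implemented. The following is a minimal sketch of one plausible way to guarantee verbatim copying, assuming that copy spans are delimited by hypothetical marker tokens `<copy>` and `</copy>` and that, inside a span, each new token must extend a contiguous substring of the context. `CopyConstraint`, `mask_logits`, and the marker names are illustrative assumptions, not RecaLLM's actual API. Tokens are whole words for readability, and the sketch only models the copy constraint itself; it does not force a copy span after every reasoning step.

```python
import math
from typing import Dict, List, Set

# Hypothetical delimiter tokens; the source does not name RecaLLM's real markers.
COPY_OPEN = "<copy>"
COPY_CLOSE = "</copy>"


class CopyConstraint:
    """Alternates free reasoning with copy spans that must match the context verbatim."""

    def __init__(self, context: List[str], vocab: Set[str]):
        self.context = context
        self.vocab = vocab
        self.copying = False
        self.span: List[str] = []  # tokens copied so far in the current span

    def _continuations(self) -> Set[str]:
        """Tokens that extend self.span to a longer contiguous substring of the context."""
        n, k = len(self.context), len(self.span)
        return {
            self.context[start + k]
            for start in range(n - k)
            if self.context[start:start + k] == self.span
        }

    def allowed(self) -> Set[str]:
        if not self.copying:
            # Reasoning mode: any token, plus the marker that opens a copy span.
            return set(self.vocab) | {COPY_OPEN}
        allowed = self._continuations()
        if self.span:  # only allow closing a non-empty span
            allowed.add(COPY_CLOSE)
        return allowed

    def step(self, token: str) -> None:
        """Advance the state machine; reject tokens that break the constraint."""
        if token not in self.allowed():
            raise ValueError(f"token {token!r} violates the copy constraint")
        if token == COPY_OPEN:
            self.copying, self.span = True, []
        elif token == COPY_CLOSE:
            self.copying = False
        elif self.copying:
            self.span.append(token)


def mask_logits(logits: Dict[str, float], constraint: CopyConstraint) -> Dict[str, float]:
    """Set disallowed tokens to -inf so sampling can only pick permitted tokens."""
    allowed = constraint.allowed()
    return {t: (v if t in allowed else -math.inf) for t, v in logits.items()}


if __name__ == "__main__":
    context = "the key for cobalt is 7391 and the key for amber is 2048".split()
    c = CopyConstraint(context, vocab=set(context) | {"so", "answer"})
    for tok in ["so", COPY_OPEN, "key", "for", "amber", "is"]:
        c.step(tok)
    # The raw logits favor the wrong value, but the mask forces the verbatim one.
    masked = mask_logits({"2048": 0.1, "7391": 2.0, COPY_CLOSE: -1.0}, c)
    print(max(masked, key=masked.get))  # prints "2048"
```

The linear scan in `_continuations` keeps the sketch short; over 128K-token contexts a real implementation would more likely index the context (for example with a suffix automaton) and operate on token IDs rather than strings.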

1. Motivation: The Lost-in-Thought Phenomenon

RecaLLM was introduced to address a key bottleneck observed in long-context LLMs: as reasoning traces lengthen, faithful in-context retrieval performance deteriorates substantially. This “lost-in-thought” effect is quantified as a stark drop in retrieval accuracy after any sequence of reasoning tokens, even when the retrieval task that follows would be trivial in isolation:

  • Let $A_{\mathrm{direct}}$ be the accuracy of a direct key-value retrieval, and $A_{\mathrm{reason}}$ the accuracy when the same retrieval is requested after a reasoning sequence of length $L_r$.
  • Empirically, for several open-source 7–8B-parameter models (Llama-3.1-8B-Instruct, Qwen2.5-7B-Instruct, ProLong-8B-512K, etc.), retrieval accuracy fell from ~80% ($A_{\mathrm{direct}}$) to ~40% ($A_{\mathrm{reason}}$) after only a short CoT trace at 4K context, and from ~25% to ~5% at 128K [(Whitecross et al., 10 Apr 2026), Table 1].
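To make the size of the gap concrete, the approximate figures above can be expressed as an absolute drop $\Delta$ and a relative drop $\rho$, reading ~80% and ~25% as $A_{\mathrm{direct}}$ and ~40% and ~5% as $A_{\mathrm{reason}}$. The relative-drop formulation is an illustrative convenience, not a metric defined in the source.

```latex
\Delta = A_{\mathrm{direct}} - A_{\mathrm{reason}}, \qquad
\rho = 1 - \frac{A_{\mathrm{reason}}}{A_{\mathrm{direct}}}

\text{4K:}\quad \Delta \approx 80 - 40 = 40 \text{ points}, \qquad \rho \approx 1 - \tfrac{40}{80} = 0.5

\text{128K:}\quad \Delta \approx 25 - 5 = 20 \text{ points}, \qquad \rho \approx 1 - \tfrac{5}{25} = 0.8
```

Under this reading, the absolute gap is smaller at 128K, but reasoning erases a larger share of the remaining retrieval ability (about 80% versus about 50%).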

Injection studies confirmed that even forcibly re-exposing the model
