Long-context Language Models Are Not Good At ALL Retrieval Tasks Without Sufficient Steps (2410.04422v8)
Abstract: Long-context LLMs (LCLMs), characterized by their extensive context windows, are becoming popular. However, although they are nearly perfect at standard long-context retrieval tasks, our evaluations demonstrate that they perform poorly on two basic cases, "multi-matching retrieval" and "logic-based retrieval," which lie beyond LCLMs' ability boundary. We find that both cases can be addressed well with a sufficient number of reasoning steps, guided by specific CoT prompts, indicating the necessity of combining long-context tasks with CoT methods for more advanced long-context handling. However, current CoT methods are too time-consuming when the context is very long, which means efficient long-context handling still has a long way to go.
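The abstract's idea of guiding a model through enough reasoning steps with a task-specific CoT prompt can be sketched as a prompt template. This is a minimal, hypothetical illustration: the function name, wording, and step structure are assumptions for demonstration, not the paper's actual prompts.

```python
def build_multimatch_cot_prompt(context: str, query: str) -> str:
    """Build a hypothetical CoT prompt for multi-matching retrieval:
    instead of asking for all matches at once, instruct the model to
    scan the context and enumerate matches one reasoning step at a time."""
    return (
        "You will answer a multi-matching retrieval question.\n"
        "Think step by step. For each chunk of the context, first state\n"
        "whether it matches the query, then collect all matches at the end.\n\n"
        f"Context:\n{context}\n\n"
        f"Query: find every item matching '{query}'.\n"
        "Step 1:"
    )


# Usage sketch: the prompt string would be sent to an LCLM; here we
# only construct and inspect it.
prompt = build_multimatch_cot_prompt(
    context="apple pie\nbanana bread\napple tart", query="apple"
)
print(prompt)
```

The design point mirrors the abstract's finding: forcing one explicit step per candidate match supplies the "sufficient number of reasoning steps," at the cost of a much longer (and slower) generation when the context is very long.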