Language models can learn implicit multi-hop reasoning, but only if they have lots of training data
Abstract: Implicit reasoning is the ability of an LLM to solve multi-hop reasoning tasks in a single forward pass, without chain-of-thought. We investigate this capability using GPT-2-style LLMs trained from scratch on controlled $k$-hop reasoning datasets ($k = 2, 3, 4$). We show that while such models can indeed learn implicit $k$-hop reasoning, the required training data grows exponentially in $k$, and the required number of transformer layers grows linearly in $k$. We offer a theoretical explanation for why this depth growth is necessary. We further find that the data requirement can be mitigated, but not eliminated, through curriculum learning.
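The abstract does not specify how the controlled $k$-hop datasets are constructed, but such tasks are typically built by composing $k$ atomic facts into a chain and querying the endpoint. The sketch below is a hypothetical generator in that spirit, not the paper's actual data pipeline; the entity naming, fact serialization, and function name are all assumptions for illustration.

```python
import random

def make_khop_example(k: int, vocab_size: int = 10_000, seed: int | None = None):
    """Hypothetical sketch of one k-hop example: k atomic facts plus a query
    whose answer requires composing all k hops in a single pass.
    The exact format used in the paper is not given in the abstract."""
    rng = random.Random(seed)
    # Sample a chain of k+1 distinct entity IDs: e_0 -> e_1 -> ... -> e_k.
    entities = rng.sample(range(vocab_size), k + 1)
    # One atomic fact per hop; shuffled so surface order offers no shortcut.
    facts = [f"e{a} maps_to e{b}" for a, b in zip(entities, entities[1:])]
    rng.shuffle(facts)
    query = f"{k}-hop from e{entities[0]} ?"
    answer = f"e{entities[-1]}"
    return " . ".join(facts), query, answer

facts, query, answer = make_khop_example(k=3, seed=0)
print(facts)
print(query, "->", answer)
```

Under this kind of setup, the exponential data requirement reported in the abstract corresponds to the number of distinct chains the model must see before the composition generalizes, rather than to memorizing individual facts.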