Interpreting synthetic-task performance: reliance on prior knowledge vs reasoning
Ascertain how much large language models rely on prior parametric factual knowledge when solving synthetic, unseen tasks, and determine whether failures stem from limitations in reasoning itself or from the absence of background knowledge that models typically exploit.
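One way to make this question concrete is a paired evaluation: each reasoning task is instantiated once with real-world entities the model may already know, and once in a parallel synthetic world with the same reasoning structure but unmemorizable entities. The sketch below illustrates that idea under stated assumptions; the `PairedTask` schema, the `knowledge_reliance_gap` metric, and the toy model are hypothetical illustrations, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PairedTask:
    """One reasoning task instantiated in two parallel worlds.

    `real_prompt` is grounded in real-world entities the model may have
    memorized; `synthetic_prompt` shares the same reasoning structure but
    uses entities the model cannot know in advance. (Illustrative fields,
    not the paper's schema.)
    """
    real_prompt: str
    synthetic_prompt: str
    real_answer: str
    synthetic_answer: str


def accuracy(model: Callable[[str], str], prompts: List[str], answers: List[str]) -> float:
    """Fraction of prompts the model answers exactly correctly."""
    if not prompts:
        return 0.0
    correct = sum(model(p).strip() == a.strip() for p, a in zip(prompts, answers))
    return correct / len(prompts)


def knowledge_reliance_gap(model: Callable[[str], str], tasks: List[PairedTask]) -> float:
    """Hypothetical metric: accuracy on real-world versions minus accuracy on
    structure-matched synthetic versions. A large positive gap suggests the
    model leans on parametric knowledge as a scaffold; a gap near zero
    suggests performance is carried by reasoning alone."""
    real_acc = accuracy(model, [t.real_prompt for t in tasks], [t.real_answer for t in tasks])
    synth_acc = accuracy(model, [t.synthetic_prompt for t in tasks], [t.synthetic_answer for t in tasks])
    return real_acc - synth_acc


if __name__ == "__main__":
    # Toy stand-in for a language model, purely for illustration.
    def toy_model(prompt: str) -> str:
        return "Paris" if "France" in prompt else "unknown"

    tasks = [PairedTask(
        real_prompt="What is the capital of France?",
        synthetic_prompt="In the world of Veltria, what is the capital of Norvane?",
        real_answer="Paris",
        synthetic_answer="Qyrith",
    )]
    print(f"knowledge reliance gap: {knowledge_reliance_gap(toy_model, tasks):+.2f}")
```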
References
Crucially, evaluations based only on synthetic unseen tasks still leave open questions about performance. Success demonstrates reasoning in isolation, but it does not reveal how much models typically rely on prior knowledge as a scaffold.
— SynthWorlds: Controlled Parallel Worlds for Disentangling Reasoning and Knowledge in Language Models
(Gu et al., arXiv:2510.24427, 28 Oct 2025), Introduction (Section 1)