Interpreting synthetic-task performance: reliance on prior knowledge vs reasoning
Ascertain how much large language models rely on prior parametric factual knowledge when solving synthetic, unseen tasks, and determine whether failures stem from limitations in reasoning itself or from the absence of background knowledge that models typically exploit.
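One way to make this question concrete is a paired evaluation: each reasoning task is instantiated once with real-world entities the model may already know, and once in a parallel synthetic world with the same reasoning structure but unmemorizable entities. The sketch below illustrates that idea under stated assumptions; the `PairedTask` schema, the `knowledge_reliance_gap` metric, and the toy model are hypothetical illustrations, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PairedTask:
    """One reasoning task instantiated in two parallel worlds.

    `real_prompt` is grounded in real-world entities the model may have
    memorized; `synthetic_prompt` shares the same reasoning structure but
    uses entities the model cannot know in advance. (Illustrative fields,
    not the paper's schema.)
    """
    real_prompt: str
    synthetic_prompt: str
    real_answer: str
    synthetic_answer: str


def accuracy(model: Callable[[str], str], prompts: List[str], answers: List[str]) -> float:
    """Fraction of prompts the model answers exactly correctly."""
    if not prompts:
        return 0.0
    correct = sum(model(p).strip() == a.strip() for p, a in zip(prompts, answers))
    return correct / len(prompts)


def knowledge_reliance_gap(model: Callable[[str], str], tasks: List[PairedTask]) -> float:
    """Hypothetical metric: accuracy on real-world versions minus accuracy on
    structure-matched synthetic versions. A large positive gap suggests the
    model leans on parametric knowledge as a scaffold; a gap near zero
    suggests performance is carried by reasoning alone."""
    real_acc = accuracy(model, [t.real_prompt for t in tasks], [t.real_answer for t in tasks])
    synth_acc = accuracy(model, [t.synthetic_prompt for t in tasks], [t.synthetic_answer for t in tasks])
    return real_acc - synth_acc


if __name__ == "__main__":
    # Toy stand-in for a language model, purely for illustration.
    def toy_model(prompt: str) -> str:
        return "Paris" if "France" in prompt else "unknown"

    tasks = [PairedTask(
        real_prompt="What is the capital of France?",
        synthetic_prompt="In the world of Veltria, what is the capital of Norvane?",
        real_answer="Paris",
        synthetic_answer="Qyrith",
    )]
    print(f"knowledge reliance gap: {knowledge_reliance_gap(toy_model, tasks):+.2f}")
```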
References
Crucially, evaluations based only on synthetic unseen tasks still leave open questions about performance. Success demonstrates reasoning in isolation, but it does not reveal how much models typically rely on prior knowledge as a scaffold.
— SynthWorlds: Controlled Parallel Worlds for Disentangling Reasoning and Knowledge in Language Models
(Gu et al., arXiv:2510.24427, 28 Oct 2025), Introduction (Section 1)