Extent of LLM performance attributable to reasoning vs memorized knowledge
Determine the extent to which large language models' task performance reflects genuine reasoning rather than recall of memorized parametric world knowledge, by explicitly separating and measuring these two contributions in controlled evaluations.
References
"Yet, as LMs continue to be trained on massive web corpora (often with undisclosed training data), it remains unclear to what extent their performance reflects genuine reasoning versus the reciting of memorized knowledge."
— SynthWorlds: Controlled Parallel Worlds for Disentangling Reasoning and Knowledge in Language Models
(arXiv:2510.24427, Gu et al., 28 Oct 2025), Introduction (Section 1)