Inside-Out: Hidden Factual Knowledge in LLMs

Published 19 Mar 2025 in cs.CL | (2503.15299v4)

Abstract: This work presents a framework for assessing whether LLMs encode more factual knowledge in their parameters than what they express in their outputs. While a few studies hint at this possibility, none has clearly defined or demonstrated this phenomenon. We first propose a formal definition of knowledge, quantifying it for a given question as the fraction of correct-incorrect answer pairs where the correct one is ranked higher. This gives rise to external and internal knowledge, depending on the information used to score individual answer candidates: either the model's observable token-level probabilities or its intermediate computations. Hidden knowledge arises when internal knowledge exceeds external knowledge. We then present a case study, applying this framework to three popular open-weights LLMs in a closed-book QA setup. Our results indicate that: (1) LLMs consistently encode more factual knowledge internally than what they express externally, with an average relative gap of 40%. (2) Surprisingly, some knowledge is so deeply hidden that a model can internally know an answer perfectly, yet fail to generate it even once, despite large-scale repeated sampling of 1,000 answers. This reveals fundamental limitations in the generation capabilities of LLMs, which (3) put a practical constraint on scaling test-time compute via repeated answer sampling in closed-book QA: significant performance improvements remain inaccessible because some answers are practically never sampled, yet if they were, we would be guaranteed to rank them first.


Authors (8)

Summary

The paper reports that LLMs encode more factual knowledge internally than they express in their generated outputs, with an average relative gap of about 40%.
It employs a dual scoring framework that compares internal classifier predictions with external token-level probabilities across approximately 1,700 curated questions.
The study suggests that improved decoding techniques could better harness hidden knowledge, enhancing reliability in knowledge-intensive tasks.

Hidden Factual Knowledge in LLMs

The paper "Inside-Out: Hidden Factual Knowledge in LLMs" examines whether LLMs encode more factual knowledge in their parameters than they express in their outputs. This phenomenon, termed "hidden knowledge", has significant implications for understanding the limits and potential of LLMs in knowledge representation and retrieval tasks.

Framework for Hidden Knowledge

The researchers introduce a framework for systematically investigating hidden knowledge in LLMs. They propose a formal definition of knowledge in a question-answering setting: for a given question, knowledge is the fraction of (correct, incorrect) answer pairs in which the model scores the correct answer higher. The framework distinguishes between external knowledge, where answers are scored using the model's observable token-level probabilities, and internal knowledge, where answers are scored using the model's intermediate computations.

Hidden knowledge is present when internal knowledge metrics surpass external ones, indicating the model internally "knows" answers it cannot generate reliably.
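
The definition above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `pairwise_knowledge`, the strict handling of ties, and all scores are assumptions made for the example.

```python
from itertools import product


def pairwise_knowledge(correct_scores, incorrect_scores):
    """Fraction of (correct, incorrect) answer pairs in which the
    correct answer receives a strictly higher score.

    Treating ties as failures is an assumption of this sketch.
    """
    pairs = list(product(correct_scores, incorrect_scores))
    if not pairs:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = sum(1 for good, bad in pairs if good > bad)
    return wins / len(pairs)


# Hypothetical scores for one question: one correct answer, two incorrect ones.
external_k = pairwise_knowledge([0.30], [0.50, 0.10])  # correct wins 1 of 2 pairs
internal_k = pairwise_knowledge([0.90], [0.20, 0.40])  # correct wins 2 of 2 pairs

# Hidden knowledge for this question: internal knowledge exceeds external.
has_hidden_knowledge = internal_k > external_k
print(external_k, internal_k, has_hidden_knowledge)
```

The same function serves both cases; only the source of the per-answer scores changes.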

Study Design and Execution

A comprehensive case study was conducted using three popular open-weight LLMs under a closed-book question answering (QA) setup. The experiment utilized approximately 1,700 carefully curated questions to evaluate the models' hidden knowledge. Several scoring methods were employed to assess knowledge:

External Scoring Methods:
- Token-level probabilities of generating an answer (P(a|q)).
- Length-normalized probability scores.
Internal Scoring Methods:
- A linear classifier trained on the model's hidden representations to predict answer correctness.

The distinct separation of internal and external scoring allows for a direct comparison of the knowledge expressed versus encoded.
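
The scoring methods listed above might look roughly as follows. This is a sketch under stated assumptions: the geometric-mean form of length normalization is one common choice and may differ from the paper's, and the probe weights are hypothetical values rather than a trained classifier.

```python
import math


def sequence_prob(token_logprobs):
    """External score P(a|q): product of per-token probabilities,
    computed in log space for numerical stability."""
    return math.exp(sum(token_logprobs))


def length_normalized_prob(token_logprobs):
    """External score normalized by answer length (geometric mean of
    per-token probabilities). Expects a non-empty list."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def probe_score(hidden_state, weights, bias):
    """Internal score: a linear classifier applied to a hidden
    representation, mapped to (0, 1) with a sigmoid."""
    logit = sum(w * h for w, h in zip(weights, hidden_state)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical per-token log-probabilities for two candidate answers.
short_answer = [math.log(0.5)]
long_answer = [math.log(0.7)] * 3

# The raw probability favors the short answer; normalization favors the long one.
print(sequence_prob(short_answer) > sequence_prob(long_answer))
print(length_normalized_prob(long_answer) > length_normalized_prob(short_answer))

# Hypothetical two-dimensional hidden state and probe parameters.
print(probe_score([1.0, -1.0], [2.0, 0.5], 0.0))
```

The two external scores can rank the same pair of answers differently, which is one reason to report more than one external method.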

Findings

Figure 1: The paper quantifies the number of unique answers and unique correct answers per question, illustrating the diversity and challenges in the QA setups.

Key findings include:

LLMs consistently encode more factual knowledge internally than they express externally, with an average relative gap of about 40%.
Some knowledge is so deeply hidden that a model can internally know an answer perfectly yet fail to generate it even once across 1,000 sampled answers. This highlights fundamental generation limitations in LLMs.
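
Two small calculations help make these findings concrete. Both are illustrative sketches: how the paper aggregates the relative gap is not detailed here, so `relative_gap` assumes a simple ratio, and `prob_sampled_at_least_once` assumes independent samples with a fixed per-sample probability. All numbers are hypothetical.

```python
def relative_gap(internal_k, external_k):
    """Relative improvement of internal over external knowledge.
    Assumes external_k is non-zero."""
    return (internal_k - external_k) / external_k


def prob_sampled_at_least_once(p, n_samples):
    """Chance that an answer with per-sample probability p appears
    at least once in n_samples independent draws."""
    return 1.0 - (1.0 - p) ** n_samples


# Hypothetical knowledge scores that would yield a 40% relative gap.
print(round(relative_gap(0.70, 0.50), 3))

# A correct answer with per-sample probability 1e-4 is seen at least
# once in 1,000 samples only about 9.5% of the time.
print(round(prob_sampled_at_least_once(1e-4, 1000), 4))
```

The second calculation shows why an answer the model scores well internally can still be missing from 1,000 samples when its generation probability is very low.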

Implications and Practical Applications

These insights illuminate several critical avenues for application and research:

Decoding Improvements: There's potential to mine more correct answers through improved decoding strategies leveraging internal model signals, as highlighted by the notable performance gap in internal vs. external knowledge measurement.
Model Reliability: Understanding hidden knowledge in LLMs can enhance model reliability in knowledge-intensive applications by motivating methods that surface and use knowledge the model encodes but does not express.
Implications for Scale: The findings counter the assumption that model scaling and increased test-time compute alone will surface all factual knowledge, and point instead to techniques that account for model internals.
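
One way to picture the decoding idea is to rerank sampled candidates with an internal score instead of the external one. The sketch below is an assumption-laden illustration, not a method from the paper: the helper `best_answer`, the candidate answers, and every score are hypothetical.

```python
def best_answer(candidates, score_fn):
    """Return the sampled candidate with the highest score."""
    return max(candidates, key=score_fn)


# Hypothetical per-answer scores for a question whose correct answer is "Paris".
external = {"Paris": 0.2, "Lyon": 0.5, "Nice": 0.3}
internal = {"Paris": 0.9, "Lyon": 0.3, "Nice": 0.1}

sampled = ["Lyon", "Nice", "Paris"]
print(best_answer(sampled, external.get))  # external scoring picks "Lyon"
print(best_answer(sampled, internal.get))  # internal scoring picks "Paris"

# If the correct answer is never sampled, reranking cannot recover it.
print(best_answer(["Lyon", "Nice"], internal.get))  # picks "Lyon"
```

The last call mirrors the constraint on repeated sampling: a better ranker only helps when the correct answer is among the candidates.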

Conclusion

The framework and findings presented in this study challenge prevalent notions about the capabilities of LLMs and urge a reconsideration of how these models' "knowledge" is defined and evaluated. The study lays a foundation for exploring methods to access the factual knowledge embedded within LLMs, with the aim of developing models that are not only larger but also more transparent and informative. Future work might focus on refining decoding mechanisms so that generation better reflects what the models encode internally.
