Estimating Knowledge in Large Language Models Without Generating a Single Token

Published 18 Jun 2024 in cs.CL | (2406.12673v2)

Abstract: To evaluate knowledge in LLMs, current methods query the model and then evaluate its generated responses. In this work, we ask whether evaluation can be done before the model has generated any text. Concretely, is it possible to estimate how knowledgeable a model is about a certain entity, only from its internal computation? We study this question with two tasks: given a subject entity, the goal is to predict (a) the ability of the model to answer common questions about the entity, and (b) the factuality of open-ended responses generated by the model about the entity. Experiments with a variety of LLMs show that KEEN, a simple probe trained over internal subject representations, succeeds at both tasks - correlating with both the QA accuracy of the model per-subject and FActScore, a recent factuality metric in open-ended generation. Moreover, KEEN naturally aligns with the model's hedging behavior and faithfully reflects changes in the model's knowledge after fine-tuning. Lastly, we show a more interpretable yet equally performant variant of KEEN, which highlights a small set of tokens indicative of clusters and gaps in the model's knowledge. Being simple and lightweight, KEEN can be leveraged to guide decisions such as when it is appropriate to apply further training or augment queries with retrieval.

Summary

The paper presents the KEEN method that probes hidden representations to estimate entity knowledge without text generation.
It employs logistic regression on feature vectors from transformer layers to predict QA accuracy and factuality scores.
KEEN reliably correlates with model hedging behavior and identifies knowledge gaps, offering valuable insights for model augmentation.

Estimating Knowledge in LLMs Without Generating a Single Token

This essay examines KEEN (Knowledge Estimation of ENtities), a method for estimating how much an LLM knows about an entity from the model's internal computations alone, without generating any text.

Introduction to KEEN

KEEN measures the knowledge an LLM holds about a specific entity by probing hidden model representations rather than evaluating generated outputs. The paper defines two tasks: predicting the model's per-entity QA accuracy, and predicting the factuality of open-ended text the model generates about the entity. Both quantities are estimated directly from internal model states, before any text is generated.

Figure 1: Simple probes (KEEN) quantify model knowledge about a subject entity by estimating question-answering accuracy and factuality of text generation.

Methodology

Feature Extraction

KEEN builds its features from the hidden representations of Transformer-based LLMs. Specifically, it draws on upper-intermediate layers, where entity attributes are reported to be recoverable with simple linear projections. Three feature sets are considered for a given entity: hidden states (HS), vocabulary projections (VP), and VP-k, a more interpretable variant that keeps only the k most influential tokens while, according to the paper, performing on par with the full probes.
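As a rough illustration of the two basic feature sets, the sketch below averages a subject's hidden states over a few layers (HS) and then projects the result onto the vocabulary with an unembedding matrix (VP). The toy dimensions, the choice to pool layers by averaging, and the helper names hs_features and vp_features are assumptions made for this example; the summary above does not confirm them.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def hs_features(layer_states: List[Vector]) -> Vector:
    """Average the subject's hidden state over the chosen layers (assumed pooling)."""
    n_layers = len(layer_states)
    dim = len(layer_states[0])
    return [sum(state[i] for state in layer_states) / n_layers for i in range(dim)]


def vp_features(hidden: Vector, unembedding: Matrix) -> Vector:
    """Project a hidden vector onto the vocabulary: one score per token."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in unembedding]


# Toy example: 2 layers, hidden size 3, vocabulary of 4 tokens (all values made up).
layer_states = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
unembedding = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]

hs = hs_features(layer_states)     # one value per hidden dimension
vp = vp_features(hs, unembedding)  # one value per vocabulary token
print(hs)
print(vp)
```

In a real model the hidden size and vocabulary are far larger, and the hidden states would be read from the forward pass over the subject's tokens rather than typed in by hand.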

Probing Mechanism

The probe is a logistic regression over the extracted feature vector that predicts either the QA accuracy or the factuality score. Its sigmoid output keeps predictions bounded within [0, 1], the same range as both target scores.
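A minimal sketch of such a probe follows: a linear layer followed by a sigmoid, fitted by plain gradient descent. The cross-entropy loss on fractional targets, the learning rate, the epoch count, and the toy data are all assumptions chosen for illustration, not the paper's training recipe.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Linear score squashed into [0, 1]."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)


def train_probe(
    X: List[List[float]], y: List[float], lr: float = 0.5, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Gradient descent on cross-entropy with soft targets in [0, 1] (assumed loss)."""
    dim = len(X[0])
    n = len(X)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for features, target in zip(X, y):
            # For this loss the gradient w.r.t. the linear score is (prediction - target).
            error = predict(weights, bias, features) - target
            for i, f in enumerate(features):
                grad_w[i] += error * f / n
            grad_b += error / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Made-up features for four subjects and their (made-up) QA accuracies.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [0.1, 0.4, 0.6, 0.9]

weights, bias = train_probe(X, y)
predictions = [predict(weights, bias, features) for features in X]
print([round(p, 2) for p in predictions])
```

Because the target is a score rather than a class label, the probe behaves like a regression with a bounded output; swapping in a different loss would only change the error term inside the loop.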

Experimental Evaluation

Experimental Settings

Two evaluation settings are used: factual QA and open-ended generation (OEG). For the QA task, entity-specific questions are derived from PopQA. For OEG, KEEN's outputs are compared with FActScore, a metric that scores the factual correctness of open-ended generated text.
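The QA setting can be pictured as follows: compute each subject's accuracy from graded answers, then correlate those accuracies with the probe's scores. The sketch uses Pearson correlation as one common choice; the summary does not say which coefficient the paper reports, and the subject names, grades, and scores below are made up.

```python
import math
from typing import Dict, List, Sequence


def qa_accuracy(is_correct: Sequence[bool]) -> float:
    """Fraction of a subject's questions that the model answered correctly."""
    return sum(is_correct) / len(is_correct)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical graded answers per subject and hypothetical probe scores.
graded: Dict[str, List[bool]] = {
    "subject_a": [True, True, True, False],
    "subject_b": [True, False, False, False],
    "subject_c": [True, True, False, False],
}
probe_scores: Dict[str, float] = {"subject_a": 0.8, "subject_b": 0.3, "subject_c": 0.4}

subjects = sorted(graded)
accuracies = [qa_accuracy(graded[s]) for s in subjects]
scores = [probe_scores[s] for s in subjects]
r = pearson(scores, accuracies)
print(accuracies, round(r, 3))
```

The same comparison applies to the OEG setting by replacing the per-subject accuracy with the per-subject FActScore.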

The experiments cover multiple LLMs including GPT2, Pythia, LLaMA2, and Vicuna, showcasing KEEN's versatility across models of varying architecture and complexity.

Results and Analysis

The KEEN probes correlate strongly with both the model's QA accuracy and its FActScore, outperforming baselines built on other intrinsic features, such as self-attention outputs, or on external statistics, such as entity popularity. KEEN scores also track changes in the model's knowledge after fine-tuning, which suggests they can help identify knowledge gaps and guide decisions about model augmentation.

Figure 2: Changes in KEEN QA and average QA accuracy scores post fine-tuning LLaMA2 7B, highlighting KEEN's robust reflection of knowledge adjustments.

Probing Internal Model Behavior

KEEN's scores also align with the model's hedging behavior: they correlate inversely with the fraction of queries on which the model hedges, which supports the use of internal representations for forecasting entity knowledge.
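One simple way to check this kind of relationship is to measure, per subject, the fraction of hedged responses and compare it between low-scoring and high-scoring subjects. In the sketch below, the phrase-matching hedge detector, the 0.5 threshold, and all scores and responses are hypothetical stand-ins rather than the paper's procedure.

```python
from typing import Dict, List

# Hypothetical hedging markers; a real setup would need a more careful detector.
HEDGE_PHRASES = ("i don't know", "i'm not sure", "i am not sure")


def hedged_fraction(responses: List[str]) -> float:
    """Fraction of responses containing at least one hedging marker."""
    hedged = sum(any(p in r.lower() for p in HEDGE_PHRASES) for r in responses)
    return hedged / len(responses)


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Made-up probe scores and model responses for three hypothetical subjects.
keen_scores: Dict[str, float] = {"subject_a": 0.9, "subject_b": 0.2, "subject_c": 0.3}
responses: Dict[str, List[str]] = {
    "subject_a": ["It is a river.", "It flows north."],
    "subject_b": ["I don't know.", "I'm not sure about that."],
    "subject_c": ["I don't know who that is.", "She was a painter."],
}

threshold = 0.5  # arbitrary split between low and high probe scores
low_group = mean([hedged_fraction(responses[s]) for s in keen_scores if keen_scores[s] < threshold])
high_group = mean([hedged_fraction(responses[s]) for s in keen_scores if keen_scores[s] >= threshold])
print(low_group, high_group)
```

An inverse relationship shows up here as a higher hedged fraction in the low-score group than in the high-score group.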

Interpretability and Generalization

VP-k probes make KEEN more interpretable. Inspecting the tokens that receive the largest weights shows which vocabulary-level features are most influential, and helps distinguish representations of entities the model knows well from those it knows little about.
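To make the token-level reading concrete, the sketch below ranks vocabulary tokens by the magnitude of their probe weight and keeps the top k. Ranking by absolute weight is an assumption about how "influential" is defined, and the vocabulary and weights are invented for the example.

```python
from typing import List, Sequence, Tuple


def top_k_tokens(
    weights: Sequence[float], vocab: Sequence[str], k: int
) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest absolute weight, strongest first."""
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(vocab[i], weights[i]) for i in ranked[:k]]


# Hypothetical vocabulary and probe weights (one weight per token).
vocab = ["film", "born", "the", "city", "xyz"]
weights = [0.9, 0.7, 0.01, -0.2, -0.8]

selected = top_k_tokens(weights, vocab, k=3)
positive = [token for token, weight in selected if weight > 0]
negative = [token for token, weight in selected if weight < 0]
print(selected)
print(positive, negative)
```

Under this reading, positively weighted tokens push the predicted score up and negatively weighted tokens push it down, which is what makes the selected set inspectable by a person.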

Figure 3: Difference in the median rank of tokens with negative and positive weights for accuracy differentiation.

Conclusion

KEEN introduces a simple yet effective framework for estimating an LLM's knowledge and performance on a per-entity basis without generating output. Beyond assessing model knowledge, it can inform deployment decisions, such as when to apply further training or augment queries with retrieval. Future work may extend KEEN beyond named entities to more abstract subjects, and to model architectures other than Transformers.

Overall, KEEN advances model interpretability and factual reliability in LLMs, paving the way for more robust applications across varied AI domains.
