
Factual Probing Is [MASK]: Learning vs. Learning to Recall (2104.05240v2)

Published 12 Apr 2021 in cs.CL

Abstract: Petroni et al. (2019) demonstrated that it is possible to retrieve world facts from a pre-trained LLM by expressing them as cloze-style prompts and interpret the model's prediction accuracy as a lower bound on the amount of factual information it encodes. Subsequent work has attempted to tighten the estimate by searching for better prompts, using a disjoint set of facts as training data. In this work, we make two complementary contributions to better understand these factual probing techniques. First, we propose OptiPrompt, a novel and efficient method which directly optimizes in continuous embedding space. We find this simple method is able to predict an additional 6.4% of facts in the LAMA benchmark. Second, we raise a more important question: Can we really interpret these probing results as a lower bound? Is it possible that these prompt-search methods learn from the training data too? We find, somewhat surprisingly, that the training data used by these methods contains certain regularities of the underlying fact distribution, and all the existing prompt methods, including ours, are able to exploit them for better fact prediction. We conduct a set of control experiments to disentangle "learning" from "learning to recall", providing a more detailed picture of what different prompts can reveal about pre-trained LLMs.

Overview of "Factual Probing Is [MASK]: Learning vs. Learning to Recall"

The paper "Factual Probing Is [MASK]: Learning vs. Learning to Recall" by Zexuan Zhong, Dan Friedman, and Danqi Chen explores the factual probing capabilities of pre-trained LLMs, such as BERT, with an emphasis on distinguishing between learning and learning to recall. The research builds on the foundational work of Petroni et al. (2019), which established the potential for retrieving world facts using cloze-style prompts. The introduction of novel probing techniques such as OptiPrompt has demonstrated enhanced prediction capabilities, albeit sparking questions regarding the extent to which these improvements reflect the inherent factual encoding of the models versus learning from training data.

Contributions and Methodologies

This paper makes two primary contributions to the factual probing literature. The first is OptiPrompt, a computationally efficient method that optimizes prompts directly in the continuous embedding space, outperforming traditional discrete prompt-search methods. Second, the paper raises a crucial question about the factual probing paradigm: are probing results a true lower bound on the factual knowledge embedded in a model, or do they partly reflect correlations learned from the training data?
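To make the first contribution concrete, below is a minimal sketch of continuous prompt optimization in the spirit of OptiPrompt, written with PyTorch and HuggingFace Transformers. The checkpoint name, number of prompt vectors, initialization scale, learning rate, and the example fact are illustrative assumptions rather than the paper's exact configuration, and the sketch assumes single-token objects for simplicity.

```python
# Minimal sketch of continuous prompt optimization (OptiPrompt-style):
# trainable dense "prompt vectors" are placed in the input embedding space and
# optimized to predict the object at the [MASK] position, while the LM is frozen.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased").to(device).eval()
for p in model.parameters():          # the pre-trained LM stays frozen; only the prompt is trained
    p.requires_grad_(False)

embed = model.get_input_embeddings()  # token-embedding matrix
hidden = embed.embedding_dim
n_prompt = 5                          # number of trainable prompt vectors per relation (assumption)
prompt = torch.nn.Parameter(torch.randn(n_prompt, hidden, device=device) * 0.02)
optimizer = torch.optim.Adam([prompt], lr=3e-3)

def mask_logits(subject: str) -> torch.Tensor:
    """Build [CLS] subject <prompt vectors> [MASK] [SEP] in embedding space
    and return the LM's logits over the vocabulary at the [MASK] position."""
    subj_ids = tokenizer(subject, add_special_tokens=False, return_tensors="pt").input_ids.to(device)
    cls_id = torch.tensor([[tokenizer.cls_token_id]], device=device)
    mask_id = torch.tensor([[tokenizer.mask_token_id]], device=device)
    sep_id = torch.tensor([[tokenizer.sep_token_id]], device=device)
    pieces = [embed(cls_id), embed(subj_ids), prompt.unsqueeze(0), embed(mask_id), embed(sep_id)]
    inputs_embeds = torch.cat(pieces, dim=1)
    mask_pos = 1 + subj_ids.size(1) + n_prompt   # index of the [MASK] token
    logits = model(inputs_embeds=inputs_embeds).logits
    return logits[0, mask_pos]

# One optimization step on a hypothetical training fact for the relation.
subject, obj = "Dante", "Florence"               # illustrative (subject, object) pair
obj_id = tokenizer(obj, add_special_tokens=False).input_ids[0]
loss = torch.nn.functional.cross_entropy(mask_logits(subject).unsqueeze(0),
                                         torch.tensor([obj_id], device=device))
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

Because the prompt lives in the embedding space rather than the vocabulary, gradient descent can tune it freely without any discrete search, which is what makes this approach efficient relative to discrete prompt-search methods.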

To evaluate these contributions, the paper uses the LAMA benchmark as the focal evaluation task. OptiPrompt raises fact-prediction accuracy to 48.6%, a 6.4% absolute improvement over previous methods. Moreover, the analysis shows that exploiting the statistical regularities of the training data can substantially inflate accuracy, calling into question the interpretation of improved performance as a genuine reflection of a model's encoded factual knowledge.

Evaluations and Findings

The paper runs control experiments to disentangle learning from recall. A "Random Model" (RM) control, in which the model's parameters are reinitialized, shows that optimized prompts can exploit information contained in the training distribution alone. A complementary "Random Embeddings" (RE) baseline reinitializes only the input embeddings, probing what the prompts can extract from the pretrained contextual representations.
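A rough illustration of how such controls could be constructed with HuggingFace Transformers is sketched below. The checkpoint name and initialization details are assumptions, and the paper's exact implementation may differ; note in particular that BERT's masked-LM output layer is tied to the input embedding matrix by default, so reinitializing one also affects the other.

```python
# Sketch of the two control settings, assuming a BERT-style checkpoint (illustrative).
import torch
from transformers import AutoConfig, AutoModelForMaskedLM

name = "bert-base-cased"

# "Random Model" (RM): same architecture, but every parameter freshly initialized,
# so any correct predictions must come from the prompt-training data.
rm_model = AutoModelForMaskedLM.from_config(AutoConfig.from_pretrained(name))

# "Random Embeddings" (RE): keep the pretrained Transformer layers, but re-draw
# the input token-embedding matrix (with weight tying, the MLM decoder is reset too).
re_model = AutoModelForMaskedLM.from_pretrained(name)
emb = re_model.get_input_embeddings()
torch.nn.init.normal_(emb.weight, mean=0.0, std=re_model.config.initializer_range)
```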

An intriguing insight from the paper is the distinction between "easy" and "hard" examples in the LAMA benchmark, based on whether a fact can be predicted from the training data alone. This split highlights that while prompting techniques like OptiPrompt can exploit known data distributions effectively, a substantial subset of hard examples requires genuine model recall, and therefore serves as a more robust measure of how much factual content is intrinsically encoded by pre-trained LLMs.
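As a purely hypothetical illustration of this kind of partition (the paper defines its "easy" set via its control baselines, not via the per-relation majority-object heuristic used here), a split might look like the following:

```python
# Hypothetical easy/hard partition: a fact counts as "easy" here if a
# training-data-only predictor (the per-relation majority object, used as a
# stand-in for the paper's control baselines) already gets it right;
# everything else is "hard" and requires genuine recall from the LM.
from collections import Counter

def split_easy_hard(train_facts, test_facts):
    """train_facts / test_facts: lists of (relation, subject, object) triples."""
    by_relation = {}
    for rel, _, obj in train_facts:
        by_relation.setdefault(rel, Counter())[obj] += 1
    majority = {rel: counts.most_common(1)[0][0] for rel, counts in by_relation.items()}

    easy, hard = [], []
    for rel, subj, obj in test_facts:
        (easy if majority.get(rel) == obj else hard).append((rel, subj, obj))
    return easy, hard

# Example: if most training facts for "capital-of" share the object "London",
# then ("capital-of", "United Kingdom", "London") lands in easy, while facts
# with rarer objects land in hard.
```

Accuracy on the hard partition is the quantity that more plausibly reflects knowledge stored in the pre-trained model rather than patterns absorbed from the prompt-training data.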

Implications for Future Research

The implications of this research extend into both practical and theoretical domains. Practically, evaluations of model capabilities need to be nuanced, acknowledging how regularities learned from training data can masquerade as factual recall. Theoretically, the paper underscores the importance of methods that disentangle intrinsic knowledge representation from learning effects. Future research might develop finer-grained attribution methods to understand how LLMs store and retrieve factual information, or improve the interpretability and transparency of continuous prompting methods.

By analyzing probing performance through the lens of learning versus learning to recall, this paper sharpens the discussion of how much factual knowledge pre-trained models actually encode, informing both the development of better models and better probing methodologies in natural language processing.

Authors (3)
  1. Zexuan Zhong (17 papers)
  2. Dan Friedman (16 papers)
  3. Danqi Chen (84 papers)
Citations (385)