
Language Models as Knowledge Bases? (1909.01066v2)

Published 3 Sep 2019 in cs.CL

Abstract: Recent progress in pretraining language models on large textual corpora led to a surge of improvements for downstream NLP tasks. Whilst learning linguistic knowledge, these models may also be storing relational knowledge present in the training data, and may be able to answer queries structured as "fill-in-the-blank" cloze statements. Language models have many advantages over structured knowledge bases: they require no schema engineering, allow practitioners to query about an open class of relations, are easy to extend to more data, and require no human supervision to train. We present an in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models. We find that (i) without fine-tuning, BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge, (ii) BERT also does remarkably well on open-domain question answering against a supervised baseline, and (iii) certain types of factual knowledge are learned much more readily than others by standard language model pretraining approaches. The surprisingly strong ability of these models to recall factual knowledge without any fine-tuning demonstrates their potential as unsupervised open-domain QA systems. The code to reproduce our analysis is available at https://github.com/facebookresearch/LAMA.

Citations (2,405)

Summary

  • The paper shows that unsupervised language models such as BERT can implicitly store factual knowledge with high precision.
  • It employs the LAMA framework to convert triples into cloze statements and evaluates retrieval efficiency using rank-based precision metrics.
  • The analysis reveals robust performance across diverse relation types, highlighting potential for next-generation hybrid KB systems.

Language Models as Knowledge Bases?

Introduction

The paper “Language Models as Knowledge Bases?” explores the potential of contemporary pretrained LMs, particularly BERT, to serve as unsupervised knowledge repositories. These models, traditionally optimized for linguistic tasks, are assessed here for their ability to implicitly store and retrieve factual information. Given the flexibility of LMs, they offer unique benefits over structured knowledge bases (KBs), as they do not require extensive schema engineering or manual annotations (Figure 1).

Figure 1: Querying knowledge bases (KB) and language models (LM) for factual knowledge.

Methodology

The paper employs a probing strategy built on the LAMA (LAnguage Model Analysis) framework, designed to evaluate the factual and commonsense knowledge encapsulated in these models. Knowledge-base triples and question-answer pairs are converted into cloze statements, and the LMs are queried by asking them to fill in the masked token.
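As a concrete illustration of this cloze-style querying, the following minimal sketch uses the Hugging Face transformers fill-mask pipeline with a pretrained BERT checkpoint. This is not the paper's own tooling (that lives in the LAMA repository linked above), and the example fact is purely illustrative.

```python
# Minimal sketch of LAMA-style cloze probing with a pretrained BERT model.
# Uses the Hugging Face `transformers` fill-mask pipeline rather than the
# paper's original codebase (https://github.com/facebookresearch/LAMA).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-cased")

# A Wikidata-style triple (Dante, born-in, Florence) rendered as a cloze statement.
query = "Dante was born in [MASK]."

for prediction in fill_mask(query, top_k=5):
    # Each prediction carries the filled-in token and the model's probability for it.
    print(f"{prediction['token_str']:>12}  p={prediction['score']:.3f}")
```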

The evaluation spans several knowledge sources: T-REx for Wikidata triples, Google-RE for facts manually extracted from Wikipedia, ConceptNet for commonsense knowledge, and SQuAD for question-answer pairs. Performance is measured primarily with mean precision at rank k (P@k), which allows retrieval quality to be compared against traditional baselines such as supervised relation extraction (RE) systems.
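The P@k metric itself is simple to compute from ranked predictions. The toy sketch below shows one possible implementation; the prediction lists and gold answers are made up for illustration and are not actual LAMA outputs.

```python
# Illustrative computation of mean precision at rank k (P@k) over a set of
# cloze queries. The ranked predictions and gold answers below are toy
# placeholders, not data from the paper.
def precision_at_k(ranked_predictions, gold_answer, k):
    """Return 1.0 if the gold answer appears in the top-k predictions, else 0.0."""
    return 1.0 if gold_answer in ranked_predictions[:k] else 0.0

queries = [
    (["Florence", "Rome", "Italy"], "Florence"),   # gold answer at rank 1
    (["Paris", "London", "Berlin"], "Berlin"),     # gold answer at rank 3
    (["Vienna", "Prague", "Budapest"], "Warsaw"),  # gold answer not retrieved
]

for k in (1, 3):
    mean_p_at_k = sum(precision_at_k(preds, gold, k) for preds, gold in queries) / len(queries)
    print(f"mean P@{k} = {mean_p_at_k:.2f}")
```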

Analysis and Results

Initial results indicate BERT-large demonstrates notable proficiency in retrieving relational information, achieving precision levels comparable to traditional knowledge extraction systems aided by oracle-driven entity linkage. Notably, BERT performs robustly across various relation types, though certain N-to-M relations present challenges.

Further analysis reveals dependencies between model predictions and factors such as the frequency of objects within the training set, subject-object similarity, and prediction confidence (Figures 2 and 3). These observations underscore the latent knowledge storage capabilities of LMs.

Figure 2: Mean P@k curve for T-REx varying k. Base-10 log scale for X axis.

Figure 3: Pearson correlation coefficient for the P@1 of the BERT-large model on T-REx and a set of metrics.
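The kind of correlation analysis summarized in Figure 3 can be sketched as follows. The feature names and values here are illustrative placeholders rather than the paper's measurements, and scipy's pearsonr stands in for whatever implementation the authors used.

```python
# Sketch of correlating per-query P@1 with candidate explanatory factors
# (e.g., log object frequency, subject-object similarity, prediction
# confidence). All values below are illustrative placeholders.
import numpy as np
from scipy.stats import pearsonr

p_at_1 = np.array([1, 0, 1, 1, 0, 1, 0, 1], dtype=float)
features = {
    "log_object_frequency": np.array([3.2, 1.1, 2.8, 3.5, 0.9, 2.4, 1.0, 3.0]),
    "subject_object_similarity": np.array([0.4, 0.1, 0.5, 0.6, 0.2, 0.3, 0.1, 0.5]),
    "prediction_confidence": np.array([0.8, 0.2, 0.7, 0.9, 0.3, 0.6, 0.2, 0.8]),
}

for name, values in features.items():
    r, p_value = pearsonr(values, p_at_1)
    print(f"{name:28s} r={r:+.2f} (p={p_value:.3f})")
```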

An intriguing aspect of the analysis is the evaluation of BERT's robustness to variations in query phrasing. The results (Figure 4) suggest that both BERT variants and ELMo 5.5B are the least sensitive to query framing, potentially reflecting their richer contextual representations.
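One simple way to probe this sensitivity, assuming the fill-mask setup sketched earlier, is to check where the gold answer ranks under several paraphrases of the same fact. The templates below are invented for illustration and are not the T-REx mentions used in the paper.

```python
# Sketch: compare where the gold answer lands in the ranking under several
# paraphrases of the same fact. Templates are illustrative only.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-cased")

paraphrases = [
    "Dante was born in [MASK].",
    "Dante, who was born in [MASK], wrote the Divine Comedy.",
    "The birthplace of Dante is [MASK].",
]
gold = "Florence"

for template in paraphrases:
    predictions = [p["token_str"] for p in fill_mask(template, top_k=100)]
    # Rank of the gold answer among the top-100 predictions (None if absent).
    rank = predictions.index(gold) + 1 if gold in predictions else None
    print(f"rank of '{gold}' = {rank!s:>4}  for: {template}")
```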

Comparative Evaluation

The paper contrasts LMs' retrieval capacities with a range of existing systems, including frequency-based baselines and DrQA-informed configurations. Despite being unsupervised, BERT achieves competitive scores, particularly in open-domain question answering, where it approaches the performance of supervised systems backed by structured knowledge.

Figure 4: Average rank distribution for 10 different mentions of 100 random facts per relation in T-REx. ELMo 5.5B and both variants of BERT are least sensitive to the framing of the query but also are the most likely to have seen the query sentence during training.

Discussion and Implications

The paper highlights the significant potential of LMs as repositories for general and domain-specific knowledge. BERT's high precision without any fine-tuning supports using such models in applications that need knowledge representation without the rigidity of traditional KBs. However, limitations remain in handling sparse relations and in the variance of predictions across different cloze templates.

Future work could explore fine-tuning strategies that improve recall of less frequently encountered facts and relations. Moreover, integrating structured knowledge extraction with LM representations could yield next-generation hybrid systems that leverage the strengths of both paradigms.

Conclusion

In conclusion, this paper demonstrates that language models, particularly BERT, can function as viable KBs for certain kinds of factual knowledge, challenging conventional information retrieval paradigms. By shifting focus towards unsupervised knowledge representation, LMs showcase promising applications beyond traditional linguistic tasks, motivating further research in this emergent area.
