LLMs as Knowledge Bases?
The paper "LLMs as Knowledge Bases?" by Fabio Petroni et al. investigates the potential of state-of-the-art pretrained LLMs to function as repositories of factual and commonsense knowledge. The authors analyze whether these models, without any fine-tuning, can answer queries structured as "fill-in-the-blank" cloze statements, thereby replicating or even surpassing traditional structured knowledge bases.
Background and Methodology
Recent advancements in LLMs, particularly with ELMo and BERT, have demonstrated their ability to store substantial amounts of linguistic information useful for various NLP tasks. These models are trained on vast corpora and optimized to predict missing words in a sequence. Unlike traditional knowledge bases that require complex schema engineering and extensive human annotations, LLMs offer several advantages:
- They allow querying an open class of relations without fixed schemas.
- They can be easily updated with more data.
- Their training is completely unsupervised.
To evaluate the relational knowledge embedded within these models, the authors introduce the LAMA (LAnguage Model Analysis) probe. This probe measures how well LLMs can predict masked objects in cloze statements derived from several factual and commonsense knowledge sources, including Google-RE, T-REx, ConceptNet, and a subset of SQuAD.
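As a concrete illustration of this cloze-style querying (separate from the authors' released LAMA code), the sketch below uses the Hugging Face transformers fill-mask pipeline with bert-large-cased. The "Dante was born in [MASK]." statement follows the paper's running example; the pipeline-based setup and top-k printout are assumptions of this sketch, not the paper's exact evaluation code.

```python
# Minimal sketch of a LAMA-style cloze query, assuming the Hugging Face
# `transformers` fill-mask pipeline (not the authors' original LAMA code).
from transformers import pipeline

# bert-large-cased is one of the models probed in the paper; any masked
# language model with a mask token would work the same way here.
fill_mask = pipeline("fill-mask", model="bert-large-cased")

# Cloze statement in the paper's "fill-in-the-blank" format.
statement = f"Dante was born in {fill_mask.tokenizer.mask_token}."

# The pipeline returns the top-k single-token completions with their scores.
for prediction in fill_mask(statement, top_k=5):
    print(f"{prediction['token_str']:>12}  {prediction['score']:.4f}")
```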
Key Findings
- Relational Knowledge in LLMs: The paper finds that, without fine-tuning, BERT-large performs competitively with traditional NLP methods that rely on oracle knowledge and structured knowledge bases. Its ability to recall factual knowledge is remarkable, reaching high precision on several relations across diverse datasets.
- Comparison with Structured Knowledge Bases: The investigation shows that BERT-large can achieve performance close to that of a supervised relation extraction system with oracle-based entity linking. For instance, on the Google-RE dataset, BERT-large achieves a mean P@1 of 10.5, outperforming the RE baseline (oracle-based) which attained 7.6.
- Factual vs. Commonsense Knowledge: The paper finds that certain kinds of relations, especially 1-to-1 relations (e.g., a country and its capital), are recalled far more reliably than more complex N-to-M relations (e.g., a person and the languages they speak). BERT-large captures both factual and commonsense knowledge well, as illustrated by its performance on the T-REx and ConceptNet datasets.
- Implications for Open-Domain QA: On a subset of cloze-style questions derived from the SQuAD dataset, BERT-large achieves 57.1% precision@10, compared to 63.5% for DrQA, a task-specific, supervised open-domain question answering system (a short sketch after this list shows how precision@k is computed). This points to BERT-large's significant potential as an unsupervised open-domain question answering system.
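For readers unfamiliar with the metric behind these numbers, here is one way a macro-averaged precision@k over cloze queries could be computed. The helper function and the toy records are hypothetical; details of the paper's evaluation, such as the shared candidate vocabulary used to compare models fairly, are omitted.

```python
from collections import defaultdict

def mean_precision_at_k(queries, k=1):
    """Hypothetical helper: `queries` is a list of records, each holding a
    relation id, the gold object, and the model's ranked predictions.
    P@k is computed per relation and then averaged across relations,
    mirroring the macro-averaged numbers quoted above."""
    hits, totals = defaultdict(int), defaultdict(int)
    for q in queries:
        totals[q["relation"]] += 1
        # A query counts as a hit if the gold object appears in the top k.
        if q["gold"] in q["predictions"][:k]:
            hits[q["relation"]] += 1
    per_relation = [hits[r] / totals[r] for r in totals]
    return sum(per_relation) / len(per_relation)

# Toy usage with invented predictions:
queries = [
    {"relation": "place_of_birth", "gold": "Florence",
     "predictions": ["Florence", "Rome", "Venice"]},
    {"relation": "place_of_birth", "gold": "Paris",
     "predictions": ["Lyon", "Paris", "Nice"]},
]
print(mean_precision_at_k(queries, k=1))   # 0.5
print(mean_precision_at_k(queries, k=10))  # 1.0
```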
Technical Insights and Implications
The paper's correlation analysis provides further insight into which factors influence knowledge retrieval performance:
- A high frequency of object mentions in the training data and high prediction confidence (i.e., a high log probability for the predicted token) correlate positively with performance.
- The cosine similarity between subject and object embeddings also shows a positive correlation with retrieval accuracy.
These observations suggest that while LLMs are adept at memorizing and recalling training data, their performance could be further improved by exploiting embedding similarity and handling tokenization more effectively.
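The sketch below shows the shape of such an analysis: Pearson correlations between per-query features and whether the top prediction was correct. The feature values are invented placeholders, not data from the paper, and the feature names only loosely follow the factors the authors examine.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-query records: P@1 as 0/1 alongside candidate explanatory
# features, loosely following the factors examined in the paper.
records = [
    # (correct, log_prob_of_top_prediction, subj_obj_cosine, object_mentions)
    (1, -0.4, 0.62, 1200),
    (0, -3.1, 0.15, 40),
    (1, -0.9, 0.48, 800),
    (0, -2.7, 0.22, 15),
    (1, -1.2, 0.55, 300),
]

correct = np.array([r[0] for r in records], dtype=float)
features = {
    "log probability": np.array([r[1] for r in records]),
    "subject-object cosine": np.array([r[2] for r in records]),
    "object mention count": np.array([r[3] for r in records]),
}

# Pearson correlation between each feature and per-query accuracy.
for name, values in features.items():
    r, p = pearsonr(values, correct)
    print(f"{name:>22}: r = {r:+.2f} (p = {p:.2f})")
```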
Future Directions
This research opens several avenues for future exploration:
- Template Variation and Sensitivity: Investigating how sensitive model performance is to different query phrasings, and exploring methods to standardize or optimize query templates, could make knowledge retrieval with LLMs more robust (a small sketch after this list illustrates the idea).
- Multi-Token Prediction: Extending the evaluation to handle multi-token objects presents an opportunity to capture more complex expressions of knowledge.
- Scaling with Larger Corpora: Exploring how performance scales with increasingly large training corpora could clarify the potential of LLMs to serve as comprehensive knowledge repositories.
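As a small illustration of the first point, one could query the same fact with several paraphrased templates and compare the top predictions. The paraphrases below are invented for illustration and reuse the fill-mask setup sketched earlier; they are not templates from the paper.

```python
# Sketch of a template-sensitivity check, assuming the same Hugging Face
# fill-mask pipeline as above; the paraphrased templates are illustrative.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-large-cased")
mask = fill_mask.tokenizer.mask_token

templates = [
    f"Dante was born in {mask}.",
    f"Dante is a native of {mask}.",
    f"The birthplace of Dante is {mask}.",
]

# If the predictions or scores shift noticeably across paraphrases, the
# model's recall of the fact is sensitive to the surface form of the query.
for template in templates:
    top = fill_mask(template, top_k=1)[0]
    print(f"{template:<35} -> {top['token_str']} ({top['score']:.3f})")
```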
Conclusion
This paper makes a substantial contribution to understanding the extent to which modern LLMs can function as unsupervised knowledge bases. While they outperform traditional knowledge extraction baselines in several respects, LLMs like BERT-large also show promise for practical applications in open-domain question answering and beyond, pointing to a shift in how pretrained LLMs may be leveraged for knowledge-centric tasks in the future.