LLMs as Knowledge Bases?
The paper "LLMs as Knowledge Bases?" by Fabio Petroni et al. investigates the potential of state-of-the-art pretrained LLMs to function as repositories of factual and commonsense knowledge. The authors analyze whether these models, without any fine-tuning, can answer queries structured as "fill-in-the-blank" cloze statements, thereby replicating or even surpassing traditional structured knowledge bases.
Background and Methodology
Recent advancements in LLMs, particularly with ELMo and BERT, have demonstrated their ability to store substantial amounts of linguistic information useful for various NLP tasks. These models are trained on vast corpora and optimized to predict missing words in a sequence. Unlike traditional knowledge bases that require complex schema engineering and extensive human annotations, LLMs offer several advantages:
- They allow querying an open class of relations without fixed schemas.
- They can be easily updated with more data.
- Their training is completely unsupervised.
To evaluate the relational knowledge embedded within these models, the authors introduce the LAMA (LAnguage Model Analysis) probe. This probe measures how well LLMs can predict masked objects in cloze statements derived from several factual and commonsense knowledge sources, including Google-RE, T-REx, ConceptNet, and a subset of SQuAD.
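As a concrete illustration of this cloze-style querying (separate from the authors' released LAMA code), the sketch below uses the Hugging Face transformers fill-mask pipeline with bert-large-cased. The "Dante was born in [MASK]." statement follows the paper's running example; the pipeline-based setup and top-k printout are assumptions of this sketch, not the paper's exact evaluation code.

```python
# Minimal sketch of a LAMA-style cloze query, assuming the Hugging Face
# `transformers` fill-mask pipeline (not the authors' original LAMA code).
from transformers import pipeline

# bert-large-cased is one of the models probed in the paper; any masked
# language model with a mask token would work the same way here.
fill_mask = pipeline("fill-mask", model="bert-large-cased")

# Cloze statement in the paper's "fill-in-the-blank" format.
statement = f"Dante was born in {fill_mask.tokenizer.mask_token}."

# The pipeline returns the top-k single-token completions with their scores.
for prediction in fill_mask(statement, top_k=5):
    print(f"{prediction['token_str']:>12}  {prediction['score']:.4f}")
```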
Key Findings
- Relational Knowledge in LLMs: The paper finds that, without fine-tuning, BERT-large performs competitively with traditional NLP methods that rely on oracle knowledge and structured knowledge bases. Its ability to recall factual knowledge is remarkable, reaching high precision on several relations across diverse datasets.
- Comparison with Structured Knowledge Bases: The investigation shows that BERT-large can achieve performance close to that of a supervised relation extraction system with oracle-based entity linking. For instance, on the Google-RE dataset, BERT-large achieves a mean P@1 of 10.5, outperforming the RE baseline (oracle-based) which attained 7.6.
- Factual vs. Commonsense Knowledge: The paper finds that certain kinds of relations, especially 1-to-1 relations (e.g., a country and its capital), are recalled far more reliably than more complex N-to-M relations (e.g., a person and the languages they speak). BERT-large captures both factual and commonsense knowledge well, as illustrated by its performance on the T-REx and ConceptNet datasets.
- Implications for Open-Domain QA: On a subset of cloze-style questions derived from the SQuAD dataset, BERT-large achieves 57.1% precision@10, compared to 63.5% for DrQA, a task-specific, supervised open-domain question answering system (a short sketch after this list shows how precision@k is computed). This points to BERT-large's significant potential as an unsupervised open-domain question answering system.
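For readers unfamiliar with the metric behind these numbers, here is one way a macro-averaged precision@k over cloze queries could be computed. The helper function and the toy records are hypothetical; details of the paper's evaluation, such as the shared candidate vocabulary used to compare models fairly, are omitted.

```python
from collections import defaultdict

def mean_precision_at_k(queries, k=1):
    """Hypothetical helper: `queries` is a list of records, each holding a
    relation id, the gold object, and the model's ranked predictions.
    P@k is computed per relation and then averaged across relations,
    mirroring the macro-averaged numbers quoted above."""
    hits, totals = defaultdict(int), defaultdict(int)
    for q in queries:
        totals[q["relation"]] += 1
        # A query counts as a hit if the gold object appears in the top k.
        if q["gold"] in q["predictions"][:k]:
            hits[q["relation"]] += 1
    per_relation = [hits[r] / totals[r] for r in totals]
    return sum(per_relation) / len(per_relation)

# Toy usage with invented predictions:
queries = [
    {"relation": "place_of_birth", "gold": "Florence",
     "predictions": ["Florence", "Rome", "Venice"]},
    {"relation": "place_of_birth", "gold": "Paris",
     "predictions": ["Lyon", "Paris", "Nice"]},
]
print(mean_precision_at_k(queries, k=1))   # 0.5
print(mean_precision_at_k(queries, k=10))  # 1.0
```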
Technical Insights and Implications
The paper's correlation analysis provides further insight into which factors influence knowledge retrieval performance:
- A high frequency of object mentions in the training data and high prediction confidence (i.e., a high log probability for the predicted token) correlate positively with performance.
- The cosine similarity between subject and object embeddings also shows a positive correlation with retrieval accuracy.
These observations suggest that while LLMs are adept at memorizing and recalling training data, their performance could be further improved by exploiting embedding similarity and handling tokenization more effectively.
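The sketch below shows the shape of such an analysis: Pearson correlations between per-query features and whether the top prediction was correct. The feature values are invented placeholders, not data from the paper, and the feature names only loosely follow the factors the authors examine.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-query records: P@1 as 0/1 alongside candidate explanatory
# features, loosely following the factors examined in the paper.
records = [
    # (correct, log_prob_of_top_prediction, subj_obj_cosine, object_mentions)
    (1, -0.4, 0.62, 1200),
    (0, -3.1, 0.15, 40),
    (1, -0.9, 0.48, 800),
    (0, -2.7, 0.22, 15),
    (1, -1.2, 0.55, 300),
]

correct = np.array([r[0] for r in records], dtype=float)
features = {
    "log probability": np.array([r[1] for r in records]),
    "subject-object cosine": np.array([r[2] for r in records]),
    "object mention count": np.array([r[3] for r in records]),
}

# Pearson correlation between each feature and per-query accuracy.
for name, values in features.items():
    r, p = pearsonr(values, correct)
    print(f"{name:>22}: r = {r:+.2f} (p = {p:.2f})")
```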
Future Directions
This research opens several avenues for future exploration:
- Template Variation and Sensitivity: Investigating how sensitive model performance is to different query phrasings, and exploring methods to standardize or optimize query templates, could make knowledge retrieval with LLMs more robust (a small sketch after this list illustrates the idea).
- Multi-Token Prediction: Extending the evaluation to handle multi-token objects presents an opportunity to capture more complex expressions of knowledge.
- Scaling with Larger Corpora: Exploring how performance scales with increasingly large training corpora could clarify the potential of LLMs to serve as comprehensive knowledge repositories.
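As a small illustration of the first point, one could query the same fact with several paraphrased templates and compare the top predictions. The paraphrases below are invented for illustration and reuse the fill-mask setup sketched earlier; they are not templates from the paper.

```python
# Sketch of a template-sensitivity check, assuming the same Hugging Face
# fill-mask pipeline as above; the paraphrased templates are illustrative.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-large-cased")
mask = fill_mask.tokenizer.mask_token

templates = [
    f"Dante was born in {mask}.",
    f"Dante is a native of {mask}.",
    f"The birthplace of Dante is {mask}.",
]

# If the predictions or scores shift noticeably across paraphrases, the
# model's recall of the fact is sensitive to the surface form of the query.
for template in templates:
    top = fill_mask(template, top_k=1)[0]
    print(f"{template:<35} -> {top['token_str']} ({top['score']:.3f})")
```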
Conclusion
This paper makes a substantial contribution to understanding the extent to which modern LLMs can function as unsupervised knowledge bases. While they outperform traditional knowledge extraction baselines in several respects, LLMs like BERT-large also show promise for practical applications in open-domain question answering and beyond, pointing to a shift in how pretrained LLMs may be leveraged for knowledge-centric tasks in the future.