The Influence of Context on Language Models' Factual Predictions
The paper "How Context Affects Language Models' Factual Predictions" examines the capabilities and limitations of pre-trained language models (LMs), such as BERT and RoBERTa, in storing and retrieving factual knowledge without supervision. Its central question is whether pairing these LMs with unsupervised information retrieval (IR) systems can improve their zero-shot cloze-style question-answering performance.
Core Findings
The research reports several key findings:
- Integration of Contexts: Augmenting pre-trained LMs with relevant context significantly boosts their performance on unsupervised cloze-style question answering. The augmented models perform comparably to supervised baselines such as DrQA, which relies on a dedicated machine-reading component.
- Use of Retrieval Systems: Feeding BERT contexts fetched by an off-the-shelf IR system allows the fully unsupervised model to match the performance of supervised open-domain QA models. Evaluated with the LAMA probe, this demonstrates BERT's machine-reading capabilities even without task-specific training.
- Next Sentence Prediction (NSP): BERT's NSP classifier, learned during pre-training, proves remarkably effective at filtering out noisy contexts and improving robustness to irrelevant data. Because query and context are marked with separate segment tokens, BERT can use NSP to judge whether a retrieved context is relevant to the query and discard it otherwise, thereby improving factual predictions.
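The filtering idea behind the NSP finding can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `nsp_score` here is a simple word-overlap stand-in for BERT's actual pre-trained NSP head, and the threshold is an arbitrary assumption. What the sketch shows is the two-segment input packing (distinct segment ids for context vs. query) and the keep-or-drop decision on the retrieved context.

```python
def pack_segments(context_tokens, query_tokens):
    """Mimic BERT's two-segment input: [CLS] context [SEP] query [SEP]."""
    tokens = ["[CLS]"] + context_tokens + ["[SEP]"] + query_tokens + ["[SEP]"]
    # Segment id 0 for the context half, 1 for the query half.
    segment_ids = [0] * (len(context_tokens) + 2) + [1] * (len(query_tokens) + 1)
    return tokens, segment_ids

def nsp_score(query_tokens, context_tokens):
    """Word-overlap proxy standing in for the pre-trained NSP classifier."""
    q, c = set(query_tokens), set(context_tokens)
    return len(q & c) / max(len(q), 1)

def filter_context(query, context, threshold=0.25):
    """Keep the context only when the NSP-style score deems it relevant."""
    q_toks, c_toks = query.lower().split(), context.lower().split()
    if nsp_score(q_toks, c_toks) >= threshold:
        return pack_segments(c_toks, q_toks)
    return pack_segments([], q_toks)  # drop irrelevant context, keep bare query
```

With a query like `"Dante was born in [MASK] ."`, a Dante-related context survives the filter while an off-topic paragraph is discarded, mirroring how NSP shields the model from adversarial contexts.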
Methodology and Evaluation
The researchers used several experimental setups to test how context influences LM predictions:
- Datasets:
The paper uses the LAMA probe, built from datasets such as Google-RE, T-REx, and SQuAD, to query LMs with factual cloze-style questions. These datasets are well suited to evaluating the relational knowledge stored within LMs.
- Comparison with Baselines:
They compared the results with DrQA, demonstrating that without any supervised fine-tuning, BERT's performance with retrieved context is on par with this well-established supervised system.
- Adversarial and Retrieved Contexts:
To assess the robustness and adaptability of LMs, the paper explored the effect of adversarial contexts — contexts extracted from unrelated or noise-inducing text — versus retrieved contexts obtained via IR systems. This analysis confirmed the effectiveness of BERT's NSP in mitigating adverse impacts from unrelated contexts.
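The retrieval side of the setup can be illustrated with a minimal pure-Python TF-IDF ranker. This is a simplified stand-in for the off-the-shelf IR system the paper relies on (DrQA's retriever uses TF-IDF with bigram hashing); the toy corpus, tokenization, and scoring below are illustrative assumptions, not the paper's pipeline.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build simple TF-IDF vectors for a list of tokenized documents."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # document frequency per term
    n = len(docs)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log((1 + n) / (1 + df[t])) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query, corpus):
    """Return the corpus paragraph most similar to the cloze query."""
    docs = [p.lower().split() for p in corpus]
    vecs = tfidf_vectors(docs + [query.lower().split()])
    qvec = vecs[-1]
    scores = [cosine(qvec, v) for v in vecs[:-1]]
    return corpus[max(range(len(corpus)), key=scores.__getitem__)]
```

Given a cloze query such as `"Dante was born in [MASK] ."`, the ranker surfaces the paragraph most likely to contain the masked answer, which is then handed to the LM as context.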
Implications
This paper offers a practical insight for the NLP community: robustly incorporating a retrieval component can substantially improve the unsupervised factual question-answering abilities of LMs. The NSP result has broader implications as well, suggesting that pre-training objectives dismissed as unnecessary for fine-tuning (RoBERTa drops NSP entirely) may still prove valuable for other tasks.
Moreover, the integration techniques explored here could pave the way for QA systems that rely far less on supervised data, potentially reducing the biases inherent in small annotated datasets. These methods emphasize leveraging large corpora and exploiting the factual knowledge already encoded in LM parameters.
Future Directions
The findings suggest several directions for future research:
- Expanding the scope of unsupervised retrieval-augmented LMs to more complex, multi-token outputs could bridge existing gaps between unsupervised and traditional supervised setups.
- Further work could refine methods for judging context relevance beyond NSP, especially for models such as RoBERTa that lack an NSP pre-training objective.
- This paper underscores the need to probe into mechanisms underlying LM behavior when handling noisy contexts, driving innovations in model architectures and pre-training paradigms.
In summary, this research makes a valuable contribution to understanding and enhancing how pre-trained LMs handle factual knowledge in the absence of supervision, pointing to promising ways of building AI applications that capitalize on vast, diverse information repositories.