Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly (1911.03343v3)

Published 8 Nov 2019 in cs.CL

Abstract: Building on Petroni et al. (2019), we propose two new probing tasks analyzing factual knowledge stored in Pretrained Language Models (PLMs). (1) Negation. We find that PLMs do not distinguish between negated ("Birds cannot [MASK]") and non-negated ("Birds can [MASK]") cloze questions. (2) Mispriming. Inspired by priming methods in human psychology, we add "misprimes" to cloze questions ("Talk? Birds can [MASK]"). We find that PLMs are easily distracted by misprimes. These results suggest that PLMs still have a long way to go to adequately learn human-like factual knowledge.

Analyzing the Factual Knowledge Representation in Pretrained Language Models Through Negated and Misprimed Probes

This paper presents a critical evaluation of the ability of Pretrained Language Models (PLMs) to understand and recall factual knowledge, specifically under conditions of negation and mispriming. It extends prior work by Petroni et al., leveraging LAMA (LAnguage Model Analysis) to formulate probing tasks that challenge a PLM's capacity for nuanced comprehension. Two novel tasks are introduced, negation and mispriming, testing the robustness of PLMs such as Transformer-XL, ELMo, and BERT.
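The probes are cloze questions in which the model must fill a masked slot. As a rough illustration (not the authors' code), the same style of query can be reproduced with an off-the-shelf masked language model via the Hugging Face `transformers` fill-mask pipeline; the checkpoint and the example probes below are assumptions for demonstration only.

```python
from transformers import pipeline

# Off-the-shelf BERT masked-LM head; the paper probes BERT (among other PLMs)
# with cloze-style questions of this form.
fill = pipeline("fill-mask", model="bert-base-cased")

probes = [
    "Birds can [MASK].",        # positive cloze question
    "Birds cannot [MASK].",     # negated cloze question
    "Talk? Birds can [MASK].",  # misprimed cloze question
]

for probe in probes:
    top = fill(probe, top_k=5)  # top-5 predictions for the masked slot
    print(f"{probe:28s} -> {[p['token_str'] for p in top]}")
```

In line with the paper's findings, one would expect the top predictions for the positive and negated variants to largely coincide, and the misprime word to surface among the model's top answers.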

Summary of Findings

  1. Negation: The introduction of the "negated LAMA dataset" allows probing of the effect of negation. The results reveal a large overlap between model predictions for positive (e.g., "Birds can fly") and negated (e.g., "Birds cannot fly") statements. Even BERT, which handles negation comparatively well, struggles, illustrating a clear failure to differentiate between affirming and negating factual assertions. BERT can memorize individual instances of negation encountered during training but does not generalize to unseen negated statements. However, finetuning improves its ability to discern true from false statements, indicating that supervised learning can ameliorate some of the deficiencies observed during unsupervised pretraining. (A simplified overlap measurement is sketched after this list.)
  2. Mispriming: Another core contribution is the application of psychological priming to PLMs. The paper prepends "misprimes" (e.g., "Talk? Birds can [MASK]") to cloze questions, simulating scenarios in which a human would not normally be misled. Results show that PLMs are strongly misled by misprimes; BERT frequently places the misprime in the masked position instead of the correct answer. Even increasing the distance between the misprime and the mask within a sentence does not significantly mitigate this effect (see the second part of the sketch below), suggesting an underlying tendency to rely on immediate contextual proximity over stored factual knowledge.
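Both effects can be approximated with simple scripted experiments. The sketch below is not the paper's evaluation code: the probe pairs, the filler sentence, the model checkpoint, and the top-5 overlap measure are illustrative assumptions, but they follow the same recipe of comparing positive versus negated probes and of moving a misprime further from the masked slot.

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-cased")

def top_k(probe, k=5):
    """Return the model's top-k token predictions for the [MASK] slot."""
    return [p["token_str"] for p in fill(probe, top_k=k)]

# (1) Negation: how much do predictions for a probe and its negation overlap?
# Toy pairs in the style of the negated LAMA dataset (the real dataset negates
# LAMA templates such as "[X] was born in [MASK].").
pairs = [
    ("Birds can [MASK].", "Birds cannot [MASK]."),
    ("Paris is the capital of [MASK].", "Paris is not the capital of [MASK]."),
]
for pos, neg in pairs:
    shared = set(top_k(pos)) & set(top_k(neg))
    print(f"{pos!r} vs {neg!r}: {len(shared)}/5 shared top-5 predictions")

# (2) Mispriming: push the misprime further from the mask with neutral filler
# text (the filler sentence is invented for this illustration) and check
# whether the distraction persists.
misprime, probe = "Talk?", "Birds can [MASK]."
filler = "The weather was nice yesterday. "
for n in range(4):
    text = f"{misprime} {filler * n}{probe}"
    print(f"{n} filler sentence(s): {top_k(text, k=3)}")
```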

Implications and Future Directions

The findings have notable implications for the interpretation of PLMs’ performance in question-answering tasks and their potential application in real-world scenarios. The research raises questions about the robustness of inferred "knowledge" in PLMs, which might be heavily reliant on patterns and co-occurrences in training data rather than a deep understanding of semantics and logical constructs such as negation.

For practical applications, this means that there is still a significant gap between human-level understanding and the mimicry of such understanding seen in models like BERT. The paper argues this gap could hinder advancements where fine-grained and contextual comprehension is necessary. The research also calls attention to the fact that training data lacking sufficient examples of negation or varied context might lead to brittle model performance, suggesting a need for more systematic incorporation of such phenomena in training datasets.

Theoretically, the findings suggest avenues for architectural developments in PLMs that could better handle discrete phenomena like factuality and negation. Future research might need to focus on enhancing the model architecture or training paradigms to more closely simulate human levels of language understanding, possibly through hybrid approaches that integrate explicit logic or semantic reasoning capabilities with conventional deep learning techniques.

Overall, this paper contributes to a growing body of literature that challenges the surface-level proficiency of PLMs in handling complex linguistic phenomena, underscoring the necessity for continued innovation as these models are further developed for natural language understanding tasks.

Authors (2)
  1. Nora Kassner (22 papers)
  2. Hinrich Schütze (250 papers)
Citations (303)