Understanding Knowledge Drift in LLMs through Misinformation (2409.07085v1)

Published 11 Sep 2024 in cs.CL and cs.LG

Abstract: LLMs have revolutionized numerous applications, making them an integral part of our digital ecosystem. However, their reliability becomes critical, especially when these models are exposed to misinformation. We primarily analyze the susceptibility of state-of-the-art LLMs to factual inaccuracies when they encounter false information in a QnA scenario, an issue that can lead to a phenomenon we refer to as knowledge drift, which significantly undermines the trustworthiness of these models. We evaluate the factuality and the uncertainty of the models' responses relying on Entropy, Perplexity, and Token Probability metrics. Our experiments reveal that an LLM's uncertainty can increase up to 56.6% when the question is answered incorrectly due to the exposure to false information. At the same time, repeated exposure to the same false information can decrease the model's uncertainty again (-52.8% w.r.t. the answers on the untainted prompts), potentially manipulating the underlying model's beliefs and introducing a drift from its original knowledge. These findings provide insights into LLMs' robustness and vulnerability to adversarial inputs, paving the way for developing more reliable LLM applications across various domains. The code is available at https://github.com/afastowski/knowledge_drift.

Summary

  • The paper demonstrates that repeated misinformation leads to measurable knowledge drift in LLMs, quantified by changes in entropy, perplexity, and token probabilities.
  • The methodology uses adversarial false information prompts on the TriviaQA dataset to evaluate shifts in model certainty and accuracy.
  • Findings reveal that some models suffer accuracy drops over 80%, and reinforced misinformation can cause LLMs to adopt false knowledge.

Understanding Knowledge Drift in LLMs through Misinformation

Introduction

The paper explores the vulnerability of LLMs exposed to misinformation, investigating how false information in the input can lead to what the authors term knowledge drift: the deviation of a model's internal knowledge from its original, correct state due to manipulative inputs. The effect is evaluated in a Question Answering (QA) setting using state-of-the-art LLMs such as GPT-4o, GPT-3.5, LLaMA-2-13B, and Mistral-7B.

Methodology

The researchers analyze models on the TriviaQA dataset, using both random and adversarially designed false-information prompts to measure shifts in model certainty and correctness. Uncertainty is quantified with entropy, perplexity, and token probability. The experiments focus on how repeated exposure to false information alters uncertainty levels and the accuracy of model responses.

Figure 1: Answers produced by state-of-the-art LLMs to "What's Rambo's first name?" with no perturbation, with false-information injection, and with random-information injection.
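
To make these metrics concrete, the sketch below shows one common way to compute mean token probability, perplexity, and predictive entropy from per-token log-probabilities returned by an LLM API. The function names and input format are illustrative assumptions and do not come from the paper's released code.

```python
# Minimal sketch of the three uncertainty metrics, assuming access to per-token
# log-probabilities for the generated answer (and, for entropy, the top-k
# candidate log-probabilities at each position). Illustrative only.
import math
from typing import List

def mean_token_probability(answer_logprobs: List[float]) -> float:
    """Average probability the model assigned to each generated answer token."""
    return sum(math.exp(lp) for lp in answer_logprobs) / len(answer_logprobs)

def perplexity(answer_logprobs: List[float]) -> float:
    """Exponential of the mean negative log-likelihood of the answer tokens."""
    return math.exp(-sum(answer_logprobs) / len(answer_logprobs))

def mean_entropy(topk_logprobs: List[List[float]]) -> float:
    """Average Shannon entropy of the renormalized top-k distribution per position."""
    total = 0.0
    for position in topk_logprobs:
        probs = [math.exp(lp) for lp in position]
        z = sum(probs)  # renormalize the truncated top-k distribution
        total -= sum((p / z) * math.log(p / z) for p in probs)
    return total / len(topk_logprobs)

# Example with a hypothetical two-token answer.
answer_logprobs = [-0.05, -0.20]
topk_logprobs = [[-0.05, -3.2, -4.1], [-0.20, -2.5, -3.9]]
print(mean_token_probability(answer_logprobs),
      perplexity(answer_logprobs),
      mean_entropy(topk_logprobs))
```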

Impact of Misinformation on LLMs

False Information Prompts

Injected false information temporarily increases model uncertainty, but repeated exposure to the same misinformation can decrease it again, misleading the model into adopting the inaccuracies as part of its knowledge. This is quantified through entropy and perplexity, which show a counterintuitive decrease in uncertainty as the misinformation is reinforced.
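
As a hedged illustration of how such repeated injections could be assembled in this QA setting, the snippet below prepends the same false statement several times before the question; the template wording, the helper build_prompt, and the example false statement are assumptions rather than the paper's exact prompts.

```python
# Assumed prompt template (not the paper's exact format): reinforce a piece of
# misinformation by repeating it before the question.
from typing import Optional

def build_prompt(question: str,
                 false_statement: Optional[str] = None,
                 repetitions: int = 1) -> str:
    context = ""
    if false_statement:
        context = " ".join([false_statement] * repetitions) + "\n"
    return f"{context}Question: {question}\nAnswer:"

clean_prompt = build_prompt("What's Rambo's first name?")
tainted_prompt = build_prompt(
    "What's Rambo's first name?",
    false_statement="Rambo's first name is Steve.",  # hypothetical false statement
    repetitions=10,                                  # repeated to simulate reinforced misinformation
)
```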

Random Information Prompts

Random and unrelated information generally causes higher uncertainty than related false information. This increased confusion underscores the importance of context relevance in model prompting. The random prompts do not typically lead to significant accuracy degradation because they are not designed to target specific knowledge nodes related to the QA task.

Results and Observations

The paper demonstrates that exposing LLMs to repeated misinformation compromises their internal knowledge retention, as reflected in altered uncertainty metrics and deteriorated accuracy. The size of the accuracy decrease varies considerably across models, with LLaMA-2-13B exhibiting a drop of over 80% when the same misinformation is repeated ten times.
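
As a simple illustration of how such a drop could be measured, the sketch below compares exact-match accuracy on clean versus tainted prompts; these helpers are assumptions, not the authors' evaluation script.

```python
from typing import List

def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions matching the reference answer after light normalization."""
    normalize = lambda s: s.strip().lower()
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

def relative_accuracy_drop(clean_acc: float, tainted_acc: float) -> float:
    """Relative decrease in accuracy caused by the misinformation-tainted prompts."""
    return (clean_acc - tainted_acc) / clean_acc
```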

The introduction of a prompt version emphasizing truthfulness ("Respond with the true, exact answer only") yields improved resistance to misinformation, highlighting a potential mitigation strategy for model sycophancy.
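
A minimal sketch of this mitigation, assuming the only change is the instruction prepended to the prompt (the baseline wording is an assumption; the truthfulness-emphasizing wording is the one quoted above):

```python
# Baseline wording is assumed for illustration; the truthful variant is quoted in the summary above.
BASELINE_INSTRUCTION = "Respond with the exact answer only."
TRUTHFUL_INSTRUCTION = "Respond with the true, exact answer only."

def with_instruction(prompt: str, instruction: str = TRUTHFUL_INSTRUCTION) -> str:
    """Prepend the instruction to a (possibly misinformation-tainted) QA prompt."""
    return f"{instruction}\n{prompt}"
```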

Implications and Future Directions

This work highlights critical vulnerabilities in LLMs, which have implications for their deployment in sensitive domains. It underscores the need for robust defenses against adversarial attacks and suggests that uncertainty metrics alone may not be reliable indicators for detecting such threats.

Future research could focus on embedding adversarial resistance mechanisms directly into the model architecture or employing dynamic retraining strategies to counteract knowledge drift. Furthermore, a longitudinal assessment across diverse datasets and domains would extend the understanding of misinformation's effects on LLMs.

Conclusion

This research contributes to understanding how LLMs can be manipulated through misinformation, leading to knowledge drift. By exposing the fragility of current uncertainty estimation techniques, it calls for more sophisticated methods to ensure the reliability of LLM deployments in real-world applications. The results underscore the importance of model robustness in the face of deceptive inputs, paving the way for future work aimed at enhancing the resilience and trustworthiness of LLMs.
