
What does the Knowledge Neuron Thesis Have to do with Knowledge? (2405.02421v1)

Published 3 May 2024 in cs.CL

Abstract: We reassess the Knowledge Neuron (KN) Thesis: an interpretation of the mechanism underlying the ability of LLMs to recall facts from a training corpus. This nascent thesis proposes that facts are recalled from the training corpus through the MLP weights in a manner resembling key-value memory, implying in effect that "knowledge" is stored in the network. Furthermore, by modifying the MLP modules, one can control the LLM's generation of factual information. The plausibility of the KN thesis has been demonstrated by the success of KN-inspired model editing methods (Dai et al., 2022; Meng et al., 2022). We find that this thesis is, at best, an oversimplification. Not only have we found that we can edit the expression of certain linguistic phenomena using the same model editing methods but, through a more comprehensive evaluation, we have found that the KN thesis does not adequately explain the process of factual expression. While it is possible to argue that the MLP weights store complex patterns that are interpretable both syntactically and semantically, these patterns do not constitute "knowledge." To gain a more comprehensive understanding of the knowledge representation process, we must look beyond the MLP weights and explore recent models' complex layer structures and attention mechanisms.

Authors (4)
  1. Jingcheng Niu (5 papers)
  2. Andrew Liu (23 papers)
  3. Zining Zhu (41 papers)
  4. Gerald Penn (6 papers)
Citations (25)