Uncertain side effects of localized knowledge editing in LLMs

Characterize and quantify the side effects induced by knowledge-editing techniques that modify localized components of large language models, given the lack of clarity about where knowledge is stored and how edits propagate through model internals.

Background

The review discusses knowledge editing approaches that attempt to modify or insert knowledge by targeting specific model components (e.g., feed-forward modules or particular weights).

Because the internal mechanisms of LLMs are opaque, it is unclear what unintended consequences these edits may have on broader model behaviors.
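A minimal numerical sketch can make the concern concrete. The toy below is not the method of any specific editor from the review; it assumes a ROME-style setup in which a feed-forward projection is treated as a linear key-to-value map and edited with a rank-one update. All names (`W`, `k`, `v_target`, the probe set) are hypothetical. The edit is exact for the targeted key, yet every unrelated input with a component along that key is also perturbed, which is exactly the kind of uncharacterized side effect at issue.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a feed-forward (MLP) projection: a linear map from
# "key" activations to "value" activations. Editors such as ROME target
# analogous down-projection matrices inside a transformer block.
d = 16
W = rng.normal(size=(d, d))

# Hypothetical stored association to overwrite: key k -> new value v_target.
k = rng.normal(size=d)
v_target = rng.normal(size=d)

# Rank-one edit: dW = (v_target - W k) k^T / (k^T k), so that
# (W + dW) k == v_target exactly while leaving W otherwise minimally changed.
dW = np.outer(v_target - W @ k, k) / (k @ k)
W_edited = W + dW

# The targeted edit succeeds exactly.
assert np.allclose(W_edited @ k, v_target)

# Side-effect probe: measure how much outputs drift on unrelated inputs.
# Any probe with a nonzero component along k is perturbed by the edit.
probes = rng.normal(size=(100, d))
drift = np.linalg.norm(probes @ W_edited.T - probes @ W.T, axis=1)
print("mean drift on unrelated probes:", float(drift.mean()))
```

In a real LLM the analogous drift shows up as changed behavior on prompts unrelated to the edited fact, and quantifying it requires behavioral probes rather than a single matrix norm; the sketch only illustrates why a "localized" edit is not automatically a contained one.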

References

However, the side effects are unclear, as the underlying mechanisms of LLMs have yet to be clarified [79].

Towards Incremental Learning in Large Language Models: A Critical Review (2404.18311 - Jovanovic et al., 28 Apr 2024) in Section 2.1 (Continual Learning) – Knowledge Editing subsection