Papers
Topics
Authors
Recent
Search
2000 character limit reached

Identifying Knowledge Editing Types in Large Language Models

Published 29 Sep 2024 in cs.CL and cs.AI | (2409.19663v3)

Abstract: Knowledge editing has emerged as an efficient technique for updating the knowledge of LLMs, attracting increasing attention in recent years. However, there is a lack of effective measures to prevent the malicious misuse of this technique, which could lead to harmful edits in LLMs. These malicious modifications could cause LLMs to generate toxic content, misleading users into inappropriate actions. In front of this risk, we introduce a new task, $\textbf{K}$nowledge $\textbf{E}$diting $\textbf{T}$ype $\textbf{I}$dentification (KETI), aimed at identifying different types of edits in LLMs, thereby providing timely alerts to users when encountering illicit edits. As part of this task, we propose KETIBench, which includes five types of harmful edits covering the most popular toxic types, as well as one benign factual edit. We develop five classical classification models and three BERT-based models as baseline identifiers for both open-source and closed-source LLMs. Our experimental results, across 92 trials involving four models and three knowledge editing methods, demonstrate that all eight baseline identifiers achieve decent identification performance, highlighting the feasibility of identifying malicious edits in LLMs. Additional analyses reveal that the performance of the identifiers is independent of the reliability of the knowledge editing methods and exhibits cross-domain generalization, enabling the identification of edits from unknown sources. All data and code are available in https://github.com/xpq-tech/KETI.

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.