Knowledge Editing on Black-box Large Language Models (2402.08631v2)
Abstract: Knowledge editing (KE) aims to efficiently and precisely modify the behavior of LLMs so as to update specific knowledge without negatively influencing other knowledge. Current research primarily focuses on editing white-box LLMs, overlooking an important scenario: black-box LLM editing, where LLMs are accessed through interfaces and only textual output is available. In this paper, we first formally introduce KE on black-box LLMs and then propose a comprehensive evaluation framework that overcomes the limitations of existing evaluations, which are inapplicable to black-box LLM editing and lack comprehensiveness. To tackle the privacy leakage of editing data and style over-editing in current methods, we introduce a novel postEdit framework, which resolves privacy concerns through downstream post-processing and maintains textual style consistency via fine-grained editing of the original responses. Experiments and analysis on two benchmarks demonstrate that postEdit outperforms all baselines and achieves strong generalization, with especially large gains in style retention (average $+20.82\%\uparrow$).
- Xiaoshuai Song (16 papers)
- Zhengyang Wang (48 papers)
- Keqing He (47 papers)
- Guanting Dong (46 papers)
- Jinxu Zhao (5 papers)
- Weiran Xu (58 papers)
- Yutao Mou (16 papers)