- The paper introduces DiKE, a framework that disentangles LLM knowledge representations to enable precise editing, preserving fine-grained irrelevant knowledge while integrating new facts.
- DiKE utilizes Knowledge Representation Disentanglement (KRD) and a novel closed-form rank-one parameter update strategy derived from matrix theory for efficient and minimally invasive edits.
- Evaluated using the new FINE-KED benchmark, DiKE significantly enhances the preservation of fine-grained irrelevant knowledge across models like GPT2-XL, GPT-J, and LLaMA-3, maintaining competitive general editing performance.
Disentangling Knowledge Representations for LLM Editing
The paper introduces a novel approach to knowledge editing in LLMs: disentangling knowledge representations so that fine-grained, unrelated knowledge is preserved while new information is integrated. This addresses a central challenge in the field, namely avoiding unintended alterations of unrelated knowledge during the editing process.
Core Contributions
- Disentanglement of Knowledge Representations: The authors propose DiKE, a framework consisting of two modules: Knowledge Representation Disentanglement (KRD) and Disentanglement-based Knowledge Edit (DKE). The KRD module separates the subject representation into a target-related and a target-unrelated component, isolating the specific attributes linked to the knowledge being edited so that updates can be applied precisely without disturbing other fine-grained knowledge.
- Efficient Parameter Update Strategy: Building on matrix theory, the authors derive a closed-form rank-one parameter update formula. This enables efficient, minimally invasive edits, ensuring that only the target-related component of the subject representation is updated while target-unrelated knowledge is preserved.
- Benchmark Construction: To rigorously evaluate the preservation of fine-grained irrelevant knowledge, the authors develop FINE-KED, a benchmark whose test instances span multiple levels of relational similarity to the edited knowledge. This allows the proposed method to be compared against existing approaches on precisely the kind of closely related but irrelevant facts that editing tends to damage.
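The paper's exact update rule for DiKE is not reproduced here, but the general shape of a closed-form rank-one edit of this kind (familiar from ROME-style editing, on which such methods build) can be sketched as follows. All names and dimensions are illustrative, not the paper's notation: `k` stands in for the (target-related component of the) subject representation, `v_star` for the hidden state that should encode the new fact, and `C` for a second-moment matrix of keys estimated from text.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 16, 12

# Hypothetical MLP projection weight to be edited.
W = rng.normal(size=(d_out, d_in))

# Key (target-related subject component) and desired output value.
k = rng.normal(size=d_in)
v_star = rng.normal(size=d_out)

# Second-moment matrix of keys; in practice estimated over a corpus,
# here approximated from random samples for the sketch.
K = rng.normal(size=(10_000, d_in))
C = K.T @ K / K.shape[0]

# Closed-form rank-one update:
#   W' = W + (v* - W k) (C^{-1} k)^T / (k^T C^{-1} k)
# After the update W' k = v*, while directions uncorrelated with k
# (under C) are only minimally perturbed.
c_inv_k = np.linalg.solve(C, k)
residual = v_star - W @ k
W_new = W + np.outer(residual, c_inv_k) / (k @ c_inv_k)

assert np.allclose(W_new @ k, v_star)
```

The rank-one structure is what makes the edit minimally invasive: only a single outer-product direction of the weight matrix changes, which is the mechanism the DKE module exploits after KRD has isolated the target-related component.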
Experimental Evaluation
The empirical results demonstrate that DiKE significantly enhances the preservation of fine-grained irrelevant knowledge across multiple LLMs, while maintaining competitive general editing performance. Experiments conducted on models including GPT2-XL, GPT-J, and LLaMA-3 reveal DiKE's capability to efficiently edit model parameters and achieve high efficacy scores without degrading unrelated facts. Moreover, the study offers insights into practical applications by validating the performance under structured knowledge editing tasks and subject-consistent batch editing scenarios, emphasizing the effectiveness of disentangled representations.
Implications and Future Directions
The implications of this research are twofold: practically, DiKE can improve the reliability and adaptability of LLMs in dynamic settings, where continuous updates are required without compromising existing capabilities. Theoretically, this work advances our understanding of knowledge representation within neural networks, highlighting the importance of disentangling internal representations to mitigate negative ripple effects during model editing.
Future research could explore expanding the disentanglement framework to accommodate unstructured or semi-structured knowledge formats, further broadening the applicability of DiKE in diverse editing contexts. Additionally, integrating the disentanglement approach with other model editing techniques may offer enhanced solutions that combine the strengths of different paradigms.
Ultimately, the paper makes substantial contributions to the field of AI by addressing key challenges in LLM knowledge editing, offering a solid foundation for subsequent advancements in model reliability and efficiency.