
Editing Anchor Compression (EAC)

Updated 7 February 2026
  • Editing Anchor Compression (EAC) is a method that controls parameter drift in sequential LLM edits by updating only a sparse, salient subspace called editing anchors.
  • It employs saliency mapping and a scored elastic net to guide anchor selection and sparse retraining, mitigating catastrophic forgetting and norm explosion.
  • Empirical tests on models like GPT2-XL and LLaMA show that EAC preserves zero-shot accuracy and maintains controlled norm drift even after hundreds of edits.

Editing Anchor Compression (EAC) is a methodological framework for efficient and robust sequential editing of LLMs. It explicitly controls parameter drift during the injection of new knowledge by confining updates to a sparse, salient subspace—referred to as “editing anchors”—thus mitigating catastrophic forgetting and norm explosion commonly observed in unconstrained editing schemes. EAC has been instantiated both at the network weight level and at the contextual level, offering a unifying principle for preserving general model abilities while supporting large-scale, multi-edit workflows (Xu et al., 25 Feb 2025, Li et al., 28 May 2025).

1. Sequential Model Editing Challenges and Motivation

LLMs increasingly rely on post hoc “model editing” to repair hallucinations and outdated knowledge without full retraining. Conventional sequential editing methods (e.g., ROME, MEMIT) achieve individual factual corrections by updating specific weight matrices, especially those encoding key–value associations in MLP layers. However, empirical studies demonstrate that the cumulative deviation $\|W_t - W_0\|_1$ of the edited weight matrix $W_t$ grows almost monotonically with the number of edits $t$, reaching >300% of its original norm after 1,000 edits (e.g., for ROME on GPT2-XL), severely corrupting non-targeted representations and degrading zero-shot accuracy across diverse tasks (Xu et al., 25 Feb 2025). This motivates explicit norm control and selective update strategies during each edit.
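As an illustration of this failure mode, the following minimal NumPy sketch tracks the cumulative $\ell_1$ deviation under a sequence of unconstrained rank-one edits. The random directions, scales, and matrix size are invented stand-ins for real ROME-style updates, not the papers' setup:

```python
import numpy as np

# Minimal sketch: cumulative L1 drift of a weight matrix under a sequence of
# unconstrained rank-one edits. Directions and scales are illustrative
# stand-ins for real ROME-style updates.
rng = np.random.default_rng(0)
d = 64
W0 = rng.standard_normal((d, d))
W = W0.copy()

drift = []
for t in range(100):
    a = 0.05 * rng.standard_normal(d)   # stand-in for the update direction a_t
    b = 0.05 * rng.standard_normal(d)   # stand-in for b_t
    W += np.outer(a, b)                 # unconstrained rank-one edit
    drift.append(np.abs(W - W0).sum())  # ||W_t - W_0||_1

# The deviation accumulates with the edit count
assert drift[-1] > drift[0]
```

Without any constraint, nothing pulls $W_t$ back toward $W_0$, which is exactly the behavior EAC's sparse, norm-controlled updates are designed to suppress.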

2. The Editing Anchor Compression (EAC) Framework

EAC decomposes the editing process at each step $t$ into two structured phases:

2.1 Anchor Selection via Saliency Mapping

EAC follows the ROME/MEMIT paradigm by representing edits as low-rank updates: for a factual change involving key $k^* \in \mathbb{R}^d$ and value $v^* \in \mathbb{R}^d$, the update is expressed as

$$\Delta_t = a_t b_t^\top$$

where $a_t \propto (v^* - W_{t-1} k^*)$, $b_t = C^{-1} k^*$, and $C = \mathbb{E}[k k^\top]$. EAC then scores each coordinate $j$ of $v^*$ via a weighted-gradient measure:

$$\text{score}_j = \left| v^*_j \, \partial \ell(v^*) / \partial v^*_j \right|$$

where $\ell(z)$ is a single-fact editing loss minimized at $z = v^*$. A mask $m \in \{0,1\}^d$ selects the coordinates $j$ with $\text{score}_j \geq \tau$, designating them as the anchor subspace for retraining.
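The two steps above can be sketched in NumPy. The block below is illustrative, not the papers' code: it uses a quadratic stand-in loss $\ell(z) = \tfrac12\|z - v^*\|^2$, evaluates its gradient at the pre-edit value $W_{t-1}k^*$ (the gradient vanishes exactly at $v^*$, so in practice scoring happens before convergence), and picks the threshold $\tau$ as an arbitrary quantile:

```python
import numpy as np

# Illustrative sketch of the rank-one edit and EAC anchor selection.
# Loss, gradient evaluation point, and threshold choice are assumptions.
rng = np.random.default_rng(0)
d = 16
W_prev = rng.standard_normal((d, d))
k_star = rng.standard_normal(d)
v_star = rng.standard_normal(d)
C = np.eye(d)                            # stand-in for the key covariance E[k k^T]

# Rank-one edit direction: Delta_t = a_t b_t^T
a_t = v_star - W_prev @ k_star           # a_t ∝ (v* - W_{t-1} k*)
b_t = np.linalg.solve(C, k_star)         # b_t = C^{-1} k*
Delta = np.outer(a_t, b_t) / (b_t @ k_star)   # scaled so (W + Delta) k* = v*
assert np.allclose((W_prev + Delta) @ k_star, v_star)

# Weighted-gradient saliency: score_j = |v*_j * dl/dv*_j|
grad = (W_prev @ k_star) - v_star        # gradient of the stand-in loss
score = np.abs(v_star * grad)

tau = np.quantile(score, 0.75)           # keep the top-25% coordinates as anchors
m = (score >= tau).astype(int)
assert 0 < m.sum() < d                   # sparse, nonempty anchor subspace
```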

2.2 Sparse Retraining via Scored Elastic Net

The update is compressed such that only the anchored coordinates are changed. A diagonal anchor-selection matrix $A_t = \operatorname{diag}(m)$ restricts $z \in \mathbb{R}^d$ via the constraint $z = A_t z$. The optimization objective is:

$$\min_{z \in \mathbb{R}^d} \ell(z) + \lambda_1 \|z\|_{1,a} + \lambda_2 \|z\|_2^2, \quad \text{subject to } z = A_t z$$

where $\|z\|_{1,a} = \sum_{j=1}^d a_j |z_j|$ and $a_j = 1/(\text{score}_j + \epsilon)$. This “scored elastic net” promotes sparsity and discourages updates on low-saliency dimensions, further constraining the cumulative norm drift and bounding the condition number of the updated matrix, thus preserving downstream abilities (Xu et al., 25 Feb 2025).
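A plain proximal-gradient sketch of this objective follows. It assumes the same quadratic stand-in loss $\ell(z) = \tfrac12\|z - v^*\|^2$; the step size, penalty weights, and iteration count are illustrative choices, not the paper's settings:

```python
import numpy as np

def scored_elastic_net(v_star, score, mask, lam1=0.1, lam2=0.01,
                       lr=0.1, steps=500, eps=1e-8):
    """Proximal-gradient sketch of min_z l(z) + lam1*||z||_{1,a} + lam2*||z||_2^2
    subject to z = A_t z, with stand-in loss l(z) = 0.5*||z - v*||^2."""
    a = 1.0 / (score + eps)                    # a_j = 1/(score_j + eps)
    z = np.zeros_like(v_star)
    for _ in range(steps):
        g = (z - v_star) + 2.0 * lam2 * z      # gradient of the smooth part
        z = z - lr * g
        # soft-threshold: prox of the score-weighted l1 penalty
        z = np.sign(z) * np.maximum(np.abs(z) - lr * lam1 * a, 0.0)
        z *= mask                              # enforce the anchor constraint z = A_t z
    return z

rng = np.random.default_rng(0)
d = 16
v_star = rng.standard_normal(d)
score = np.abs(rng.standard_normal(d))
mask = (score >= np.quantile(score, 0.5)).astype(float)

z = scored_elastic_net(v_star, score, mask)
assert np.all(z[mask == 0] == 0)                 # updates confined to anchors
assert np.abs(z).sum() <= np.abs(v_star).sum()   # compressed l1 norm
```

Note how the weights $a_j$ make the soft-threshold aggressive on low-saliency coordinates, so those dimensions are zeroed out even before the hard mask is applied.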

3. Contextual Instantiations of Editing Anchor Compression

EAC’s principles extend beyond weight-level updates to contextual editing regimes, as demonstrated in InComeS (Li et al., 28 May 2025). In the contextual (ICL-style) setting, each edit $t_i$ is represented as a “gist anchor” by compressing its entire effect into the key–value (KV) cache of a single specially designated token across the deeper transformer layers:

gKi=WKh<gist>1,gVi=WVh<gist>1,{L/2+1,...,L}gK_i = W^K h_{\texttt{<gist>}}^{\ell-1}, \quad gV_i = W^V h_{\texttt{<gist>}}^{\ell-1}, \quad \ell \in \{L/2+1, ..., L\}

During inference, a specialized cross-attention module enables each generation token to soft-select among $N$ gist anchors (plus a learned “zero-gist” anchor) according to the attention weights:

$$\alpha_i = \frac{\exp(s_i)}{\sum_{j=0}^{N} \exp(s_j)}, \quad s_i = \frac{1}{\sqrt{d_k}\, T}\, q^\top gK_i$$

Cross-attention output is then linearly combined:

$$o_\text{cross} = \sum_{i=0}^{N} \alpha_i\, gV_i$$

which is added to the hidden state, realizing EAC’s selection and compression principles at the context level.
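The selection step can be sketched as a small NumPy routine. The shapes, the single-query simplification, and the random gist vectors below are assumptions for illustration, not the InComeS implementation:

```python
import numpy as np

def gist_cross_attention(q, gK, gV, temperature=1.0):
    """Soft-select among N gist anchors plus a zero-gist anchor (row 0).
    q: (d_k,) query; gK: (N+1, d_k) gist keys; gV: (N+1, d_v) gist values."""
    d_k = q.shape[-1]
    s = gK @ q / (np.sqrt(d_k) * temperature)   # s_i = q^T gK_i / (sqrt(d_k) T)
    alpha = np.exp(s - s.max())                 # numerically stable softmax
    alpha = alpha / alpha.sum()
    return alpha @ gV                           # o_cross = sum_i alpha_i gV_i

rng = np.random.default_rng(0)
N, d_k, d_v = 5, 8, 8
gK = rng.standard_normal((N + 1, d_k))   # row 0 plays the zero-gist role
gV = rng.standard_normal((N + 1, d_v))
q = rng.standard_normal(d_k)

o = gist_cross_attention(q, gK, gV)
assert o.shape == (d_v,)
```

The output `o` is the quantity added to the token's hidden state; when the query matches no edit, attention can collapse onto the zero-gist row, leaving generation effectively unedited.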

4. Theoretical Properties and Norm Control

EAC’s design ensures that each compressed update satisfies

$$\|\widetilde{\Delta}_t\|_1 \leq \|z\|_1 \ll \|v^*\|_1$$

imposing sub-linear growth of the cumulative parameter drift with the edit count (Xu et al., 25 Feb 2025). This norm control implicitly limits adverse shifts in the condition number of the edited matrices. Principal component analysis shows that hidden-state statistics under EAC remain close to the pre-edit distribution, supporting superior retention of out-of-scope (“general”) abilities.

A plausible implication is that EAC-style regularization may curb the propagation of non-local effects that typically result in performance drop-offs in unrelated downstream tasks.

5. Empirical Evaluation and Scalability

Experiments across GPT2-XL, LLaMA-3, and LLaMA-2, using ROME and MEMIT as baseline editors with and without EAC, demonstrate the following:

  • Model Preservation: EAC maintains >70% of original LLM zero-shot accuracy across QA, summarization (ROUGE-1/2/L), NLI, and sentiment after hundreds of edits, compared to <50% for unconstrained baselines.
  • Edit Reliability: For up to 300 edits, edit reliability is often >0.95 under EAC versus <0.8 for baseline.
  • Norm Drift: Cumulative $\ell_1$ drift $\|W_t - W_0\|_1$ remains sub-linear, e.g., <1.2×10⁶ after 1,000 edits, versus 1.4×10⁶ (ROME) or 1.0×10⁶ (MEMIT) for the unconstrained baselines.
  • Computational Overhead: EAC incurs <10% increase in wall-clock edit time, with anchor selection and retraining steps efficiently integrated (Xu et al., 25 Feb 2025).
  • Contextual EAC (InComeS): Achieves 12:1–15:1 compression rate (tokens:anchors), with notable improvements in batch-editing multi-hop QA, debiasing, and portability tasks, and up to 27% inference speedup over conventional ICL (Li et al., 28 May 2025).

6. Practical Considerations and Limitations

EAC’s editing reliability and preservation benefits hold across a range of model sizes (1.5B to 13B) and editing methods but have primarily been tested in decoder-only transformer architectures. In contextual EAC (InComeS), extremely long edits (>50 tokens) may exceed the capacity of a single gist anchor and require extension, while cross-attention cost may become a bottleneck when scaling to 10,000+ edits.

Other salient constraints:

  • Compression-Selection Tradeoff: The effectiveness of EAC hinges on identifying informative anchors without sacrificing the specificity of edits; mis-selection may impair factuality.
  • No Formal Error Bounds: There are no provable guarantees on compression error accumulation as the number of edits grows.
  • Zero-Gist Mechanism: Omission of the learned “zero-gist” anchor in InComeS degrades multi-hop task performance, supporting the gating role of explicit “null anchors” (Li et al., 28 May 2025).

7. Significance and Future Directions

Editing Anchor Compression formalizes a sparse, saliency-driven norm control paradigm in the sequential editing of large models, unifying direct parameter and context-editing designs. It demonstrates that efficient editing—preserving both new facts and general LLM capabilities—is achievable by confining updates to highly selective, semantically targeted parameter subspaces.

A plausible implication is that EAC may generalize as a foundational control primitive across diverse model architectures and memory editing settings. Extensions may include adaptive anchor scaling for longer edits, cross-attention optimization for large $N$, and formal error tracking for accumulated edit compositions.

References:

  • Constraining Sequential Model Editing with Editing Anchor Compression (Xu et al., 25 Feb 2025). Original EAC proposal for weight-level edits.
  • InComeS: Integrating Compression and Selection Mechanisms into LLMs for Efficient Model Editing (Li et al., 28 May 2025). Instantiation in contextual editing.
