Anchor-guided History Compression (AHC)

Updated 8 December 2025

Anchor-guided History Compression is a method that selects crucial 'anchor' tokens to streamline processing of long histories in neural models.
It employs specialized frameworks in GUI agents, LLMs, and sequential model editing to preserve key information while reducing computational costs.
Empirical results demonstrate significant speedups and efficiency gains with reduced FLOPs and enhanced task performance.

Anchor-guided History Compression (AHC) is a principled methodology for reducing the computational burden of processing long historical contexts in neural models by leveraging selected "anchor" tokens or features that preserve essential information flow. The concept of anchor-guided compression has recently emerged in several domains, notably GUI agent policy optimization (Zhou et al., 1 Dec 2025), context representation for LLMs (Liu et al., 10 Oct 2025), and sequential model editing (Xu et al., 25 Feb 2025), where anchors act as information-preserving bottlenecks. This approach replaces naive history truncation or computationally expensive full-context attention with a targeted preservation of tokens/actions/parameters deemed crucial for downstream task fidelity.

1. Motivation and Rationale

In sequential decision-making agents and LLM applications, full historical contexts improve performance but introduce pronounced computational inefficiency. This is particularly evident in GUI navigation agents, where visual history (“2AO”) sharply elevates quadratic attention costs and memory, while also including irrelevant, distracting information (Zhou et al., 1 Dec 2025). Similar challenges exist in compressing contexts for LLM inference and in model editing, where parameter drift due to cumulative edits erodes model reliability and generalization (Liu et al., 10 Oct 2025, Xu et al., 25 Feb 2025).

Anchor-guided compression is motivated by empirical findings that a small subset of context tokens (“anchors”)—often actions, salient content tokens, or parameter coordinates—act as conduits through which critical information is fused and propagated. For example, in GUI agents, early transformer layers fuse visual and action history, but action tokens suffice to preserve influence deeper in the network; higher layers show visual tokens contribute little when action anchors remain accessible (Zhou et al., 1 Dec 2025). This suggests preserving anchor features allows aggressive pruning of historical context without impairing functional capacity.

2. Technical Frameworks

GUI Agents: Dual-Branch Anchor Compression

The HiconAgent system (Zhou et al., 1 Dec 2025) instantiates AHC via a dual-branch policy architecture:

Uncompressed branch: Processes full historical context—both past screenshots and executed actions—through all transformer layers, yielding logits $\pi_\theta(o|q)$ for action selection.
Compressed (anchor-guided) branch: After $k$ layers (typically $k=6$ ), visual tokens are dropped, leaving only action tokens (anchors) alongside the current state. Both branches share parameters and see identical training instances. The compressed branch serves downstream inference due to its efficiency.

Alignment between branches is enforced via a history-enhanced KL-divergence loss: $\mathcal{L}_{\mathrm{KL}} = \sum_{i=1}^{G}\; \mathrm{KL}\left(\pi_\theta(\cdot\,|\,q^c)\,\|\,\pi_\theta(\cdot\,|\,q)\right)$ The final objective aggregates branch-wise GRPO-style losses with the alignment term: $\mathcal{L}_{\mathrm{HCPO}} = \mathcal{L}_{\mathrm{w/o\,comp}} + \mathcal{L}_{\mathrm{w/\,comp}} + \lambda\,\mathcal{L}_{\mathrm{KL}}$ where $\lambda$ tunes alignment strength.

LLMs: Semantic Anchor Compression

Anchor-based context compression for LLMs is formulated in Semantic-Anchor Compression (SAC) (Liu et al., 10 Oct 2025), which eschews autoencoding-based compression tokens in favor of selecting real context tokens (anchors) and aggregating their key–value pairs:

Anchors are chosen either by uniform chunk splitting or more sophisticated informativeness criteria.
Anchor token embeddings are modified by adding a learnable anchor vector.
Transformer attention is adapted such that anchor positions attend bidirectionally to the full context, accumulating semantics, while non-anchor tokens remain causal.
The compressed context is the assembled KV representations of the anchors, with downstream decoding attending only to these.

This method achieves substantially better accuracy under high compression compared to autoencoding-based approaches, without pretraining misalignment.

Sequential Model Editing: Editing Anchor Compression

Editing Anchor Compression (EAC) (Xu et al., 25 Feb 2025) addresses degradation in LLM generality after repeated factual edits:

Locates a rank-one parameter update $k_t v_t^\top$ encoding the new fact.
Selects a sparse anchor mask $m$ via the relevance-weighted saliency $|(\partial C/\partial v_t)_i \cdot (v_t)_i|$ , which restricts editing to high-impact coordinates.
Refines anchors via an elastic-net objective

$L_r(z) = C(z) + \alpha\sum_i a_i |z_i| + \beta\|z\|_2^2$

Sequential edits are thus constrained to a low-dimensional anchor subspace, bounding the norm $\|W^{(t)}-W^{(0)}\|_F$ and preserving general capabilities.

3. Algorithmic Process and Pseudocode

The essential workflow for anchor-guided history compression follows a joint training protocol:

for each update step:
    # Dynamic Context Sampling
    sample rollout contexts q_i with variable history length
    sample actions and rewards

    # Forward pass
    for each context q_i:
        logits_full = model(q_i)           # Uncompressed branch
        logits_comp = model(q_c_i)         # Compressed branch (anchors only after layer k)

    # Compute losses
    loss_full  = GRPO_loss(logits_full)
    loss_comp  = GRPO_loss(logits_comp)
    loss_align = KL(logits_comp, logits_full)
    total_loss = loss_full + loss_comp + lambda * loss_align

    # Update parameters
    optimizer.step(total_loss)

Analogous procedures exist for SAC (anchor token selection + attention masking) and EAC (anchor mask selection + elastic-net refinement) (Liu et al., 10 Oct 2025, Xu et al., 25 Feb 2025).

4. Empirical Outcomes and Efficiency Gains

Empirical studies demonstrate that anchor-guided compression—when retaining action or content anchors—achieves substantial computational reductions with minimal or improved performance loss:

In HiconAgent-3B (Zhou et al., 1 Dec 2025), switching to AHC with $k=6$ achieves up to $2.47\times$ inference speedup and $60\%$ FLOPs reduction, while outperforming larger models (GUI-R1-7B) by $+8.46\%$ grounding accuracy and $+11.32\%$ step success rate.
SAC (Liu et al., 10 Oct 2025) reduces KV cache by $80\%$ and increases decoding speed by roughly $4\times$ at $5\times$ compression, with absolute EM score advantages over autoencoding baselines at increasing ratios (see results table below).

Compression Ratio	ICAE EM	EPL EM	SAC EM
r=5 (in-domain)	22.12	46.33	44.83
r=51 (out-domain)	17.85	19.48	26.86

In EAC (Xu et al., 25 Feb 2025), after $300$ sequential edits, ROME+EAC preserves $70-75\%$ general-task accuracy against $25-30\%$ for ROME alone, with strong gains in reliability and locality.

5. Anchor Selection and Architectural Details

Anchor identification is context-specific:

In policy networks for GUI agents, actions are preserved as anchors based on layerwise probing of token fusion (Zhou et al., 1 Dec 2025).
In SAC, anchors are selected per chunk of input sequence, with additional informativeness scoring as an option (Liu et al., 10 Oct 2025); anchor tokens receive a fixed embedding addition and enable bidirectional attention.
In EAC, anchor coordinates are chosen by saliency percentile (typically the top $10-25\%$ ), optimizing the anchor threshold for edit success and minimal deviation (Xu et al., 25 Feb 2025).

Architecturally, drop-out of non-anchor historical tokens occurs at a configurable transformer layer $k$ (HiconAgent), or, for LLMs, via modification of attention masks and embedding layers (SAC). No new tokens or modules are introduced; methods are parameter-shared and often compatible with backbone architectures via adapter training.

6. Practical Implementation and Guidelines

Key practical insights include:

Drop layer $k$ for token pruning (e.g., $k=6$ in HiconAgent) should be chosen by FLOPs/accuracy trade-off curves.
Anchor embedding dimension, LoRA adapter weights, and hybrid attention masks require tuning for SAC (Liu et al., 10 Oct 2025).
Anchor threshold in EAC is set at high saliency percentile for effective sparse updates, with elastic-net coefficients $\alpha,\beta$ tuned per model (Xu et al., 25 Feb 2025).
Computing overhead for anchor selection and alignment is minimal relative to overall inference/training cost.

A plausible implication is that anchor-guided history compression generalizes across domains, applicable wherever historical context is essential but computationally burdensome, and that further research may refine anchor selection criteria, optimal pruning layer, and integration into diverse architectures.

7. Significance and Limitations

Anchor-guided History Compression offers a judicious trade-off between efficiency and representational adequacy, enabling substantial computational savings and superior sample efficiency. The principal advantage lies in its targeted preservation of information flow via anchors, sidestepping reconstruction loss misalignment observed in autoencoding-based compressors. Empirical evidence supports its robustness, especially in out-of-distribution and sequential edit settings. Limitations may arise if anchor selection fails to capture latent dependencies, or if compressed contexts omit unforeseen salient cues.

AHC, SAC, and EAC collectively advance anchor-based compression as an efficient and reliable technique for context management in neural agents, LLM compression, and continual model editing (Zhou et al., 1 Dec 2025, Liu et al., 10 Oct 2025, Xu et al., 25 Feb 2025).