Golden Eviction Algorithm in LLMs & Fair Division
- Golden Eviction Algorithm is an oracle-based strategy for LLMs that optimally evicts key-value pairs to minimize future attention loss.
- In fair division, it employs golden ratio thresholds to allocate indivisible goods, achieving near-optimal envy-freeness up to any good (EFX).
- By serving as a benchmark, the algorithm informs both supervised distillation and reinforcement learning approaches for efficient memory management and fair allocation.
The term "Golden Eviction Algorithm" refers to two distinct, unrelated concepts, one in LLM memory management and one in algorithmic fair division, each formalized and evaluated in recent literature. In LLMs, "Golden Eviction" denotes an oracle-based KV cache eviction strategy that computes which key-value (KV) pairs to remove so as to minimize the degradation in future attention, a core challenge in scaling reasoning models under constrained memory budgets (Dong et al., 3 Feb 2026). Separately, "golden-eviction" describes an algorithmic paradigm for fair division of indivisible goods that achieves strong approximation guarantees for envy-freeness up to any good (EFX) by exploiting structure induced by the golden ratio (Amanatidis et al., 2019). Both serve downstream methods in their respective domains, as benchmarks, as sources of supervision, or as key algorithmic ingredients.
1. Golden Eviction for Key-Value Cache Management in LLMs
The KV cache of an LLM grows linearly with output sequence length, which creates a severe memory bottleneck in tasks requiring long reasoning chains: a 32K-token context for Qwen3-4B demands 4.5 GB of memory. Traditional cache eviction schemes, such as position-based or attention-based greedy heuristics, fail to capture the long-range dependencies critical for reasoning quality, so quality degrades sharply when vital KV pairs are evicted. Golden Eviction formalizes an oracle solution: given access to the full generated context and its attention matrices, it selects at each periodic eviction step the set of KV pairs whose removal minimizes the total future attention mass discarded. This in turn bounds the downstream attention output error in terms of a bound on the value norm and the evicted attention mass (Dong et al., 3 Feb 2026).
The oracle pools attention over blocks of queries and over multi-head groups, deterministically scores each candidate by its worst-case (maximum) future contribution, and selects the least-contributive keys for eviction. Traces from this algorithm supervise efficient learned eviction policies: they provide the gold-standard ranking targets for a pairwise loss, and they facilitate a hybrid reinforcement learning refinement whose MDP-structured reward focuses on preserving low-entropy tokens.
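As a rough check on the 4.5 GB figure quoted in this section, the cache size can be estimated from the model shape. The sketch below is illustrative only: the layer count, number of KV heads, head dimension, and 16-bit storage are assumed values for a Qwen3-4B-like configuration and are not stated in the text above.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Bytes needed to cache keys and values for a single sequence."""
    # The leading factor 2 covers the separate key and value tensors per layer.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


# Assumed configuration (hypothetical values chosen for illustration).
total = kv_cache_bytes(num_layers=36, num_kv_heads=8, head_dim=128, seq_len=32 * 1024)
print(f"{total / 2**30:.2f} GiB")  # prints "4.50 GiB"
```

Because every factor is multiplied by the sequence length, the estimate scales linearly with the number of cached tokens, which is what a fixed eviction budget caps.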
2. Algorithmic Formalization and Oracle Guarantees
Golden Eviction employs the following formal workflow:
- Index the full generation trace, fix a cache budget and an eviction interval, and split the trace into the resulting sequence of eviction steps.
- At each step, aggregate per-head and per-group attention for all KV candidates, pooling across query tokens and attention heads.
- Compute the "block score" for each candidate at each step, and the "future score" as the maximum importance over all remaining steps.
- Sort the candidates by future score: the keys with the smallest future scores are evicted.
The core property (Proposition A.1 in Dong et al., 3 Feb 2026) is that this process minimizes the total discarded attention mass over all future queries relative to any alternative eviction choice, which supports, both theoretically and empirically, significantly lower degradation under aggressive memory constraints.
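The workflow above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the pooled block scores are already available as a table `block_scores[t][i]` (importance of KV candidate `i` at step `t`), and it assumes that "remaining steps" includes the current step. The function names are hypothetical.

```python
def future_scores(block_scores, step):
    """Maximum block score of every candidate from `step` onward."""
    remaining = block_scores[step:]
    return [max(column) for column in zip(*remaining)]


def oracle_keep(block_scores, step, cached, budget):
    """Indices from `cached` that the oracle retains at eviction step `step`."""
    if len(cached) <= budget:
        return sorted(cached)
    scores = future_scores(block_scores, step)
    # Highest future score first; everything past the budget is evicted.
    ranked = sorted(cached, key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


# Toy table: 3 eviction steps (rows) x 4 KV candidates (columns).
block_scores = [
    [0.5, 0.3, 0.2, 0.0],
    [0.1, 0.1, 0.6, 0.2],
    [0.0, 0.4, 0.1, 0.5],
]
kept = oracle_keep(block_scores, step=1, cached=[0, 1, 2, 3], budget=2)
print(kept)  # [2, 3]
```

Candidate 0 has the largest score at step 0 but is never important again, so a future-aware oracle evicts it at step 1, whereas a heuristic based on past attention might keep it.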
3. Supervised Distillation and Reinforcement Learning Enhancement
Running Golden Eviction on representative long-context reasoning traces (e.g., AIME2024, AIME2025) yields positive (retained) and negative (evicted) examples at each step, which are collected for downstream distillation. A lightweight scorer (e.g., an MLP) is trained to assign scalar scores to KV tuples so that the induced ranking agrees with the oracle's scores. Training uses a pairwise ranking loss whose margin term enforces separation between the scores of retained and evicted entries.
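To make the distillation objective concrete, the sketch below uses a standard hinge-style pairwise margin loss. The exact loss in the cited work is not reproduced here, so the hinge form, the averaging over pairs, and the names `pairwise_margin_loss`, `keep_mask`, and `margin` are assumptions made for illustration.

```python
def pairwise_margin_loss(scores, keep_mask, margin=1.0):
    """Average hinge penalty over all (retained, evicted) score pairs."""
    retained = [s for s, keep in zip(scores, keep_mask) if keep]
    evicted = [s for s, keep in zip(scores, keep_mask) if not keep]
    if not retained or not evicted:
        return 0.0
    # A pair is penalized when the retained score does not exceed the
    # evicted score by at least `margin`.
    total = sum(max(0.0, margin - (p - n)) for p in retained for n in evicted)
    return total / (len(retained) * len(evicted))


# The scorer ranks the evicted entry above the retained one, so the loss is positive.
loss_bad = pairwise_margin_loss([0.2, 0.5], [True, False])
# Retained entries are well separated from evicted ones, so the loss vanishes.
loss_good = pairwise_margin_loss([2.0, 0.5, 1.5, -1.0], [True, False, True, False])
print(loss_bad, loss_good)
```

Only the relative order of scores matters under this loss, which fits the use case: the eviction policy needs a ranking of cache entries, not calibrated importance values.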
Further, the eviction process is modeled as a Markov Decision Process (MDP), with states representing KV cache contents, actions as the subset of KV pairs to retain, and rewards constructed to penalize sharp language modeling loss increases on low-entropy tokens. The GRPO algorithm is applied, initializing from the distilled policy and employing a clipped PPO-style surrogate with a KL divergence constraint to stabilize policy updates.
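The two ingredients named above, group-relative advantages and a clipped surrogate with a KL penalty, can be sketched generically as follows. This is the textbook form of these components rather than the specific training setup of the cited work; the hyperparameters `clip_eps` and `beta` and the per-sample `kl_terms` input are illustrative assumptions.

```python
import math


def group_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one sampled group to zero mean and unit scale."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    return [(r - mean) / (math.sqrt(var) + eps) for r in rewards]


def clipped_objective(ratios, advantages, kl_terms, clip_eps=0.2, beta=0.01):
    """Mean clipped surrogate minus a KL penalty (a quantity to maximize)."""
    total = 0.0
    for ratio, adv, kl in zip(ratios, advantages, kl_terms):
        clipped_ratio = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
        # Taking the minimum removes any incentive to push the probability
        # ratio outside the clipping interval.
        total += min(ratio * adv, clipped_ratio * adv) - beta * kl
    return total / len(ratios)


advantages = group_advantages([1.0, 0.0])
objective = clipped_objective([1.5, 1.5], [1.0, -1.0], [0.0, 0.0])
print(advantages, objective)
```

In an eviction setting, each sample in a group would correspond to one rollout of the eviction policy on the same trace, with the reward computed from the resulting language modeling loss.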
4. Empirical Evaluation and Impact
ForesightKV, using Golden Eviction for supervision, is evaluated on Qwen3-1.7B, Qwen3-4B, and DeepSeek-Qwen-7B over multi-thousand-token reasoning tasks (AIME2024/2025). Under constrained budgets (e.g., ), Golden Eviction achieves a loss ratio of 1.0711 versus 1.4750 for the rule-based R-KV baseline. With RL refinement, accuracy retention reaches 92% at K (Qwen3-4B, AIME2024), compared to 44.8% for R-KV at K, and throughput on 32K-token generations increases by up to 9.8× with K compared to the full cache (Dong et al., 3 Feb 2026).
These results establish Golden Eviction as a gold-standard oracle for cache eviction and as a valuable source of supervision for training data-efficient, high-performing cache eviction policies under memory constraints.
5. Golden Eviction and Envy-cycle-elimination in Fair Division
Separately, "golden-eviction" in the context of fair division denotes a family of algorithms for indivisible goods allocation that deliver $(\phi - 1)$-EFX guarantees, where $\phi = (1 + \sqrt{5})/2$ is the golden ratio (Amanatidis et al., 2019). An allocation $(A_1, \dots, A_n)$ is $\alpha$-EFX if for every pair of agents $i, j$ and every good $g \in A_j$, $v_i(A_i) \ge \alpha \cdot v_i(A_j \setminus \{g\})$. The algorithm combines a preprocessing pass using golden ratio–based thresholds for agent ordering, two round-robin drafts (forward and reverse), and a global envy-cycle-elimination (ECE) phase. The resulting allocations guarantee that each agent values its own bundle at least $\phi - 1 \approx 0.618$ times as much as any other agent's bundle after the removal of any single good, the best known EFX approximation at the time.
The algorithm also yields approximate solutions for groupwise and pairwise maximin share (GMMS, PMMS) and is provably strongly polynomial, with exact solutions guaranteed to exist in special cases with a small number of goods. These properties set a new best-known approximation frontier for multiple fairness concepts.
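The approximate EFX condition is easy to verify directly for a given allocation. The checker below is a minimal sketch that assumes additive valuations, with `valuations[i][g]` the value agent `i` assigns to good `g`; the function name and the toy instance are hypothetical, and the code only tests a given allocation rather than constructing one.

```python
def is_alpha_efx(valuations, allocation, alpha):
    """True if no agent alpha-envies another bundle after removing any one good."""
    n = len(allocation)
    for i in range(n):
        own = sum(valuations[i][g] for g in allocation[i])
        for j in range(n):
            if i == j:
                continue
            other = sum(valuations[i][g] for g in allocation[j])
            for g in allocation[j]:
                # Small tolerance guards against floating-point noise.
                if own < alpha * (other - valuations[i][g]) - 1e-12:
                    return False
    return True


PHI = (1 + 5 ** 0.5) / 2  # golden ratio

# Two agents with identical values for three goods.
valuations = [[5, 4, 4], [5, 4, 4]]
allocation = [[0, 1], [2]]
print(is_alpha_efx(valuations, allocation, 1.0))      # False
print(is_alpha_efx(valuations, allocation, PHI - 1))  # True
```

In this instance agent 1 holds value 4 but still sees value 5 in agent 0's bundle after the least valuable good is removed, so exact EFX fails while the relaxed threshold of roughly 0.618 is met.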
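For intuition about the pairwise maximin share, the sketch below checks an approximate PMMS condition by brute force: agent `i` pools its own bundle with agent `j`'s, computes the best worst-bundle value over all two-way splits of the pool, and compares its actual bundle against an `alpha` fraction of that value. Additive valuations are assumed, the enumeration is exponential and only suitable for tiny instances, and the value `alpha = 2/3` used in the example is an arbitrary illustrative choice.

```python
from itertools import combinations


def maximin_share_2(values):
    """Best achievable worst-bundle value when splitting `values` into two bundles."""
    total = sum(values)
    best = 0
    for size in range(len(values) + 1):
        for subset in combinations(range(len(values)), size):
            part = sum(values[k] for k in subset)
            best = max(best, min(part, total - part))
    return best


def is_alpha_pmms(valuations, allocation, alpha):
    """True if every agent gets at least alpha times each pairwise maximin share."""
    n = len(allocation)
    for i in range(n):
        own = sum(valuations[i][g] for g in allocation[i])
        for j in range(n):
            if i == j:
                continue
            pooled = [valuations[i][g] for g in list(allocation[i]) + list(allocation[j])]
            if own < alpha * maximin_share_2(pooled) - 1e-12:
                return False
    return True


valuations = [[5, 4, 4], [5, 4, 4]]
allocation = [[0, 1], [2]]
print(maximin_share_2([5, 4, 4]))                       # 5
print(is_alpha_pmms(valuations, allocation, 1.0))       # False
print(is_alpha_pmms(valuations, allocation, 2 / 3))     # True
```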
6. Connections, Variants, and Further Implications
In the LLM context, Golden Eviction serves strictly as an oracle: its output informs training but cannot be used directly in online deployment, because it requires access to future attention. Instead, policies distilled from Golden Eviction traces and then refined with RL realize the efficiency benefits in practical settings (Dong et al., 3 Feb 2026). In allocation, the golden-eviction ECE paradigm guides both theoretical fair division bounds and practical algorithm design, with threshold and processing variants trading off between EFX, GMMS, and PMMS guarantees (Amanatidis et al., 2019).
A plausible implication is the broader applicability of golden-ratio–inspired thresholds and future-aware (oracle) loss minimization strategies across algorithmic domains where greedy, myopic decisions yield suboptimal global outcomes. Both settings demonstrate that golden-eviction serves as a bridge between theoretically optimal, globally informed choices and their practical, efficient, learnable approximations.