Still Compactor: Static Compression Frameworks
- A still compactor is a static mechanism that applies a predetermined compression to system states; the main examples are granular media and transformer KV caches.
- It leverages micro-mechanical models, fixed-parameter tractability frameworks, and amortized synthesis to achieve efficient, trajectory-agnostic state reduction.
- By ensuring high information retention and scalability, still compactors enhance performance in both materials engineering and large-scale language model inference.
A still compactor is a static compaction mechanism or algorithm that applies a predetermined compression, aggregation, or reduction to a system’s state without recurrent, context-dependent updates during use. The name distinguishes it from “on-the-fly,” dynamic, or query-informed compaction. The term is relevant in several contemporary research areas, principally (1) the micro-mechanical compaction of mixed-rigidity granular media and (2) key–value (KV) cache compression in transformer-based large language models (LLMs), where “still” refers to amortized, reusable, and trajectory-agnostic compaction of model state. In both settings, the still compactor aims for near-optimal retention of essential information, high efficiency, and theoretical guarantees, while remaining portable and independent of iterative or instrumented runtime processes.
1. Thermomechanical Still Compaction in Granular Mixtures
The micro-mechanical model for still compaction of granular mixtures comprising rigid and highly deformable particles provides a closed-form, physically grounded predictive framework for densification under static loading. A mixture is characterized by its fraction of deformable particles and its inter-particle friction coefficient. For a system compacted isotropically, the model gives a normalized pressure–packing law (Cárdenas-Barrantes et al., 2020) that relates the applied pressure to the following quantities:
- the Young’s modulus of the deformable particles,
- the packing fraction,
- the coordination number at jamming,
- the initial rigid-particle packing fraction,
- a set of microstructural constants,
- the maximum attainable packing fraction for the mixture.
Three critical parameters of the law must be measured or otherwise obtained. The model derives from micro-contact statistics, the mechanics of single-particle deformation, and a mean-field closure for network connectivity. As the packing fraction approaches its maximum attainable value, the system’s bulk modulus diverges, indicating strong incompressibility at high density; this result can be reproduced by differentiating the compaction law. This predictive theory is foundational in the engineering design and analysis of static bulk compaction in powder processing, geomechanics, and analogous materials.
2. Still Compactor Frameworks in Parameterized Counting Complexity
In the context of parameterized complexity, a “compactor” (a synonym for a still compactor in this domain) is a framework for instance compression in counting problems. Given a function $F : \Sigma^* \to \mathbb{N}$ and a parameterization $\kappa : \Sigma^* \to \mathbb{N}$, a compactor consists of:
- A polynomial-time computable “condenser” $P : \Sigma^* \to \Sigma^*$,
- An “extractor” $M : \Sigma^* \to \mathbb{N}$ (output reconstruction, possibly parameter-dependent),
satisfying $F = M \circ P$ and $|P(x)| \le s(\kappa(x))$ for some recursive size bound $s$; polynomial-size compactors require $s$ to be a polynomial. The existence of a compactor is equivalent to fixed-parameter tractability (FPT) for $F$ (Kim et al., 2018).
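The condenser/extractor split can be illustrated on a much simpler problem than the one treated by Kim et al. The sketch below counts vertex covers of size exactly `k` with the classical high-degree reduction rule. The problem choice, the function names `condense` and `extract`, and the dictionary layout are assumptions made for this example, not the paper's construction.

```python
from itertools import combinations
from math import comb


def condense(n, edges, k):
    """Polynomial-time condenser (toy): shrink (G, k) to a summary whose
    size depends only on k. Returns None when the count is provably 0."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    budget = k
    while True:
        # A vertex with more than `budget` neighbours lies in every cover
        # of size <= budget, so it is taken and removed from the graph.
        high = next((v for v in adj if len(adj[v]) > budget), None)
        if high is None:
            break
        for u in adj.pop(high):
            adj[u].discard(high)
        budget -= 1
        if budget < 0:
            return None
    core_edges = sorted({(min(u, v), max(u, v)) for u in adj for v in adj[u]})
    isolated = sum(1 for v in adj if not adj[v])
    # Each remaining vertex covers at most `budget` edges.
    if len(core_edges) > budget * budget:
        return None
    return {"budget": budget, "isolated": isolated, "edges": core_edges}


def extract(summary):
    """Extractor (toy): recover the exact count from the summary alone."""
    if summary is None:
        return 0
    budget, isolated, edges = summary["budget"], summary["isolated"], summary["edges"]
    verts = sorted({v for e in edges for v in e})
    total = 0
    for j in range(min(budget, len(verts)) + 1):
        covers_j = sum(
            1
            for subset in map(set, combinations(verts, j))
            if all(u in subset or v in subset for u, v in edges)
        )
        # Isolated vertices are free: pick the other budget - j among them.
        total += covers_j * comb(isolated, budget - j)
    return total


star = [(0, leaf) for leaf in range(1, 6)]   # centre 0, leaves 1..5
print(extract(condense(7, star, 2)))         # vertex 6 is isolated; prints 6
```

The summary holds at most `budget * budget` edges plus two integers, so its size is bounded by a function of the parameter alone, while the exponential work is confined to the extractor.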
A canonical result is that for any MSOL-definable, treewidth-modulable vertex-certified counting problem on $H$-topological-minor-free graphs, a polynomial-size still compactor exists with condensation time $O(k^2 n^2)$ and decoding time $2^{O(k)}$, where $k$ is the parameter and $n$ the size of the input graph; the condensed instance has size polynomial in $k$. The construction involves:
- t-treewidth modulator approximation (yielding the “center” of the decomposition),
- Protrusion decomposition into small-treewidth subgraphs,
- Per-protrusion dynamic programming to precompute solution counts,
- Enumeration and aggregation in extraction.
The framework generalizes and unifies various kernelization schemes for sparse graphs, but its limitations include dependence on treewidth-modulability and graph class sparsity. Extending still compactors to nowhere-dense classes or reducing compactor size remain open (Kim et al., 2018).
3. Still Compactors in KV Cache Compression for LLMs
Memory usage in long-horizon LLM inference is dominated by the KV cache, whose size grows as $O(n d)$, with $n$ the context length and $d$ the hidden dimension. The “still compactor” paradigm, exemplified by the “Still” architecture (O'Neill et al., 5 Jun 2026) and the nonparametric “Compactor” (Chari et al., 10 Jul 2025), seeks to statically reduce KV cache memory while preserving inference fidelity, without continual per-query optimization.
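To make the linear growth concrete, the following back-of-the-envelope sketch computes the cache footprint for one sequence. The function name and the model configuration (32 layers, 8 KV heads, head dimension 128, 16-bit values) are hypothetical and chosen only for illustration.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to hold keys and values for one sequence."""
    # Factor 2 accounts for storing both a key and a value per token.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return seq_len * per_token


# Hypothetical configuration, not taken from any cited model.
full = kv_cache_bytes(seq_len=32_768, n_layers=32, n_kv_heads=8, head_dim=128)
compacted = kv_cache_bytes(seq_len=2_048, n_layers=32, n_kv_heads=8, head_dim=128)
print(full / 2**30, full // compacted)  # 4.0 GiB uncompressed, 16x smaller when compacted
```

Because the per-token cost is constant, any reduction in the number of retained slots translates directly into the same reduction in memory.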
3.1 Still: Amortized Synthesis-Based Compactor
The Still compactor uses a small, frozen, per-layer Perceiver module to synthesize a compressed cache in a single forward pass. For each transformer layer and head, it:
- Concatenates keys and values, transforms to a position-free frame,
- Applies a stack of blocks, each consisting of cross-attention, self-attention, and feedforward steps on a latent bank,
- Projects shared latents to compact keys and values,
- Attains a range of compression ratios on long context windows (Qwen, Gemma).
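The data flow of the steps above can be sketched with NumPy as follows. This is a minimal single-head illustration with random, untrained weights. The class name `ToyCompactor`, the layer widths, and the absence of normalization are assumptions, and the position-free transform is omitted; it should not be read as the published Still architecture.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys, values):
    """Single-head scaled dot-product attention."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores) @ values


class ToyCompactor:
    """Hypothetical stand-in for one per-layer, per-head compactor module."""

    def __init__(self, head_dim, n_latents, n_blocks=2, seed=0):
        rng = np.random.default_rng(seed)
        w = 2 * head_dim                      # width of a [key; value] token

        def init(*shape):
            return rng.normal(scale=shape[0] ** -0.5, size=shape)

        self.latents = init(n_latents, w)     # latent bank (random here, learned in practice)
        self.blocks = [
            {"q_cross": init(w, w), "q_self": init(w, w),
             "ff_in": init(w, 4 * w), "ff_out": init(4 * w, w)}
            for _ in range(n_blocks)
        ]
        self.to_key = init(w, head_dim)
        self.to_value = init(w, head_dim)

    def __call__(self, keys, values):
        tokens = np.concatenate([keys, values], axis=-1)          # (n, 2 * head_dim)
        z = self.latents
        for blk in self.blocks:
            z = z + attend(z @ blk["q_cross"], tokens, tokens)    # latents read the cache
            z = z + attend(z @ blk["q_self"], z, z)               # latents exchange information
            z = z + np.maximum(z @ blk["ff_in"], 0.0) @ blk["ff_out"]  # feedforward
        return z @ self.to_key, z @ self.to_value                 # (m, head_dim) each


compactor = ToyCompactor(head_dim=16, n_latents=8)
rng = np.random.default_rng(1)
k, v = rng.normal(size=(256, 16)), rng.normal(size=(256, 16))
ck, cv = compactor(k, v)
print(ck.shape, cv.shape)  # (8, 16) (8, 16)
```

The output size depends only on the number of latents, not on the input length, which is what makes a single forward pass sufficient for any context.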
Distinct features:
- Amortized: Trained once per checkpoint; applies to any context without per-instance fitting,
- Expressive: Fully synthesizes new KV representations, not limited to subset selection,
- Iterative: Supports cascading compaction (recurring invocation), enabling true long-horizon inference.
Empirically, Still attains Pareto-optimal speed–quality trade-offs, outperforms subset-bound alternatives (H₂O, SnapKV, StreamingLLM), and surpasses per-context synthesis (Attention Matching) at scale. For instance, at 16K context and matched compression, Still reaches 53.6% accuracy versus 21.0% for Attention Matching. The compressed cache suffices for both generation and summarization, sometimes exceeding baseline accuracy by 8–22% in challenging long-range settings (O'Neill et al., 5 Jun 2026).
3.2 Compactor: Calibrated Query-Agnostic Compression
The Compactor method (Chari et al., 10 Jul 2025) is parameter-free and operates without query knowledge. Its pipeline:
- Computes approximate leverage scores on KV matrices using randomized sketching or SVD to rank token “outlierness.”
- Optionally calculates non-causal attention-based token utility scores.
- Standardizes and linearly combines the scores (weighted by a blending parameter).
- Retains the top-$k$ tokens by blended score.
- Updates the KV cache to remove all but the most informative tokens.
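A simplified version of this pipeline can be written in a few lines of NumPy. The sketch computes exact leverage scores from a thin SVD of the key matrix rather than the approximate, sketching-based scores of the paper, and it expects any attention-based utility to be supplied by the caller. The names `leverage_scores`, `compact_kv`, `utility`, and `blend`, as well as the choice to score keys only, are assumptions for illustration.

```python
import numpy as np


def leverage_scores(matrix, tol=1e-10):
    """Exact row leverage scores: squared row norms of the left singular vectors."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    keep = s > tol * s.max()                  # ignore numerically null directions
    return (u[:, keep] ** 2).sum(axis=1)


def zscore(x):
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x)


def compact_kv(keys, values, n_keep, utility=None, blend=0.5):
    """Keep the n_keep highest-scoring tokens, preserving their original order."""
    scores = zscore(leverage_scores(keys))
    if utility is not None:
        # Blend standardized leverage with a caller-supplied utility score.
        scores = blend * scores + (1.0 - blend) * zscore(np.asarray(utility, dtype=float))
    kept = np.sort(np.argsort(-scores, kind="stable")[:n_keep])
    return keys[kept], values[kept], kept


# Three near-duplicate tokens and one outlier along a different direction.
keys = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
values = np.arange(8.0).reshape(4, 2)
print(leverage_scores(keys))               # outlier row has leverage 1, the others 1/3
print(compact_kv(keys, values, n_keep=1)[2])  # [3]
```

Leverage scores sum to the rank of the matrix, so a token that alone spans a direction receives the maximum score of 1 and survives even aggressive pruning.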
In many settings (e.g., prefix cache sharing across queries), a query-agnostic still compactor is required to ensure generality and correctness. Compactor can be composed with quantization, head pruning, and other compression techniques.
Memory savings in real-world LLM deployments are on the order of 2–3×, with quality loss on the order of 1%. For example, in LongBench meta-tasks, Compactor with 50% retention maintains essentially full accuracy, outperforming SnapKV and PyramidKV; context-calibrated variants match the accuracy of uncompressed caches while retaining only 37% of tokens in zero-shot settings (Chari et al., 10 Jul 2025).
4. Iterative and Amortized Still Compaction
A key advance in the “Still” architecture is support for iterative compaction: recurrently applying the same compactor module to successive chunks of the cache as new tokens are appended. At each stage, the current cache, which has grown by one chunk of new tokens, is compacted back to a fixed number of slots, so the retained state stays bounded no matter how many tokens have been processed. Empirical results show that chunk size and training context critically affect the compactor’s long-horizon stability; for one reported pairing of training context and chunk size, accuracy reaches nearly 40% at long evaluation contexts (O'Neill et al., 5 Jun 2026).
This iterative, amortized paradigm ensures cache memory does not grow with context, enabling scalable deployment in streaming and retrieval-augmented LLM services.
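The bounded-memory behaviour of this loop can be sketched in plain Python. The compactor here is a trivial mean-pooling placeholder rather than a learned module, and the names `stream`, `mean_pool`, `chunk`, and `slots` are hypothetical.

```python
def mean_pool(cache, slots):
    """Toy compactor: average contiguous groups down to `slots` entries."""
    n = len(cache)
    bounds = [i * n // slots for i in range(slots + 1)]
    return [sum(cache[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]


def stream(tokens, chunk, slots, compact_fn=mean_pool):
    """Append tokens chunk by chunk, compacting whenever the cache exceeds `slots`."""
    cache, peak = [], 0
    for start in range(0, len(tokens), chunk):
        cache.extend(tokens[start:start + chunk])
        peak = max(peak, len(cache))          # largest cache ever held in memory
        if len(cache) > slots:
            cache = compact_fn(cache, slots)
    return cache, peak


cache, peak = stream([float(t) for t in range(1000)], chunk=64, slots=16)
print(len(cache), peak)  # 16 80
```

The peak never exceeds `slots + chunk` regardless of the stream length, which is the property that lets memory stay flat as context grows.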
5. Comparative Evaluation and Scope
Still compactors occupy a distinct position in the taxonomy of cache/data compaction:
| Method Class | Optimization Type | Expressivity | Amortization | Per-Context Fit | Example |
|---|---|---|---|---|---|
| Subset selection | Query-aware/heuristic | Subset-only | No | Yes | H₂O, SnapKV |
| Per-context synthesis | Query-aware | Synthesis | No | Yes | Attention Matching |
| Parametric amortized synthesis | Query-agnostic | Synthesis | Yes | No | Still |
| Nonparametric static | Query-agnostic | Subset-only | Yes | No | Compactor |
Subset-bound alternatives degrade at high compression, while still compactors support higher compression with minimal quality loss, at the cost of one-time module training (for Still) or score computation (for Compactor).
6. Deployment Considerations and Future Directions
Still compactor architectures are built from standard dense matrix multiplications (GEMMs, as served by BLAS libraries) and are therefore compatible with a wide range of hardware. They scale efficiently to large context windows and avoid the serving and memory challenges posed by retaining full native caches. Potential developments include training compactors on broader task mixtures, designing recurrence-aware or curriculum-based protocols for even longer horizons, and further refining amortized attention kernels.
Open research includes minimizing compactor size, extending methodologies to denser or less-structured graph/data classes, and establishing lower bounds on achievable compaction conditioned on permitted quality loss.
7. Significance and Open Problems
Still compaction unifies disparate theoretical and algorithmic ideas (granular mechanics, FPT kernelization, LLM memory architectures) under a common framework of static, reusable, and efficient compression. In parameterized counting, still compactors serve as counting analogues of kernelization, enabling explicit bounds on computational resources. In deep learning, they form the basis for scaling LLM inference to extreme context lengths at practical compute and memory budgets.
Outstanding research problems include:
- Whether the amortized synthesis approach can cross the boundary into low-resource or unsupervised settings without quality loss,
- Reducing or tightly bounding the size and overhead of general still compactors,
- Generalization to broader structural classes and real-world data distributions,
- Understanding the fundamental limitations of static compaction in adversarial or highly dynamic environments.
References:
- "Compaction of mixtures of rigid and highly deformable particles: a micro-mechanical model" (Cárdenas-Barrantes et al., 2020)
- "Data-compression for Parametrized Counting Problems on Sparse graphs" (Kim et al., 2018)
- "Still: Amortized KV Cache Compaction in a Single Forward Pass" (O'Neill et al., 5 Jun 2026)
- "Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores" (Chari et al., 10 Jul 2025)