
Still Compactor: Static Compression Frameworks

Updated 12 June 2026
  • A still compactor is a static mechanism that applies predetermined compression to system states in granular media and transformer KV caches.
  • It leverages micro-mechanical models, fixed-parameter tractability frameworks, and amortized synthesis to achieve efficient, trajectory-agnostic state reduction.
  • By ensuring high information retention and scalability, still compactors enhance performance in both materials engineering and large-scale language model inference.

A still compactor is a static compaction mechanism or algorithm, so called to distinguish it from “on-the-fly,” dynamic, or query-informed compaction. It applies a predetermined compression, aggregation, or reduction to a system’s state without recurrent, context-dependent updates during use. The term is relevant in several contemporary research areas, principally (1) the micro-mechanical compaction of mixed-rigidity granular media, and (2) key–value (KV) cache compression in transformer-based LLMs, where “still” refers to amortized, reusable, and trajectory-agnostic compaction of model state. In both settings, the still compactor aims for near-optimal retention of essential information, high efficiency, and theoretical guarantees, while remaining portable and independent of iterative or instrumented runtime processes.

1. Thermomechanical Still Compaction in Granular Mixtures

The micro-mechanical model for still compaction of granular mixtures comprising rigid and highly deformable particles provides a closed-form, physically grounded predictive framework for densification under static loading. Let $R$ be the fraction of deformable particles and $\mu$ the inter-particle friction coefficient. For a system compacted isotropically under pressure $P$, the normalized pressure–packing law reads (Cárdenas-Barrantes et al., 2020):

$$\frac{P(\phi,R,\mu)}{E} = -\,\frac{b\,\phi}{2\pi}\, \Bigl[\,Z_{0}(\mu)\;+\;\xi\,[\,\phi-\phi_{0}(\mu)\,]^{\alpha}\Bigr] \;\ln\!\Biggl(\frac{\phi_{\max}(R,\mu)-\phi}{\phi_{\max}(R,\mu)-\phi_{0}(\mu)}\Biggr)$$

where:

  • $E$: Young’s modulus of deformable particles,
  • $\phi$: packing fraction,
  • $Z_0(\mu)$: coordination number at jamming,
  • $\phi_0(\mu)$: initial rigid-particle packing fraction,
  • $b \approx 0.14$, $\xi \approx 5.1$, $\alpha$: microstructural constants,
  • $\phi_{\max}(R,\mu)$: maximum attainable packing fraction for the mixture.

The critical parameters $Z_0(\mu)$, $\phi_0(\mu)$, and $\phi_{\max}(R,\mu)$ must be measured or otherwise obtained. The model derives from micro-contact statistics, the mechanics of single-particle deformation, and a mean-field closure for network connectivity. As $\phi \to \phi_{\max}$ the logarithmic term diverges, and with it the system’s bulk modulus, indicating strong incompressibility at high density; the result can be reproduced by differentiating the compaction law with respect to $\phi$. This predictive theory is foundational in the engineering design and analysis of static bulk compaction in powder processing, geomechanics, and analogous materials.
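
The divergence near the maximum packing fraction can be checked numerically. The sketch below evaluates the compaction law and estimates a normalized bulk modulus by finite differences. The numerical values of `phi0`, `phi_max`, `z0`, and `alpha` are illustrative assumptions, not values taken from the source, and the definition $K = \phi\, dP/d\phi$ is the usual one for a packing rather than something stated in the text.

```python
import math

def normalized_pressure(phi, phi0, phi_max, z0, b=0.14, xi=5.1, alpha=0.5):
    """Return P/E from the compaction law, valid for phi0 <= phi < phi_max.

    alpha=0.5 is an assumed, illustrative exponent.
    """
    if not (phi0 <= phi < phi_max):
        raise ValueError("phi must satisfy phi0 <= phi < phi_max")
    coordination = z0 + xi * (phi - phi0) ** alpha
    log_term = math.log((phi_max - phi) / (phi_max - phi0))  # always <= 0
    return -(b * phi / (2.0 * math.pi)) * coordination * log_term

def normalized_bulk_modulus(phi, phi0, phi_max, z0, h=1e-6, **constants):
    """Estimate K/E = phi * d(P/E)/dphi with a central finite difference."""
    lower = normalized_pressure(phi - h, phi0, phi_max, z0, **constants)
    upper = normalized_pressure(phi + h, phi0, phi_max, z0, **constants)
    return phi * (upper - lower) / (2.0 * h)

# Hypothetical parameters for a frictionless two-dimensional packing.
params = dict(phi0=0.82, phi_max=1.0, z0=4.0)
for phi in (0.85, 0.95, 0.99, 0.999):
    p = normalized_pressure(phi, **params)
    k = normalized_bulk_modulus(phi, **params)
    print(f"phi={phi:.3f}  P/E={p:.4f}  K/E={k:.3f}")
```

With these assumed parameters the printed modulus grows without bound as `phi` approaches `phi_max`, which is the incompressibility behavior described above.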

2. Still Compactor Frameworks in Parameterized Counting Complexity

In the context of parameterized complexity, a “compactor” (synonymous with a still compactor in this domain) is a framework for instance compression in counting problems. Given a counting function $F$ and a parameterization $\kappa$, a compactor consists of:

  • A polynomial-time computable “condenser” $P$,
  • An “extractor” $M$ (output reconstruction, possibly parameter-dependent),

satisfying $F(x) = M(P(x))$ and $|P(x)| \le s(\kappa(x))$ for some recursive size bound $s$; polynomial-size compactors require $s$ to be a polynomial. The existence of a compactor is equivalent to fixed-parameter tractability (FPT) for $(F, \kappa)$ (Kim et al., 2018).
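
To make the condenser/extractor split concrete, here is a toy sketch for a simple counting problem: the number of vertex covers of size exactly `k` in a graph. It is not the construction of Kim et al.; it uses the classical high-degree reduction rule as a stand-in, and all function names are hypothetical. One simplification to note: the condensed instance stores the number of isolated vertices as an integer, so its size is bounded in `k` only up to that counter.

```python
from itertools import combinations
from math import comb

def condense(n, edges, k):
    """Condenser: shrink (graph, k) to a core with at most k*k edges."""
    edges = {frozenset(e) for e in edges}
    alive = set(range(n))
    budget = k
    while budget >= 0:
        degree = {}
        for e in edges:
            for v in e:
                degree[v] = degree.get(v, 0) + 1
        forced = next((v for v, d in degree.items() if d > budget), None)
        if forced is None:
            break
        # A vertex with more than `budget` neighbours lies in every cover of size `budget`.
        alive.discard(forced)
        edges = {e for e in edges if forced not in e}
        budget -= 1
    # `budget` vertices of degree <= budget cover at most budget**2 edges.
    if budget < 0 or len(edges) > budget * budget:
        return {"zero": True}
    core = sorted({v for e in edges for v in e})
    return {"zero": False, "core": core, "budget": budget,
            "edges": sorted(tuple(sorted(e)) for e in edges),
            "isolated": len(alive) - len(core)}

def extract(condensed):
    """Extractor: recover the count from the condensed instance alone."""
    if condensed["zero"]:
        return 0
    core, edges, budget = condensed["core"], condensed["edges"], condensed["budget"]
    total = 0
    for j in range(min(budget, len(core)) + 1):
        for chosen in combinations(core, j):
            s = set(chosen)
            if all(u in s or v in s for u, v in edges):
                # Pad the core cover with isolated vertices up to size `budget`.
                total += comb(condensed["isolated"], budget - j)
    return total

def count_vertex_covers(n, edges, k):
    """Reference answer by brute force over the whole graph."""
    return sum(all(u in s or v in s for u, v in edges)
               for s in map(set, combinations(range(n), k)))

# A star on vertices 0..4 plus a separate edge (5, 6).
graph = (7, [(0, 1), (0, 2), (0, 3), (0, 4), (5, 6)])
print(extract(condense(*graph, k=2)), count_vertex_covers(*graph, k=2))
```

The two printed numbers agree: the extractor never sees the original graph, only the condensed core, the remaining budget, and the isolated-vertex count.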

A canonical result is that any MSOL-definable, treewidth-modulable, vertex-certified counting problem on $H$-topological-minor-free graphs admits a polynomial-size still compactor, with explicit bounds on condensation time, decoding time, and compactor size (Kim et al., 2018). The construction involves:

  • Approximation of a $t$-treewidth modulator (the “center”),
  • Protrusion decomposition into small-treewidth subgraphs,
  • Per-protrusion dynamic programming to precompute solution counts,
  • Enumeration and aggregation in extraction.

The framework generalizes and unifies various kernelization schemes for sparse graphs, but it depends on treewidth-modulability and on the sparsity of the graph class. Extending still compactors to nowhere-dense classes and reducing compactor size remain open problems (Kim et al., 2018).

3. Still Compactors in KV Cache Compression for LLMs

Excessive memory usage in long-horizon LLM inference is driven primarily by the size of the KV cache, which grows as $O(n \cdot d)$, with $n$ the context length and $d$ the hidden dimension. The “still compactor” paradigm, exemplified by the “Still” architecture (O'Neill et al., 5 Jun 2026) and the nonparametric “Compactor” (Chari et al., 10 Jul 2025), seeks to statically reduce KV cache memory while preserving inference fidelity, without continual per-query optimization.
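
A quick calculation shows why this linear growth matters in practice. The helper below uses the standard accounting of one key and one value vector per layer, per KV head, per cached token; the model configuration in the usage lines is hypothetical and not taken from either cited paper.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_elem=2, batch=1):
    """Bytes needed to hold keys and values for `seq_len` cached tokens."""
    # Factor 2: one key vector and one value vector per layer and KV head.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return batch * seq_len * per_token

# Hypothetical model: 32 layers, 8 KV heads of dimension 128, 16-bit values.
config = dict(n_layers=32, n_kv_heads=8, head_dim=128)
for tokens in (8_192, 131_072):
    gib = kv_cache_bytes(tokens, **config) / 2**30
    print(f"{tokens:>7} tokens -> {gib:.1f} GiB")
```

For this assumed configuration each token costs 128 KiB, so a 131,072-token context needs 16 GiB for the cache alone; a compactor that keeps a fixed fraction of slots scales that figure down proportionally.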

3.1 Still: Amortized Synthesis-Based Compactor

The Still compactor uses a small, frozen, per-layer Perceiver module to synthesize a compressed cache in a single forward pass. For each transformer layer and head, it:

  • Concatenates keys and values, transforms to a position-free frame,
  • Applies a stack of blocks, each consisting of cross-attention, self-attention, and feedforward steps on a latent bank,
  • Projects shared latents to compact keys and values,
  • Is evaluated over a range of compression ratios and long context windows on Qwen and Gemma models.
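
The steps above can be sketched for a single head in NumPy. This is a minimal illustration of Perceiver-style synthesis under stated simplifications, not the Still architecture itself: the weights are random and untrained, attention has no separate query/key/value projections or normalization, the position-free transform is omitted, and every name (`compact_head`, `init_params`, the parameter keys) is hypothetical.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attend(q, k, v):
    """Single-head scaled dot-product attention."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def init_params(d_head, d_latent, n_latents, rng, scale=0.02):
    """Random, untrained weights; a real module would be trained, then frozen."""
    shapes = {
        "latents": (n_latents, d_latent),   # learned latent bank
        "w_in": (2 * d_head, d_latent),     # embeds concatenated [key, value]
        "w_ff1": (d_latent, 4 * d_latent),
        "w_ff2": (4 * d_latent, d_latent),
        "w_k": (d_latent, d_head),          # latent -> compact key
        "w_v": (d_latent, d_head),          # latent -> compact value
    }
    return {name: scale * rng.standard_normal(shape)
            for name, shape in shapes.items()}

def compact_head(K, V, params, n_blocks=2):
    """Map n cached (key, value) pairs to n_latents synthesized pairs."""
    x = np.concatenate([K, V], axis=-1) @ params["w_in"]   # (n, d_latent)
    z = params["latents"]                                  # (m, d_latent)
    for _ in range(n_blocks):
        z = z + attend(z, x, x)   # cross-attention: latents read the cache
        z = z + attend(z, z, z)   # self-attention among latents
        z = z + np.maximum(z @ params["w_ff1"], 0.0) @ params["w_ff2"]
    return z @ params["w_k"], z @ params["w_v"]

rng = np.random.default_rng(0)
K, V = rng.standard_normal((1024, 64)), rng.standard_normal((1024, 64))
params = init_params(d_head=64, d_latent=128, n_latents=32, rng=rng)
K_c, V_c = compact_head(K, V, params)
print(K.shape, "->", K_c.shape)
```

The output size depends only on the number of latents, not on the input length, which is what makes a single forward pass sufficient for any context.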

Distinct features:

  • Amortized: Trained once per checkpoint; applies to any context without per-instance fitting,
  • Expressive: Fully synthesizes new KV representations, not limited to subset selection,
  • Iterative: Supports cascading compaction (recurring invocation), enabling true long-horizon inference.

Empirically, Still attains Pareto-optimal speed–quality trade-offs, outperforms subset-bound alternatives (H₂O, SnapKV, StreamingLLM), and surpasses per-context synthesis (Attention Matching) at scale. For instance, at 16K context Still reaches 53.6% accuracy versus 21.0% for Attention Matching at the same compression ratio. The compressed cache suffices for both generation and summarization, sometimes exceeding baseline accuracy by 8–22% in challenging long-range settings (O'Neill et al., 5 Jun 2026).

3.2 Compactor: Calibrated Query-Agnostic Compression

The Compactor method (Chari et al., 10 Jul 2025) is parameter-free and operates without query knowledge. Its pipeline:

  1. Computes approximate leverage scores on KV matrices using randomized sketching or SVD to rank token “outlierness.”
  2. Optionally calculates non-causal attention-based token utility scores.
  3. Standardizes and linearly combines the scores (with a blending parameter).
  4. Retains the top-$k$ tokens by blended score.
  5. Updates the KV cache to remove all but the most informative tokens.
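
The pipeline above can be sketched in NumPy as follows. This is an illustrative approximation, not the published Compactor implementation: leverage scores are computed exactly via SVD on the key matrix only, the attention-based utility uses the keys themselves as stand-in queries, and the function names and the default `blend` value are assumptions.

```python
import numpy as np

def leverage_scores(M):
    """Exact row leverage scores: squared row norms of the left singular vectors."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    tol = s.max() * max(M.shape) * np.finfo(M.dtype).eps
    rank = int((s > tol).sum())
    return (U[:, :rank] ** 2).sum(axis=1)

def attention_utility(K):
    """Stand-in non-causal utility: mean attention each token receives
    when every key is also used as a query (an assumption of this sketch)."""
    scores = K @ K.T / np.sqrt(K.shape[1])
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights.mean(axis=0)

def zscore(x):
    return (x - x.mean()) / (x.std() + 1e-12)

def compact_cache(K, V, keep, blend=0.5):
    """Keep the `keep` highest-scoring tokens, preserving their original order."""
    if not 1 <= keep <= K.shape[0]:
        raise ValueError("keep must be between 1 and the number of tokens")
    score = (blend * zscore(leverage_scores(K))
             + (1.0 - blend) * zscore(attention_utility(K)))
    idx = np.sort(np.argsort(score)[-keep:])
    return K[idx], V[idx], idx

rng = np.random.default_rng(0)
K, V = rng.standard_normal((512, 64)), rng.standard_normal((512, 64))
K_c, V_c, kept = compact_cache(K, V, keep=256)
print(K_c.shape, kept[:8])
```

Because no query is consulted at any point, the same compacted cache can be shared by every request that reuses the prefix. For long contexts the exact SVD would typically be replaced by a randomized sketch, as the first step of the pipeline suggests.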

In many settings (e.g., prefix cache sharing across queries), a query-agnostic still compactor is required to ensure generality and correctness. Compactor can be composed with quantization, head pruning, and other compression techniques.

Memory savings in real-world LLM deployments are on the order of 2–3× at around 1% quality loss. For example, in LongBench meta-tasks, Compactor with 50% retention maintains essentially full accuracy, outperforming SnapKV and PyramidKV; context-calibrated variants match the accuracy of uncompressed caches while retaining only 37% of tokens in zero-shot settings (Chari et al., 10 Jul 2025).

4. Iterative and Amortized Still Compaction

A key advance in the “Still” architecture is support for iterative compaction: the same compactor module is applied recurrently to successive chunks of the cache as new tokens are appended. At each stage, the current cache, enlarged by a chunk of new tokens, is compacted back to a fixed number of slots, so the retained state stays bounded however many tokens have been processed. Empirical results show that chunk size and training context length critically affect the compactor’s long-horizon stability; a reported configuration reaches nearly 40% accuracy at its evaluation context length (O'Neill et al., 5 Jun 2026).

This iterative, amortized paradigm ensures cache memory does not grow with context, enabling scalable deployment in streaming and retrieval-augmented LLM services.
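
The bounded-memory property of this loop is easy to demonstrate. In the sketch below the learned compactor is replaced by plain mean pooling, which is only a stand-in chosen to keep the example self-contained; the chunk size, slot count, and function names are all hypothetical.

```python
import numpy as np

def mean_pool_compactor(K, V, slots):
    """Stand-in compactor: average contiguous token groups into `slots` entries."""
    n = K.shape[0]
    if n <= slots:
        return K, V
    groups = np.array_split(np.arange(n), slots)
    K_c = np.stack([K[g].mean(axis=0) for g in groups])
    V_c = np.stack([V[g].mean(axis=0) for g in groups])
    return K_c, V_c

def stream(keys, values, chunk, slots, compactor=mean_pool_compactor):
    """Append `chunk` tokens at a time and re-compact to `slots` after each append."""
    K = np.empty((0, keys.shape[1]))
    V = np.empty((0, values.shape[1]))
    peak = 0
    for start in range(0, keys.shape[0], chunk):
        K = np.concatenate([K, keys[start:start + chunk]])
        V = np.concatenate([V, values[start:start + chunk]])
        peak = max(peak, K.shape[0])       # largest cache ever held in memory
        K, V = compactor(K, V, slots)      # same module reused at every stage
    return K, V, peak

rng = np.random.default_rng(0)
keys, values = rng.standard_normal((10_000, 64)), rng.standard_normal((10_000, 64))
K, V, peak = stream(keys, values, chunk=512, slots=128)
print(K.shape[0], peak)
```

The cache never holds more than `slots + chunk` entries, independent of the total number of tokens streamed. Repeated compaction does compound any information loss, which is consistent with the sensitivity to chunk size and training context noted above.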

5. Comparative Evaluation and Scope

Still compactors occupy a distinct position in the taxonomy of cache/data compaction:

| Method Class | Optimization Type | Expressivity | Amortization | Per-Context Fit | Example |
|---|---|---|---|---|---|
| Subset selection | Query-aware/heuristic | Subset-only | No | Yes | H₂O, SnapKV |
| Per-context synthesis | Query-aware | Synthesis | No | Yes | Attention Matching |
| Parametric amortized synthesis | Query-agnostic | Synthesis | Yes | No | Still |
| Nonparametric static | Query-agnostic | Subset-only | Yes | No | Compactor |

Subset-bound alternatives degrade at high compression, while still compactors support higher compression with minimal quality loss, at the cost of one-time module training (for Still) or score computation (for Compactor).

6. Deployment Considerations and Future Directions

Still compactor architectures are compatible with a wide range of hardware (BLAS for GEMMs), scale efficiently to large context windows, and avoid the serving and memory challenges posed by retaining full native caches. Potential developments include training compactors on broader task mixtures, designing recurrence-aware or curriculum-based protocols for even longer horizons, and further refinement of amortized attention kernels.

Open research includes minimizing compactor size, extending methodologies to denser or less-structured graph/data classes, and establishing lower bounds on achievable compaction conditioned on permitted quality loss.

7. Significance and Open Problems

Still compaction unifies disparate theoretical and algorithmic ideas (granular mechanics, FPT kernelization, LLM memory architectures) under a common framework of static, reusable, and efficient compression. In parameterized counting, still compactors serve as counting analogues of kernelization, enabling explicit bounds on computational resources. In deep learning, they form the basis for scaling LLM inference to extreme context lengths at practical compute and memory budgets.

Outstanding research problems include:

  • Whether the amortized synthesis approach can cross the boundary into low-resource or unsupervised settings without quality loss,
  • Reducing or tightly bounding the size and overhead of general still compactors,
  • Generalization to broader structural classes and real-world data distributions,
  • Understanding the fundamental limitations of static compaction in adversarial or highly dynamic environments.
