Sentence-Anchored Gist Compression for Long-Context LLMs

Published 11 Nov 2025 in cs.CL | (2511.08128v1)

Abstract: This work investigates context compression for LLMs using learned compression tokens to reduce the memory and computational demands of processing long sequences. We demonstrate that pre-trained LLMs can be fine-tuned to compress their context by factors of 2x to 8x without significant performance degradation, as evaluated on both short-context and long-context benchmarks. Furthermore, in experiments on a 3-billion-parameter LLaMA model, our method achieves results on par with alternative compression techniques while attaining higher compression ratios.
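
To give a rough sense of what a 2x to 8x compression factor could mean for memory, the sketch below estimates key-value (KV) cache size before and after compression. It is a back-of-the-envelope illustration, not the paper's method. It assumes that only the compressed representation is kept in the cache. The model dimensions (`n_layers`, `n_kv_heads`, `head_dim`), the 32,768-token context, and the 16-bit cache precision are hypothetical values chosen for illustration and are not taken from the paper.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV-cache size in bytes for a decoder-only transformer.

    Both keys and values are cached at every layer, hence the factor of 2.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


def compressed_len(seq_len, ratio):
    """Number of cached positions left after compressing by `ratio` (rounded up)."""
    return -(-seq_len // ratio)


# Hypothetical configuration, for illustration only (not from the paper).
N_LAYERS, N_KV_HEADS, HEAD_DIM = 28, 8, 128
CONTEXT_TOKENS = 32_768

baseline = kv_cache_bytes(CONTEXT_TOKENS, N_LAYERS, N_KV_HEADS, HEAD_DIM)
print(f"uncompressed: {baseline / 2**30:.2f} GiB")

for ratio in (2, 4, 8):
    kept = compressed_len(CONTEXT_TOKENS, ratio)
    size = kv_cache_bytes(kept, N_LAYERS, N_KV_HEADS, HEAD_DIM)
    print(f"{ratio}x: {kept} cached positions, {size / 2**30:.2f} GiB")
```

Under these assumptions the cache shrinks linearly with the compression factor. Actual savings depend on how a given method stores compressed and uncompressed tokens during inference.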