
From Similarity to Structure: Training-free LLM Context Compression with Hybrid Graph Priors

Published 25 Apr 2026 in cs.CL and cs.AI | (2604.23277v1)

Abstract: Long-context LLMs remain computationally expensive to run and often fail to reliably process very long inputs, which makes context compression an important component of many systems. Existing compression approaches typically rely on trained compressors, dense retrieval-style selection, or heuristic trimming, and they often struggle to jointly preserve task relevance, topic coverage, and cross-sentence coherence under a strict token budget. To address this, we propose a training-free and model-agnostic compression framework that selects a compact set of sentences guided by structural graph priors. Our method constructs a sparse hybrid sentence graph that combines mutual k-NN semantic edges with short-range sequential edges, extracts a topic skeleton via clustering, and ranks sentences using an interpretable score that integrates task relevance, cluster representativeness, bridge centrality, and a cycle coverage cue. A budgeted greedy selection with redundancy suppression then produces a readable compressed context in original order. Experimental results on four datasets show that our approach is competitive with strong extractive and abstractive baselines, demonstrating larger gains on long-document benchmarks.

Summary

  • The paper introduces a training-free, model-agnostic framework for LLM context compression that leverages hybrid graph priors to retain key discourse structures.
  • It integrates semantic similarity and sequential cues through mutual k-NN, clustering, and graph metrics to maintain topic coverage and narrative flow.
  • Experimental results on four summarization benchmarks show performance that is competitive with extractive and abstractive baselines on ROUGE, BERTScore, and QAFactEval, with the largest gains on long-document benchmarks.

Training-Free, Structure-Aware LLM Context Compression via Hybrid Graph Priors

Motivation and Problem Statement

Long-context LLMs, despite significant scaling improvements, remain resource-intensive and are subject to degraded performance on exceedingly long inputs. This challenges both computational scalability and semantic fidelity, motivating the development of context compression modules for downstream LLM tasks. Existing approaches often compromise between task relevance, topic coverage, and structural coherence, especially under stringent token budgets. Many rely on heavy training routines or specialized intermediate representations, limiting their practical applicability and explainability.

This work introduces a training-free, model-agnostic context compression framework that leverages both semantic similarity and sequential structure to select a compact subset of sentences. The explicit use of hybrid graph priors enables direct modeling of global and local document structure, aiming to preserve critical information chains and discourse coherence while conforming to tight token constraints.

Figure 1: Overview of the proposed structure-aware context compression framework.

Methodology

Hybrid Sparse Sentence Graph Construction

The framework models the document at the sentence level, encoding sentences with frozen, pretrained embeddings. Two complementary edge sets define the hybrid graph:

  • Semantic Edges (Mutual $k$-NN): Capture topical relationships between sentence pairs via dense vector similarity under a mutual nearest neighbor constraint, ensuring robustness and sparsity.
  • Sequential Edges: Capture immediate narrative flow by connecting consecutive sentences with exponentially decayed weights, tuned by the parameter $\Delta$.

These signals are fused with hyperparameters $\alpha$ and $\beta$ to produce a weighted hybrid graph that balances semantic affinity with local order, maintaining sparsity to mitigate redundancy.
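The graph construction described above could be sketched roughly as follows. This is a minimal, dependency-free illustration and not the authors' implementation. Several details are assumptions: $\Delta$ is treated as a window size (the largest index gap that still receives a sequential edge), the decay is taken to be `exp(-(gap - 1))`, the fusion is a plain weighted sum `alpha * semantic + beta * sequential`, and all function names and default values are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mutual_knn_edges(emb, k):
    """Keep edge (i, j) only if i is in j's top-k AND j is in i's top-k."""
    n = len(emb)
    sim = [[cosine(emb[i], emb[j]) for j in range(n)] for i in range(n)]
    nbrs = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: -sim[i][j])
        nbrs.append(set(others[:k]))
    return {(i, j): sim[i][j]
            for i in range(n) for j in nbrs[i]
            if i < j and i in nbrs[j]}

def sequential_edges(n, delta):
    """Connect sentences at most `delta` positions apart (assumed reading of Delta),
    with a weight that decays exponentially in the index gap (assumed form)."""
    return {(i, j): math.exp(-(j - i - 1))
            for i in range(n)
            for j in range(i + 1, min(n, i + delta + 1))}

def hybrid_graph(emb, k=2, delta=2, alpha=0.7, beta=0.3):
    """Fuse both edge sets as alpha * semantic + beta * sequential (assumed fusion)."""
    sem = mutual_knn_edges(emb, k)
    seq = sequential_edges(len(emb), delta)
    return {e: alpha * sem.get(e, 0.0) + beta * seq.get(e, 0.0)
            for e in sem.keys() | seq.keys()}

# Toy 2-D "embeddings": sentences 0-1 and 2-3 form two topical pairs.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = hybrid_graph(emb, k=1, delta=1)
```

In this toy case the edge between sentences 1 and 2 exists only because of adjacency, which is the kind of link a purely semantic graph would drop.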

Topic Skeleton Extraction

Latent document structure is estimated by MiniBatch $k$-means clustering of sentence embeddings, with $K \approx \sqrt{N}$ clusters. Cluster centroids serve as topic anchors, guiding the selection toward broad topic coverage instead of local maxima in relevance.
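A small sketch of the clustering step, under stated assumptions: plain Lloyd-style k-means stands in for MiniBatch k-means so that the block has no dependencies, $K$ is taken as `round(sqrt(N))`, and the evenly spaced initialization is chosen only to keep the example deterministic. The function names are hypothetical.

```python
import math

def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, iters=20):
    """Plain Lloyd iterations (a stand-in for MiniBatch k-means)."""
    n = len(points)
    # Deterministic init: k evenly spaced points from the input.
    centroids = [list(points[(i * n) // k]) for i in range(k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centroids), centroids

def topic_skeleton(emb):
    """Cluster sentence embeddings into roughly sqrt(N) topics."""
    k = max(1, round(math.sqrt(len(emb))))
    return kmeans(emb, k)

points = [[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]]
labels, centroids = topic_skeleton(points)
```

With a library available, scikit-learn's `MiniBatchKMeans` would be the more direct match for the method named in the text.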

Interpretable Sentence Scoring

Each sentence is assigned a composite score integrating:

  • Task Relevance: Cosine similarity to a query embedding (or document centroid).
  • Topic Representativeness: Salience within its cluster, incentivizing coverage.
  • Bridge Centrality (Betweenness): Frequency on shortest paths, indicating rhetorical or logical connectors, computed approximately for long documents.
  • Cycle Participation: Binary indicator of membership in enumerated graph cycles, protecting local closure and reasoning motifs.
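The two graph-based cues can be illustrated with a short sketch. It makes simplifying assumptions that the summary does not confirm: edges are treated as unweighted, betweenness is computed exactly with Brandes' algorithm (the text mentions an approximation for long documents, which would typically sample source nodes), and cycle participation is tested by a reachability check rather than by explicit cycle enumeration. The function names are hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Exact betweenness (Brandes) on an unweighted, undirected graph
    given as {node: set_of_neighbors}."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, queue = [], deque([s])
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        while queue:                  # BFS from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice

def on_cycle(adj, v):
    """True if v lies on a simple cycle: some neighbor u can still reach v
    when the direct edge (u, v) is ignored."""
    for u in adj[v]:
        seen, queue = {u}, deque([u])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if x == u and y == v:
                    continue          # skip the direct edge itself
                if y == v:
                    return True
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
    return False

# Triangle 0-1-2 with a pendant sentence 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
bridge = betweenness(adj)
cycle = {v: on_cycle(adj, v) for v in adj}
```

Here node 2 is the only connector between the pendant sentence and the rest of the graph, so it receives all of the betweenness mass, while the pendant sentence is the only node outside a cycle.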

The weights $(\lambda_{\text{task}}, \lambda_{\text{rep}}, \lambda_{\text{bridge}}, \lambda_{\text{cycle}})$ combine these factors linearly, and performance is reported to be robust to perturbations of the weights.
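As a rough illustration of the linear combination, the sketch below scores sentences from four precomputed signals. The min-max normalization of the continuous signals and the default weight values are assumptions made for the example; the summary does not state either. The names `minmax` and `composite_scores` are hypothetical.

```python
def minmax(xs):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def composite_scores(task, rep, bridge, cycle, lam=(0.4, 0.3, 0.2, 0.1)):
    """Linear combination of the four per-sentence signals.

    task, rep, bridge: lists of floats (relevance, representativeness, betweenness)
    cycle: list of 0/1 indicators
    lam: (lambda_task, lambda_rep, lambda_bridge, lambda_cycle), hypothetical values
    """
    t, r, b = minmax(task), minmax(rep), minmax(bridge)
    l_task, l_rep, l_bridge, l_cycle = lam
    return [l_task * t[i] + l_rep * r[i] + l_bridge * b[i] + l_cycle * float(cycle[i])
            for i in range(len(task))]

scores = composite_scores(
    task=[0.2, 0.8],
    rep=[0.5, 0.5],
    bridge=[0.0, 2.0],
    cycle=[1, 0],
)
```

Because every term is an explicit, named signal, the contribution of each cue to a sentence's rank can be read off directly, which is what makes the score interpretable.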

Budgeted Selection with Redundancy Control

A greedy, budget-aware selection iterates through sentences sorted by score. Non-maximum suppression at a cosine threshold $\tau$ filters near-duplicates, and the selected sentences are then restored to their original order to preserve readability and coherence.
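The selection loop might look like the following sketch. It assumes whitespace token counting, a default threshold of 0.9, and that a sentence exceeding the remaining budget is skipped rather than ending the loop; none of these details is specified in the summary, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def select(sentences, emb, scores, budget, tau=0.9,
           count_tokens=lambda s: len(s.split())):
    """Greedy budgeted selection with non-maximum suppression."""
    order = sorted(range(len(sentences)), key=lambda i: -scores[i])
    chosen, used = [], 0
    for i in order:
        cost = count_tokens(sentences[i])
        if used + cost > budget:
            continue  # does not fit; a cheaper sentence later might (assumed policy)
        if any(cosine(emb[i], emb[j]) >= tau for j in chosen):
            continue  # near-duplicate of an already selected sentence
        chosen.append(i)
        used += cost
    # Restore original document order for readability.
    return [sentences[i] for i in sorted(chosen)]

sentences = ["a b c", "a b c d", "x y", "p q r s t"]
emb = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.5, 0.5]]
scores = [0.9, 0.8, 0.7, 0.6]

tight = select(sentences, emb, scores, budget=6)
loose = select(sentences, emb, scores, budget=20)
```

With the tight budget only the first and third sentences fit. With the loose budget the second sentence is still dropped, this time because it is nearly identical to the first.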

Experimental Results

The framework is evaluated on four summarization benchmarks: CNN/DailyMail, GovReport, arXiv, and PubMed, encompassing both short-form and long-form scientific/governmental texts.

Key empirical findings:

  • The proposed compression yields consistently strong performance, with leading scores particularly on long-document benchmarks such as GovReport and arXiv, indicating improved structural preservation.
  • On GovReport and arXiv, the method outperforms all extractive, abstractive, and long-context baselines on ROUGE, BERTScore, and QAFactEval metrics.
  • On PubMed, the method achieves the best ROUGE-1/L and factual consistency, further supporting its utility in information-dense scientific domains.
  • On CNN/DailyMail, while certain rank-fusion baselines attain the highest ROUGE, the proposed approach achieves the best QAFactEval and BERTScore, emphasizing faithfulness and semantic accuracy.

Figure 2: Performance over token budgets on four datasets, reporting QAFactEval (CNN/DailyMail, GovReport) and ROUGE-L (arXiv, PubMed) as quality indicators.

Ablation studies show that each component—sequential edges, cluster representativeness, bridge centrality, and redundancy suppression—adds measurable benefit, with sequential and structural cues especially important for long documents. Hyperparameter sweeps confirm that a non-extreme hybridization of semantic and sequential edges delivers optimal structural integrity and information retention across domains.

Implications and Theoretical Impact

The methodology demonstrates that simple structural priors, applied without training or model-specific adaptation, can confer robustness and efficiency to context compression pipelines for LLMs. By integrating interpretable graph-theoretic signals, the approach balances task-driven salience and structural completeness, directly addressing the information fragmentation introduced by naive truncation or narrow relevance-only selection.

Practically, these findings suggest that:

  • Lightweight, explainable preprocessors—deployable on top of any LLM stack—can meet the requirements of cost-sensitive or resource-constrained deployments without degradation of task-relevant evidence, topical coverage, or logical coherence.
  • Such frameworks provide a strong baseline for future research on context compression, especially in zero-shot or domain-agnostic scenarios.

Theoretically, the explicit use of bridge and cycle cues introduces new avenues for structure-aware compression algorithms and highlights the limitations of purely semantic or positionally biased schemes. These insights could inform not only pre-processing pipelines but also the future design of long-context attention architectures and adaptive memory systems.

Conclusion

This training-free, graph-prior-guided compression method robustly compresses long documents for LLM consumption while preserving essential downstream task performance, coverage, and structure. Its plug-and-play, interpretable nature makes it broadly applicable to long-context LLM stacks and establishes a practical reference point for structurally informed context reduction approaches. Future extensions may integrate soft generative rewrites, adaptive structural weighting, or multi-hop reasoning preservation for even richer context management.
