SWE-Compressor: A Context-Aware Code Agent
- SWE-Compressor is a 32B parameter language model that employs the Context-as-a-Tool paradigm to manage context explicitly and prevent semantic drift.
- It integrates context compression as a decision action using the CaT-Generator, enabling proactive summarization and stable long-horizon reasoning.
- Empirical results on SWE-Bench-Verified show state-of-the-art performance with improved token efficiency and robustness on complex software tasks.
SWE-Compressor is a 32B-parameter LLM agent designed for long-horizon software engineering (SWE) tasks, distinguished by its treatment of context management as an explicit, callable tool within its decision-making architecture. Developed by Liu et al., SWE-Compressor advances the capabilities of repository-scale code agents by formalizing proactive context compression as a learnable action, enabling more stable and scalable reasoning under bounded context budgets, and achieving state-of-the-art results on SWE-Bench-Verified tasks (Liu et al., 26 Dec 2025).
1. Context Management Paradigm
SWE-Compressor is centered on the Context-as-a-Tool (CaT) paradigm. Unlike conventional agents that manage context in an append-only fashion or with statically triggered compression, CaT introduces structured, tool-based context management. The agent's context at step $t$ is represented as

$$C_t = (Q, M_t, I_t^k),$$

where $Q$ is the fixed, immutable segment (system prompt and core user intent), $I_t^k$ comprises the $k$ most recent high-fidelity Thought + Action + Observation tuples, and $M_t$ is a condensed summary of all earlier interactions. This tripartite structure prevents uncontrolled context explosion and semantic drift by explicitly separating stable task semantics, short-term working memory, and abstracted history.
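A minimal Python sketch of this tripartite structure; the field names, types, and window size are illustrative assumptions rather than details from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class StepRecord:
    """One high-fidelity Thought + Action + Observation tuple."""
    thought: str
    action: str
    observation: str

@dataclass
class AgentContext:
    """Tripartite context C_t = (Q, M_t, I_t^k)."""
    query: str                 # Q: immutable system prompt and core user intent
    memory: str = ""           # M_t: condensed summary of all earlier interactions
    recent: list[StepRecord] = field(default_factory=list)  # I_t^k: recent raw tuples
    k: int = 8                 # size of the raw window (illustrative default)

    def append(self, step: StepRecord) -> None:
        """Record a new raw step; growth is bounded by explicit compression, not truncation."""
        self.recent.append(step)

    def compress(self, new_summary: str) -> None:
        """Effect of Invoke(ContextTool): fold history into M and reset the raw window."""
        self.memory = new_summary
        self.recent.clear()
```

The `compress` method mirrors the decision-flow pseudocode in Section 4: $Q$ is untouched, $M$ is replaced, and the raw window restarts empty, its content surviving only through the new summary.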
2. Context Compression as a Decision Tool
At the agent's core lies the ability to invoke the context tool as a first-class action. During each decision step, the policy network evaluates candidate actions, which include environment-modifying operations (e.g., file edits, shell commands) and the context tool invocation. The agent may choose “Invoke(ContextTool)” in scenarios such as reaching subtask milestones, surpassing raw context capacity, or anticipating that an actionable summary will facilitate future reasoning. Tool invocation triggers a structured summarization process: a new memory block $M$ is generated to replace the previous historical context, with the immutable segment $Q$ preserved and the raw window reset. The context action is thus not an auxiliary mechanism but an integral part of the agent's action repertoire, scored and selected alongside primary environment interactions.
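Two of the invocation triggers named above (subtask milestones and raw-capacity overflow) are mechanical enough to sketch; the threshold value and function below are illustrative assumptions, not the paper's policy, which scores the context action jointly with all other candidate actions:

```python
def should_consider_context_tool(token_count: int,
                                 token_budget: int = 32_000,
                                 at_subtask_milestone: bool = False) -> bool:
    """Heuristic pre-check for proposing Invoke(ContextTool).

    Captures only the two mechanical triggers from the text: a subtask
    milestone has been reached, or the raw context is about to exceed its
    budget. The third trigger, anticipating that a compact summary will aid
    future reasoning, is left to the learned policy itself.
    """
    return at_subtask_milestone or token_count >= token_budget
```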
3. Offline Trajectory-Level Supervision: CaT-Generator
To train context-management decision-making, SWE-Compressor leverages CaT-Generator, an offline data construction pipeline that retroactively injects context tool calls into ReAct-style reasoning trajectories. The process unfolds in two main phases:
- Base ReAct Trajectory Generation: Collects uncompressed trajectories using standard ReAct agents.
- Retroactive Injection:
  - Condenser Points Identification: Selects milestones for context tool invocation based on context growth, subtask boundaries, or error-correction.
  - Segment and Summarize: Segments the context and generates summaries.
  - Stitch Tool Calls: Inserts `ContextTool()` invocations into the action stream.
  - Rejection Sampling: Filters trajectories for semantic fidelity and appropriate tool-call frequency.
The final CaT SFT dataset contains approximately 20,000 high-quality supervised fine-tuning trajectories, where tool invocation and summary generation behaviors are explicitly represented.
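A hedged sketch of the retroactive-injection phase; the helper callables `find_condenser_points` and `summarize_segment`, as well as the dictionary trajectory format, are assumptions for illustration rather than the paper's actual pipeline:

```python
from typing import Callable

Step = dict  # one ReAct step, e.g. {"thought": ..., "action": ..., "observation": ...}

def inject_context_calls(trajectory: list[Step],
                         find_condenser_points: Callable[[list[Step]], list[int]],
                         summarize_segment: Callable[[list[Step]], str]) -> list[Step]:
    """Retroactively stitch ContextTool() calls into an uncompressed ReAct trajectory.

    find_condenser_points marks step indices at context-growth limits, subtask
    boundaries, or error-correction episodes; summarize_segment condenses the
    steps preceding each point into a memory block.
    """
    out: list[Step] = []
    prev = 0
    for p in sorted(find_condenser_points(trajectory)):
        segment = trajectory[prev:p]
        out.extend(segment)
        out.append({
            "thought": "Context has grown; compress history before continuing.",
            "action": "ContextTool()",
            "observation": summarize_segment(segment),  # becomes the new memory block
        })
        prev = p
    out.extend(trajectory[prev:])
    return out
```

Rejection sampling, as described above, would then discard stitched trajectories whose summaries lose task-critical information or whose tool-call frequency is implausible.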
4. Architectural Foundation and Learning
SWE-Compressor builds on the Qwen2.5-Coder-32B Transformer, retaining standard layers and FlashAttention for efficiency. No supplementary network modules are introduced; instead, the prompt format is augmented to support explicit `Action: Invoke(Context)` tokens and contextual summaries. Fine-tuning is performed with supervised learning, minimizing two cross-entropy objectives:
- $\mathcal{L}_{\text{action}}$: next-action prediction, and
- $\mathcal{L}_{\text{summary}}$: summary generation at tool-invocation points.

The total loss is $\mathcal{L} = \mathcal{L}_{\text{action}} + \lambda\,\mathcal{L}_{\text{summary}}$ ($\lambda = 1$ by default).
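A hedged PyTorch-style sketch of this two-term objective, assuming token-level masks that mark which target positions belong to action text versus summary text (the masking scheme is an illustrative assumption, not the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def cat_sft_loss(logits: torch.Tensor,        # (batch, seq, vocab)
                 targets: torch.Tensor,       # (batch, seq) next-token ids
                 action_mask: torch.Tensor,   # (batch, seq) 1.0 where tokens belong to actions
                 summary_mask: torch.Tensor,  # (batch, seq) 1.0 where tokens belong to summaries
                 lam: float = 1.0) -> torch.Tensor:
    """L = L_action + lam * L_summary, both standard next-token cross-entropy."""
    token_loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).reshape(targets.shape)

    l_action = (token_loss * action_mask).sum() / action_mask.sum().clamp(min=1)
    l_summary = (token_loss * summary_mask).sum() / summary_mask.sum().clamp(min=1)
    return l_action + lam * l_summary
```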
Canonical decision flow is as follows:
```
function AgentStep(C, history, θ):
    # C = (Q, M, I^k): fixed segment, summary memory, recent raw window
    candidate_actions = {file_edit, bash_exec, info_retrieval, ContextTool}
    scores = ModelScore(C, candidate_actions; θ)
    a* = argmax scores
    if a* == ContextTool:
        # Compression: fold history into a fresh summary; Q is kept, the raw window restarts empty
        M_new = GenerateSummary(Q, I^k, history; θ)
        return UpdateContext(Q, M_new, I^k ← ∅)
    else:
        return ExecuteEnvironmentAction(a*)
```
All tools, including `execute_bash`, `str_replace_editor`, `submit`, and `context`, are exposed during training and inference.
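A minimal sketch of how this tool set could be exposed to the policy as a flat registry; the schema shape is an assumption for illustration, not the paper's actual interface:

```python
# Hypothetical tool registry; names match the tools listed above,
# descriptions are paraphrases for illustration only.
TOOLS = [
    {"name": "execute_bash",       "description": "Run a shell command in the repository environment."},
    {"name": "str_replace_editor", "description": "View files and apply string-replacement edits."},
    {"name": "submit",             "description": "Submit the final patch for evaluation."},
    {"name": "context",            "description": "Compress history into a fresh summary block (the context tool)."},
]
```

The key point is that `context` is registered exactly like the environment tools, so the policy scores it alongside them rather than relying on an external trigger.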
5. Empirical Results on SWE-Bench-Verified
SWE-Compressor has been evaluated on the 500-instance SWE-Bench-Verified dataset, which comprises real GitHub issue fixes with human-validated acceptance criteria. It achieves a 57.6% Pass@1 solve rate, outperforming vanilla ReAct (49.8%) and a static Threshold-Compression baseline (53.8%). Notably, this performance matches or exceeds that of closed-source agents with double the parameter count under equivalent ReAct frameworks.
| Agent | Pass@1 (%) |
|---|---|
| SWE-Compressor | 57.6 |
| ReAct | 49.8 |
| Threshold-Compression | 53.8 |
Gains are especially pronounced on medium and hard tasks, i.e., instances with estimated resolution times of 15 min–1 h and ≥1 h, respectively, indicating superior long-horizon context stability.
6. Context Budget, Scalability, and Qualitative Analysis
CaT's explicit context compression ensures that the average active context size stabilizes below 32,000 tokens after approximately 100 task rounds and does not grow further, in contrast to append-only ReAct, which exhausts its context window by round 60. As trajectory length increases to 500 rounds, SWE-Compressor continues to improve performance (57.6%), while ReAct saturates and degrades.
Ablation studies show that SFT-only variants with exposed context actions but without CaT-Generator data achieve a lower 55.0% success rate at 500 steps, confirming the benefit of explicit trajectory-level context supervision. Stage-wise summarization preserves essential subgoals, historical strategies, outcomes, and persistent constraints, allowing the agent to recall salient features of its historical trajectory for long-horizon decision making.
Under a 150-step budget, SWE-Compressor uses 1.89M tokens (vs. 1.96M for ReAct), yet matches or exceeds ReAct’s pass rate. At 500 steps, it utilizes 2.75M tokens (compared to ReAct’s 2.54M) but achieves a +9 percentage point improvement in pass rate, indicating enhanced token efficiency within a scalable context-management regime.
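As a quick arithmetic check of the token figures above (input numbers from the text; the derived percentages are this summary's own calculation):

```python
# Token usage of SWE-Compressor relative to ReAct (millions of tokens, from the text)
print(1.89 / 1.96)  # ≈ 0.964: ~4% fewer tokens under the 150-step budget
print(2.75 / 2.54)  # ≈ 1.083: ~8% more tokens under the 500-step budget, for a +9 pp pass-rate gain
```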
7. Significance and Future Directions
SWE-Compressor demonstrates that proactive, learned context compression—integrated as a tool within a Transformer-based agent—enables stable, scalable, and high-fidelity reasoning on large-scale, long-horizon software engineering tasks. The Context-as-a-Tool approach affords fine-grained control over context growth, mitigates semantic drift, and supports robust decision-making, setting a new benchmark for practical code reasoning agents (Liu et al., 26 Dec 2025).
This suggests that similar context-aware paradigms could benefit other sequential decision-making domains where histories are complex, budgets are constrained, and precise retention of actionable summaries is required. Future directions may include extending context-tool designs, adapting the approach to multi-agent settings, or integrating it with curriculum learning strategies to further explore context abstraction capabilities.