- The paper introduces a modular framework that decomposes code editing into a Viewer, Main Agent, and Editor to mitigate context pollution and reduce inference costs.
- It leverages GRPO reinforcement learning to adaptively choose between find-and-replace and whole-file rewrite, achieving up to 12.4% improvement in edit correctness.
- The architecture demonstrates practical gains, with a 2.1% increase in resolved issues and enhanced reliability across diverse model families by isolating context-intensive tasks.
SWE-Edit: Modularizing Code Editing for Efficient Software Engineering Agents
Motivation and Problem Statement
Contemporary LLM-based code agents have shown significant ability to navigate and resolve nontrivial software engineering (SWE) tasks. However, the standard interaction paradigm, the tightly coupled code editing interface, forces code comprehension, modification reasoning, and edit execution into a single LLM context. This context coupling introduces persistent context pollution: exploratory code viewing mixes with edit formatting, which degrades both LLM accuracy and operational efficiency. The dominant find-and-replace edit format used in agentic systems is error-prone due to strict string matching requirements, while the alternative whole-file rewrite is computationally expensive and brittle for large files. These structural limitations hamper the reliability and cost-effectiveness of scalable coding agents.
SWE-Edit Framework
SWE-Edit decomposes the traditional monolithic editing interface into a three-agent scaffold: Viewer, Main Agent, and Editor. The Viewer subagent receives queries and extracts only the minimum relevant code snippets, mitigating context pollution within the main agent context window. The Editor subagent receives high-level natural language edit instructions and autonomously generates and executes the actual code changes, crucially decoupling LLM reasoning from format-sensitive edit generation.
This architectural separation enables compositional optimization. The main agent retains a reasoning-centric role, orchestrating code navigation and edit intent, while context-intensive and format-sensitive operations (viewing and editing) are handled by dedicated subagents using smaller, cost-efficient models. This design allows specialized subagent training and efficient model selection, improving both system reliability and inference cost.
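The division of labor described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class names mirror the three roles, but the `LLM` callable type, the prompt strings, the method names, and the stub models are all hypothetical and chosen only so the example runs without a real model.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical stand-in for a model call: prompt in, completion out.
LLM = Callable[[str], str]


@dataclass
class Viewer:
    llm: LLM  # assumed to be a small, inexpensive model

    def view(self, file_text: str, query: str) -> str:
        # Return only the snippet relevant to the query; the full file
        # stays out of the main agent's context.
        return self.llm(f"QUERY: {query}\nFILE:\n{file_text}")


@dataclass
class Editor:
    llm: LLM  # assumed to be a small, inexpensive model

    def edit(self, file_text: str, instruction: str) -> str:
        # Turn a natural-language instruction into updated file contents.
        return self.llm(f"INSTRUCTION: {instruction}\nFILE:\n{file_text}")


@dataclass
class MainAgent:
    viewer: Viewer
    editor: Editor
    files: Dict[str, str]

    def inspect(self, path: str, query: str) -> str:
        return self.viewer.view(self.files[path], query)

    def request_edit(self, path: str, instruction: str) -> None:
        self.files[path] = self.editor.edit(self.files[path], instruction)


# Deterministic stubs (not real models) so the sketch is runnable.
def stub_viewer_llm(prompt: str) -> str:
    header, file_text = prompt.split("\nFILE:\n", 1)
    query = header[len("QUERY: "):]
    return "\n".join(line for line in file_text.splitlines() if query in line)


def stub_editor_llm(prompt: str) -> str:
    _, file_text = prompt.split("\nFILE:\n", 1)
    return file_text.replace("3.14", "3.14159")


agent = MainAgent(
    viewer=Viewer(stub_viewer_llm),
    editor=Editor(stub_editor_llm),
    files={"geom.py": "import os\n\ndef area(r):\n    return 3.14 * r * r\n"},
)
snippet = agent.inspect("geom.py", "area")  # only the matching line comes back
agent.request_edit("geom.py", "use a more precise value of pi")
```

In this sketch the main agent only ever sees the short snippet returned by `inspect` and only ever emits a one-line instruction to `request_edit`; the file contents and the edit formatting stay inside the subagent calls.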
Adaptive Editor Optimization
A core insight is that no single editing format (find-and-replace vs. whole-file rewrite) is optimal for all modification types. Find-and-replace is precise and token-efficient for local changes but fails easily due to context alignment issues; whole-file rewrite is robust for structural changes but is costly and risks unintended modifications in large files.
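To make the tradeoff concrete, here is a minimal Python sketch of the two formats. The function names are hypothetical, and the rule that a search string must match exactly once is an assumption modeled on common string-replace editing tools, not a detail taken from the paper.

```python
def find_and_replace(file_text: str, search: str, replacement: str) -> str:
    """Apply a search/replace edit; reject it unless `search` occurs exactly once."""
    occurrences = file_text.count(search)
    if occurrences != 1:
        raise ValueError(f"search string matched {occurrences} times, expected 1")
    return file_text.replace(search, replacement, 1)


def whole_file_rewrite(file_text: str, new_file_text: str) -> str:
    """Replace the file wholesale; the model must re-emit every line."""
    return new_file_text


source = "def f(x):\n    return x + 1\n"

# Exact match: the edit applies, and the model only emits the changed span.
patched = find_and_replace(source, "return x + 1", "return x + 2")

# Same intent, but the search string drops the spaces around '+': no match.
try:
    find_and_replace(source, "return x+1", "return x+2")
    failed = False
except ValueError:
    failed = True

# Whole-file rewrite cannot fail to match, but its output grows with the file.
rewritten = whole_file_rewrite(source, "def f(x):\n    return x + 2\n")
```

The second call fails on a purely cosmetic difference, which is the kind of alignment error the paragraph above refers to. The rewrite path avoids it at the price of regenerating the whole file.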
SWE-Edit addresses this heterogeneity by formulating mode selection as a learnable policy for the Editor. The editing model (Qwen3-8B, in the principal experiments) is trained via GRPO reinforcement learning to adaptively select editing mode per instruction, with a normalized match reward (canonicalizing whitespace/comments) serving as the reward function. This learnable policy outperforms static format choice by matching editing strategies to the complexity and scope of required modifications.
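A rough Python sketch of the training signal follows. The canonicalization rules (dropping `#` comments, trailing whitespace, and blank lines), the binary reward, and the use of population standard deviation are assumptions made for illustration; the summary does not specify them. The advantage computation is the usual group-relative normalization associated with GRPO, applied here to a group of sampled edits for a single instruction.

```python
from statistics import mean, pstdev
from typing import List


def canonicalize(code: str) -> str:
    """Assumed normalization: drop comments, trailing whitespace, blank lines."""
    kept = []
    for line in code.splitlines():
        line = line.split("#", 1)[0].rstrip()  # naive: also cuts '#' inside strings
        if line:
            kept.append(line)
    return "\n".join(kept)


def match_reward(edited_file: str, reference_file: str) -> float:
    """1.0 if the edited file equals the reference after normalization, else 0.0."""
    return 1.0 if canonicalize(edited_file) == canonicalize(reference_file) else 0.0


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Center and scale each reward by its sampling group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


reference = "def f(x):\n    return x + 2\n"
group = [
    "def f(x):\n    return x + 2  # bumped\n\n",  # correct up to comment and blank line
    "def f(x):\n    return x + 1\n",              # wrong edit
    "def f(x):\n    return x + 2\n",              # exact match
    "def f(x):\n    return x - 2\n",              # wrong edit
]
rewards = [match_reward(g, reference) for g in group]
advantages = group_advantages(rewards)
```

Because the reward is computed on the resulting file, a sampled edit is scored the same way whichever mode produced it. Samples that beat the group average receive positive advantages and the rest negative ones, which is one way a policy could learn when each mode pays off.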
Experimental Results
The framework is evaluated on the challenging SWE-bench Verified benchmark (500 real-world GitHub issues), with additional ablations on PR-Edit, a new lightweight intermediate benchmark designed to predict downstream agentic effectiveness.
- Scaffolding-level improvements: SWE-Edit achieves a 2.1% absolute increase in resolved issues and a 17.9% reduction in total inference cost compared to the Anthropic-style monolithic baseline. Edit formatting reliability (edit success rate) rises by 3.5%. These gains are complementary: cost reductions stem from focused Viewer snippet extraction, while reliability gains stem from Editor decoupling and adaptive formatting.
- Editor model optimization: GRPO-trained Qwen3-8B Editors achieve a 12.4% improvement in edit correctness over their untrained counterparts on PR-Edit (as judged by GPT-4.1), and this translates to a 1.4% end-to-end agentic resolve rate improvement and 6.8% lower agent inference cost downstream.
- Cost-performance tradeoff: Combining the two subagents shifts the Pareto frontier. The Editor or the Viewer alone can improve some component-level metrics (accuracy or cost), but together they yield joint cost and reliability benefits that neither achieves on its own.
- Generalizability: Architectural gains persist when varying the main agent's model family. Experiments with Kimi-K2, MiniMax-M2.1, and GLM-4.7 confirm stability in resolve rate improvement (1.6% to 4.1%) and large boosts (12.8 to 18.3 percentage points) in formatting reliability across models.
- Scaling and targeting: Scaling up the Editor model (GPT-5 vs. GPT-5-mini) yields minimal accuracy gains at substantial additional cost (5.8x). By contrast, adaptive training via RL produces larger improvements without the cost penalty, reinforcing that targeted format selection is a learnable policy and more efficient than pure model capacity scaling.
Implications and Future Directions
SWE-Edit supplies a modular foundation for building efficient, scalable, and reliable code editing pipelines in agentic SWE. By isolating context-intensive and format-sensitive operations, it decouples core reasoning and orchestration from lower-level context manipulation and edit execution.
Practical implications include:
- Direct deployability without retraining main agents, facilitating integration with closed-source reasoning LLMs.
- Cost-optimal subagent instantiation: small, efficiently trained models suffice for the Viewer/Editor roles.
- Reliable and lightweight benchmarking: the PR-Edit suite provides a fast, cheap proxy for editor quality and yields strong predictive correlation with full-system agent effectiveness.
Theoretical implications:
- Highlights the necessity of architectural, not just algorithmic, advances for agentic SWE; specialization at the cognitive level may unlock further gains.
- Demonstrates that adaptive mode selection (acquired via RL), rather than model scale alone, is the effective lever for heterogeneous editing tasks.
- Argues for compositional evaluation pipelines in future end-to-end agentic benchmarks.
Future work may focus on end-to-end reinforcement learning signals for the Editor (beyond purely static edit correctness), richer agent-to-agent collaboration feedback, and extending subagent decomposition to other operations (beyond code editing) in multi-agent SWE ecosystems.
Conclusion
SWE-Edit proposes and validates a modular code editing architecture for SWE-agents that addresses fundamental context coupling and format sensitivity bottlenecks present in monolithic LLM agents. By decomposing the Viewer and Editor roles and leveraging adaptive format selection, SWE-Edit achieves improvements in both agentic efficiency and editing reliability across multiple model families and tasks, offering a robust blueprint for future scalable software engineering agents (2604.26102).