SWE-Exp: Experience-Driven Debugging
- SWE-Exp is an experience-driven framework that accumulates and reuses repair knowledge from past debugging trajectories to optimize issue resolution.
- It employs a dual-agent system with an Instructor for strategic planning and an Assistant for code modifications, both guided by a multi-dimensional experience bank.
- On SWE-bench-Verified, SWE-Exp reaches a Pass@1 rate of 41.6% with DeepSeek-V3-0324, and ablations attribute the gain to both experience types and the dual-agent design.
SWE-Exp is an experience-driven framework for automated software issue resolution that changes how LLM-powered software engineering agents learn and reason. Previous agentic systems for GitHub issue resolution have been predominantly memoryless: they treat each problem as an isolated search task and retain no knowledge between issues. SWE-Exp instead accumulates and reuses actionable repair knowledge from a corpus of past agent trajectories. Its core innovation is a multi-faceted experience bank from which agents retrieve and apply abstracted, strategic insights drawn from previous successes and failures, replacing pure trial-and-error exploration with experience-driven debugging.
1. Conceptual Foundation and Motivation
SWE-Exp addresses the inefficiency of prior LLM-based agents that solve each software issue de novo, repeating fruitless exploration and failing to adapt successful repair methods to new but analogous problems. The framework recognizes that agent debugging performance is limited by a lack of cross-issue knowledge accumulation—despite repeated exposure to similar bug patterns and fix strategies. By introducing systematic knowledge distillation and retrieval from prior repair attempts, SWE-Exp aims to enable continuous learning and robust generalization across diverse software engineering tasks.
The approach is designed for repository-level issue resolution at scale, in particular as evaluated on demanding benchmarks such as SWE-bench-Verified, where agentic systems must reason about complex, real-world bugs, perform code modifications, and verify correctness against comprehensive test suites (Chen et al., 31 Jul 2025).
2. Experience Bank Architecture
At the core of SWE-Exp is a multi-dimensional experience bank, purpose-built to record, abstract, and index agent repair trajectories. Each repair attempt is stored as a trajectory—a temporally ordered sequence of tuples encapsulating:
- High-level comprehension directives (strategic observations and problem framings);
- Action primitives (specific code modifications, file or function targets);
- Observed state transitions (intermediate and final results, including test outcomes and error messages);
- Environmental feedback (runtime signals, exceptions, test results).
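The four components listed above map naturally onto a small record type. The following is a minimal sketch of how such a trajectory could be represented; the class and field names (`TrajectoryStep`, `Trajectory`, `failed_steps`) are illustrative assumptions and are not taken from the SWE-Exp implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TrajectoryStep:
    """One tuple in a repair trajectory (field names are hypothetical)."""
    comprehension: str  # high-level directive or problem framing
    action: str         # action primitive, e.g. an edit to a file or function
    observation: str    # observed state transition (intermediate result)
    feedback: str       # environmental feedback (exceptions, test output)


@dataclass
class Trajectory:
    """A temporally ordered sequence of steps recorded for one issue."""
    issue_id: str
    steps: List[TrajectoryStep] = field(default_factory=list)
    resolved: Optional[bool] = None  # final outcome; None until tests have run

    def append(self, step: TrajectoryStep) -> None:
        self.steps.append(step)

    def failed_steps(self) -> List[TrajectoryStep]:
        """Steps whose feedback mentions a failure (a crude keyword filter)."""
        return [
            s for s in self.steps
            if "fail" in s.feedback.lower() or "error" in s.feedback.lower()
        ]


# Example: record two steps of a made-up repair attempt.
traj = Trajectory(issue_id="example-issue")
traj.append(TrajectoryStep(
    comprehension="Suspect missing input validation",
    action="edit validators.py: add argument check",
    observation="patch applied",
    feedback="2 tests failed",
))
traj.append(TrajectoryStep(
    comprehension="Check was too strict for None inputs",
    action="edit validators.py: allow None",
    observation="patch applied",
    feedback="all tests passed",
))
traj.resolved = True
```

Keeping failed steps alongside successful ones matters here because the distillation stage draws on both successes and failures.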
The distillation process produces parametrized, reusable “experiences” at multiple abstraction levels:
- Comprehension Experiences: Abstracted problem understanding, such as high-level diagnostic patterns that point to a root cause or flag a common misreading of the issue (e.g., the repeated observation that input validation errors commonly involve specific argument checks).
- Modification Experiences: Generalized code editing strategies, e.g., defensive copying for mutable default arguments or canonical error handling idioms.
Each experience is embedded in a dense vector space with attached metadata (inferred issue type, repository/domain, natural language labels, and outcome tags) and stored in a vector database, enabling fast semantic retrieval via nearest-neighbor search.
This bank is constructed offline from extensive, diverse agent runs across multiple repositories and issue types to ensure broad coverage and high retrieval precision during inference.
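To make the storage and retrieval step concrete, here is a minimal sketch of an experience bank with brute-force cosine-similarity search. It is written under several assumptions: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, an in-memory list stands in for the vector database, and the names `Experience` and `ExperienceBank` are hypothetical rather than part of SWE-Exp.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


def toy_embed(text: str, dim: int = 64) -> List[float]:
    """Stand-in for a real embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # Deterministic bucket: sum of character codes modulo dim.
        vec[sum(ord(ch) for ch in token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class Experience:
    kind: str                 # "comprehension" or "modification"
    text: str                 # the distilled natural-language experience
    metadata: Dict[str, str]  # e.g. issue type, repository, outcome tag
    vector: List[float]       # embedding used for similarity search


class ExperienceBank:
    """In-memory stand-in for a vector database (assumption for illustration)."""

    def __init__(self) -> None:
        self._items: List[Experience] = []

    def add(self, kind: str, text: str, **metadata: str) -> None:
        self._items.append(Experience(kind, text, dict(metadata), toy_embed(text)))

    def retrieve(self, query: str, kind: str, k: int = 1) -> List[Tuple[float, Experience]]:
        """Return the k most similar experiences of the given kind."""
        q = toy_embed(query)
        scored = [
            # Vectors are unit-length, so the dot product is the cosine similarity.
            (sum(a * b for a, b in zip(q, e.vector)), e)
            for e in self._items
            if e.kind == kind
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


bank = ExperienceBank()
bank.add("comprehension",
         "input validation errors often involve argument checks",
         issue_type="validation", outcome="success")
bank.add("modification",
         "copy mutable default arguments defensively",
         issue_type="state-leak", outcome="success")
best = bank.retrieve("validation errors on argument checks", kind="comprehension", k=1)
```

The default `k=1` mirrors the reported finding that a single well-selected experience per issue works best; a production system would replace the linear scan with an approximate nearest-neighbor index.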
3. Resolution Methodology: Dual-Agent and MCTS Integration
SWE-Exp employs a dual-agent architecture layered over a Monte Carlo Tree Search (MCTS) planning scheme:
- Instructor Agent: Responsible for high-level planning, it leverages retrieved comprehension experiences to shape strategic exploration—determining relevant investigation loci in the codebase, formulating hypotheses, and pruning search subspaces. The Instructor’s actions are dynamically conditioned on the most semantically similar prior comprehension experience, retrieved via the vector database.
- Assistant Agent: This agent operates at the code modification level, applying retrieved modification experiences to propose and refine patch candidates, perform targeted edits, and respond to feedback.
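The role separation described above can be sketched as two prompt-building steps that share a language model but receive different kinds of experience. This is only an illustration: the `LLM` callable interface, the prompt wording, and the function names are assumptions, not the prompts or APIs used by SWE-Exp.

```python
from typing import Callable, Tuple

# Hypothetical interface: a function that maps a prompt to a completion.
LLM = Callable[[str], str]


def instructor_step(llm: LLM, issue: str, comprehension_exp: str) -> str:
    """Produce a high-level instruction, conditioned on a comprehension experience."""
    prompt = (
        "Role: Instructor (plan the next step, do not edit code).\n"
        f"Issue: {issue}\n"
        f"Relevant past insight: {comprehension_exp}\n"
        "Give the next investigation or repair instruction."
    )
    return llm(prompt)


def assistant_step(llm: LLM, instruction: str, modification_exp: str) -> str:
    """Turn the instruction into a concrete edit, conditioned on a modification experience."""
    prompt = (
        "Role: Assistant (carry out the instruction as a concrete edit).\n"
        f"Instruction: {instruction}\n"
        f"Relevant past repair tactic: {modification_exp}\n"
        "Propose the edit."
    )
    return llm(prompt)


def dual_agent_turn(llm: LLM, issue: str,
                    comprehension_exp: str, modification_exp: str) -> Tuple[str, str]:
    """One Instructor -> Assistant hand-off."""
    instruction = instructor_step(llm, issue, comprehension_exp)
    edit = assistant_step(llm, instruction, modification_exp)
    return instruction, edit


# A fake model that echoes the first prompt line, so the sketch runs offline.
def echo_llm(prompt: str) -> str:
    return prompt.splitlines()[0]


instruction, edit = dual_agent_turn(
    echo_llm,
    issue="TypeError when config value is None",
    comprehension_exp="None-handling bugs often originate in the loader, not the caller",
    modification_exp="add an explicit None guard before type coercion",
)
```

Passing the model in as a plain callable keeps the two roles testable without network access.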
At each MCTS decision point, these agents jointly retrieve experience trajectories $E_I \subseteq \mathcal{E}$ from the experience bank $\mathcal{E}$, conditioned on the current issue $I$, thereby influencing the continued exploration in the MCTS search tree. This experience-driven conditioning alters the exploration policy and value estimation, biasing the trajectory space toward historically successful strategies and away from previously observed dead ends.
Formally, for a new issue $I$ and current search node (state) $s$, the selection of the next action $a$ is informed not only by traditional UCT-based utility estimates, which in their standard form are
$$\mathrm{UCT}(s, a) = Q(s, a) + c \sqrt{\frac{\ln N(s)}{N(s, a)}}$$
(with $Q$ the mean value estimate, $N$ the visit counts, and $c$ an exploration constant), but also by a context vector retrieved from $\mathcal{E}$, incorporating the relevance of each retrieved experience $e \in E_I$ to $I$.
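One simple way to picture experience-conditioned selection is to add a relevance bonus to the standard UCT score. The sketch below does exactly that, and it should be read as an assumption-laden illustration: the additive form, the weight `lam`, and the per-action `exp_relevance` score are hypothetical and are not the formula given by the SWE-Exp authors.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class ChildStats:
    visits: int           # N(s, a): times action a was tried from state s
    total_value: float    # sum of backed-up rewards for action a
    exp_relevance: float  # hypothetical score in [0, 1] from retrieved experience


def uct(parent_visits: int, child: ChildStats, c: float = 1.41) -> float:
    """Standard UCT: mean value plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited actions are always tried first
    q = child.total_value / child.visits
    return q + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_action(parent_visits: int, children: Dict[str, ChildStats],
                  c: float = 1.41, lam: float = 0.5) -> str:
    """Pick the action maximizing UCT + lam * experience relevance.

    The additive combination is an assumption made for this sketch.
    """
    def score(name: str) -> float:
        child = children[name]
        return uct(parent_visits, child, c) + lam * child.exp_relevance

    return max(children, key=score)


# Two actions with identical search statistics; only the experience signal differs.
children = {
    "inspect_validator": ChildStats(visits=5, total_value=2.5, exp_relevance=0.1),
    "rewrite_parser": ChildStats(visits=5, total_value=2.5, exp_relevance=0.9),
}
chosen = select_action(parent_visits=10, children=children)
```

With `lam=0.0` the selection reduces to plain UCT, which makes the experience term easy to ablate.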
4. Empirical Results and Component Ablation
Experiments on SWE-bench-Verified (using DeepSeek-V3-0324) demonstrate that SWE-Exp attains a Pass@1 rate of 41.6%, achieving state-of-the-art performance among open-source agentic frameworks at the time of publication (Chen et al., 31 Jul 2025). Ablation studies detail the contribution of individual experience facets:
- Comprehension Experiences: Their removal leads to a 3.2 percentage point drop in Pass@1 (from 41.6% to 38.4%), underlining their central importance for strategic planning.
- Modification Experiences: Excluding them results in a 2.6 percentage point drop, indicating the benefit of recalling generalized repair tactics.
- Dual-Agent Architecture: Collapsing to a single agent further reduces performance by 2.2 percentage points, illustrating the synergistic effect of role separation.
- Number of Experience Retrievals: Empirically, the best performance is obtained when precisely one well-selected experience per issue is used. Excessive retrieval leads to performance degradation, likely due to information overload or spurious/conflicting guidance.
The system thus achieves both quantitative improvements and a qualitative shift in agent behavior—promoting more focused exploration, earlier convergence on viable strategies, and reduced redundancy across debugging sessions.
5. Paradigm Shift and Implications
SWE-Exp represents a methodological advance over purely trial-and-error, memoryless agentic frameworks. By formalizing experience accumulation and intelligent experience reuse, it enables:
- Adaptive reasoning: Agents incrementally refine their knowledge and strategy portfolios across a growing population of issues.
- Continual learning: The experience bank progressively evolves, supporting adaptation to novel issue types and coding domains.
- Lower computational cost: Avoids repeated exploration of failed strategies and accelerates convergence on successful repairs.
A plausible implication is that as the experience bank scales in breadth and depth, SWE-Exp–style agents may ultimately approach human-like debugging efficiency—where intuition, strategy, and memory coalesce into highly adaptive automation.
6. Future Directions and Prospects
Key future research avenues include:
- Improved distillation and retrieval: Developing advanced clustering, denoising, and confidence estimation to ensure only high-quality, relevant experiences are used in each context.
- Formal verification and robustness: Integrating experience-derived strategies with formal methods or runtime validation to guard against overfitting to spurious correlations in prior data.
- Generalization: Extending SWE-Exp across languages and heterogeneous repository ecosystems to probe the universality of experience-driven repair.
- Hierarchical and meta-learning: Exploring principles from meta-reinforcement learning to enable transfer across task families and domains, further accelerating the agent’s learning curve.
In sum, SWE-Exp establishes a foundation for experience-driven, adaptive software engineering agents, marking the transition from stateless exploration to systematic expertise accumulation and reuse in automated debugging (Chen et al., 31 Jul 2025).