Live-SWE-agent: Live Evolving Software Agent

Updated 19 November 2025

Live-SWE-agent is a continuously self-evolving software agent framework driven by LLMs, enabling dynamic runtime modifications without offline training.
It employs live self-reflection and automated code editing to extend its capabilities in response to challenges during active problem-solving.
Empirical results, such as a 75.4% solve rate on SWE-bench Verified, demonstrate its strong performance and generalization across diverse codebases.

Live-SWE-agent is a continuously self-evolving software engineering agent framework driven by LLMs. Unlike previous self-improving agents that rely on dedicated offline training and fixed scaffold designs, Live-SWE-agent dynamically evolves its own implementation “on the fly” at runtime during actual problem-solving episodes, starting from a minimal set of shell tools and explicitly modifying its own agent structure as needed. This approach eliminates the need for exhaustive manual scaffold engineering and enables autonomous adaptation to unseen challenges. Empirical results on prominent benchmarks show higher solve rates than all prior open-source agentic systems, as well as strong generalization across codebases and task types (Xia et al., 17 Nov 2025).

1. Conceptual Foundations and Prior Approaches

Live-SWE-agent belongs to the family of agentic LLM systems for automated software engineering. Traditional frameworks instantiate a scaffold with predefined toolchains, states, and sub-agent logic (e.g., SWE-agent (Yang et al., 2024), RepoForge (Chen et al., 3 Aug 2025), Kimi-Dev (Yang et al., 27 Sep 2025)). Prior efforts at self-improvement include the Darwin-Gödel Machine (DGM), which improves agent code through an offline search that repeatedly edits the agent and re-evaluates it on benchmarks. However, such agents require costly offline training, depend heavily on curated benchmarks, and exhibit limited transfer to novel tasks or LLMs.

Recent research has demonstrated code-level self-editing loops, as in the Self-Improving Coding Agent (SICA), which reflects on its failure cases, proposes concrete modifications to its own codebase, and then reruns itself on the benchmark, all without gradient updates to the underlying LLM (Robeyns et al., 21 Apr 2025). Yet these systems often require explicit triggers for code updates or a precomputed space of modifications. Live-SWE-agent removes these limitations by continuously evaluating and evolving its own scaffold online, without precomputed upgrades or scaffold catalogs, yielding a generic pathway for runtime self-optimization.

2. Agent Scaffold Initialization and Evolution Process

Live-SWE-agent starts from a minimal agent architecture, mini-SWE-agent, equipped solely with basic bash utilities. During execution on real-world tasks, the agent continually inspects its own operational state and problem-solving progress to identify bottlenecks, inefficiencies, or missing capabilities.
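To make the bash-only starting point concrete, the sketch below shows what such an action interface can look like: each agent step is a single shell command, and its exit code plus combined output become the next observation. The helper name `run_shell` and the 60-second timeout are illustrative assumptions, not mini-SWE-agent's actual implementation; the sketch uses Python's `subprocess.run` with `shell=True` (i.e. `/bin/sh` on POSIX) as a stand-in for bash.

```python
import subprocess


def run_shell(command: str, cwd: str = ".", timeout: float = 60.0) -> dict:
    """Hypothetical bash-only action: run one command and return it as an observation."""
    try:
        proc = subprocess.run(
            command,
            shell=True,  # /bin/sh on POSIX; the command string is the whole "tool API"
            cwd=cwd,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        # stdout and stderr are concatenated so the model sees errors as well.
        return {"returncode": proc.returncode, "output": proc.stdout + proc.stderr}
    except subprocess.TimeoutExpired:
        return {"returncode": -1, "output": f"command timed out after {timeout} seconds"}


if __name__ == "__main__":
    obs = run_shell("ls")
    print(obs["returncode"])
    print(obs["output"])
```

Because every capability beyond this has to be built from shell commands, any richer tool the agent later relies on is something it must create for itself during the episode.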

Key principles:

Runtime Self-Evolution: The agent autonomously detects shortcomings and spawns code-level modifications to its own scaffold, extending tool APIs, reasoning modules, or orchestration logic while the main task remains active.
Continuous Trajectory Formation: Instead of re-launching with a new codebase, Live-SWE-agent applies scaffold upgrades in-place, incorporating new components or workflows immediately into subsequent decision steps.
Unbounded Evolution Space: The agent is not confined to a static catalog of scaffold improvements, but may generate novel code modules, hybridize prior designs, or discover emergent tool combinations.

Algorithmically, the update cycle for scaffold $S_t$ at step $t$ takes the form:

Monitor reasoning failures and tool limitations during the ongoing task.
Formulate a modification proposal (code patch, new tool, altered planning routine).
Execute the scaffold upgrade within the current runtime environment.
Continue trajectory execution using the upgraded scaffold $S_{t+1}$.
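The four-step cycle can be sketched as a single function mapping $S_t$ to $S_{t+1}$. This is a minimal sketch under explicit assumptions: the `Scaffold` and `Proposal` types, the restriction of modifications to adding named tools, and the `propose` callback (which in practice would wrap an LLM reflection call) are all hypothetical and do not reproduce Live-SWE-agent's code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# A tool maps a string argument to a string observation.
Tool = Callable[[str], str]


@dataclass
class Scaffold:
    """Hypothetical scaffold state S_t: a version counter plus the callable tools."""
    version: int = 0
    tools: Dict[str, Tool] = field(default_factory=dict)


@dataclass
class Proposal:
    """Hypothetical modification proposal, restricted here to adding one named tool."""
    name: str
    tool: Tool


def evolve_step(
    scaffold: Scaffold,
    observation: str,
    propose: Callable[[Scaffold, str], Optional[Proposal]],
) -> Scaffold:
    """One monitor -> propose -> apply cycle that returns S_{t+1}."""
    # Steps 1-2: inspect the latest observation and possibly formulate a proposal.
    proposal = propose(scaffold, observation)
    if proposal is None:
        return scaffold  # nothing to change, so S_{t+1} == S_t
    # Step 3: apply the upgrade in place, inside the running episode.
    scaffold.tools[proposal.name] = proposal.tool
    scaffold.version += 1
    # Step 4: the caller keeps acting with the upgraded scaffold.
    return scaffold


def toy_propose(scaffold: Scaffold, observation: str) -> Optional[Proposal]:
    """Stand-in for an LLM reflection call: add a line counter when output is too long."""
    if "output too long" in observation and "count_lines" not in scaffold.tools:
        return Proposal("count_lines", lambda text: str(len(text.splitlines())))
    return None


if __name__ == "__main__":
    s = Scaffold()
    s = evolve_step(s, "output too long", toy_propose)
    print(s.version, s.tools["count_lines"]("a\nb\nc"))  # prints: 1 3
```

Mutating the scaffold in place, rather than constructing a fresh agent, mirrors the continuous-trajectory principle: a tool added at step $t$ is available to the action taken at step $t+1$ without restarting the episode.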

3. Autonomous Codebase Modification and Verification

Live-SWE-agent’s codebase evolution leverages a self-reflection loop akin to SICA, but is directly embedded within the live problem-solving context rather than iterating offline over benchmarks (Robeyns et al., 21 Apr 2025). Modifications occur at the granularity of source code objects—methods, classes, sub-agent interfaces, tool definitions—implemented and validated by the agent itself using available runtime resources.

Distinctive features include:

Validation During Live Execution: Each proposed code upgrade is validated on the fly with small integration tests, using the current software problem as its benchmark.
Adaptive Tool Acquisition: The agent can autonomously integrate new tool APIs (e.g., file navigators, diff editors, contextual symbol locators) invented during the current session, immediately exploiting them in reasoning and action selection.
Non-Gradient-Based Learning: No model weights are tuned; all adaptation is performed via direct code edits, module insertion/deletion, and interface restructuring.
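One way to realize validate-then-integrate for a newly invented tool is to write the generated script to disk, run it once against a smoke test drawn from the current task, and expose it only if it succeeds. The function `validate_and_register`, the plain-dict registry, and the success criterion (zero exit code plus an expected substring in stdout) are assumptions made for illustration, not Live-SWE-agent's interface.

```python
import os
import subprocess
import sys
import tempfile
from typing import Dict, List, Optional


def validate_and_register(
    name: str,
    source: str,
    smoke_args: List[str],
    expected_substring: str,
    registry: Dict[str, str],
    workdir: Optional[str] = None,
    timeout: float = 30.0,
) -> bool:
    """Hypothetical helper: write a generated tool script, smoke-test it, and
    register its path only if the test passes; failed tools are deleted."""
    if workdir is None:
        workdir = tempfile.mkdtemp(prefix="live_tools_")
    path = os.path.join(workdir, f"{name}.py")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(source)

    try:
        # Run the new tool once, in a separate interpreter, on task-derived inputs.
        proc = subprocess.run(
            [sys.executable, path, *smoke_args],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        ok = proc.returncode == 0 and expected_substring in proc.stdout
    except subprocess.TimeoutExpired:
        ok = False

    if ok:
        registry[name] = path  # visible to the agent from the next step onward
    else:
        os.remove(path)        # discard the failed upgrade; the scaffold is unchanged
    return ok
```

In a real deployment the smoke test would come from the task at hand (for example, locating a symbol the agent already knows exists), and the script would run inside the kind of sandbox recommended in Section 5.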

4. Benchmarking and Quantitative Performance

Live-SWE-agent has been empirically evaluated on the widely studied SWE-bench Verified and SWE-Bench Pro benchmarks, which each feature real-world software engineering tasks requiring end-to-end code understanding, navigation, patch generation, and test execution.

Results summary (Xia et al., 17 Nov 2025):

SWE-bench Verified: Achieves a solve rate of 75.4% without any test-time scaling, higher than all prior open-source agents and approaching the best proprietary systems.
SWE-Bench Pro: Attains the best-known solve rate of 45.8%, outperforming manually designed agentic frameworks.
Generalization: The live self-evolution mechanism enables robust adaptation across codebases and benchmarks, supporting general operation even under domain shift.
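Solve rate here is the fraction of benchmark instances whose generated patch makes that instance's evaluation tests pass. Assuming the standard 500-instance SWE-bench Verified split, the reported figure corresponds to 377 resolved instances:

$$\text{solve rate} = \frac{N_{\text{resolved}}}{N_{\text{total}}}, \qquad \frac{377}{500} = 0.754 = 75.4\%$$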

5. Practical Implications and Deployment Considerations

Live-SWE-agent demonstrates the feasibility and impact of runtime self-evolution for agentic LLM systems in software engineering. Core implications include:

Reduced Agent Design Burden: Manual exploration of the scaffold design space is supplanted by automated, online scaffold search and deployment.
Task- and Context-Specific Adaptation: The agent can tailor its operational logic, tool availability, and reasoning schema to task requirements observed during runtime, supporting heterogeneous software workflows.
Continuous Improvement Pipeline: Live observation, upgrade, and exploitation cycles occur seamlessly within standard software agent deployment environments.

Deployment recommendations:

Begin with a minimal shell-only or file-edit agent for safe initialization.
Enable runtime code modification permissions under controlled isolation (e.g., in a sandboxed container).
Log scaffold changes and resulting performance for auditability and offline analysis.
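A lightweight way to follow the logging recommendation is an append-only JSON Lines file with one record per scaffold change. The record fields below (step, tool name, SHA-256 of the tool source, validation outcome) are a hypothetical schema, not one prescribed by Live-SWE-agent; task-level performance outcomes could be added as further fields. Hashing the source lets an auditor match each logged change to the exact code that ran.

```python
import hashlib
import json
import os
import tempfile
import time
from typing import Any, Dict, List


def log_scaffold_change(log_path: str, tool_name: str, source: str,
                        validated: bool, step: int) -> Dict[str, Any]:
    """Append one JSON Lines record for a scaffold change (hypothetical schema)."""
    record = {
        "timestamp": time.time(),  # wall-clock time of the change
        "step": step,              # agent step at which the upgrade was applied
        "tool": tool_name,
        "sha256": hashlib.sha256(source.encode("utf-8")).hexdigest(),
        "validated": validated,    # outcome of the live smoke test
    }
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


def load_changes(log_path: str) -> List[Dict[str, Any]]:
    """Read every logged change back, e.g. for offline analysis."""
    with open(log_path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "scaffold_changes.jsonl")
        log_scaffold_change(path, "find_symbol", "print('hi')", validated=True, step=3)
        log_scaffold_change(path, "bad_tool", "raise SystemExit(1)", validated=False, step=5)
        for rec in load_changes(path):
            print(rec["step"], rec["tool"], rec["validated"])
```

An append-only format keeps earlier records intact even if the agent later rewrites or removes the tool they describe.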

6. Relation to Contemporary Agentic Systems and Future Prospects

Live-SWE-agent advances beyond previous approaches such as Darwin-Gödel Machine (DGM) self-improvers and SICA-type reflection loops by supporting unbounded, live scaffold evolution. The general “agent as live-evolving software” paradigm may be extended to incorporate richer forms of user modeling (e.g., TOM-SWE for theory-of-mind (Zhou et al., 24 Oct 2025)), error correction (e.g., process reward models, or PRMs (Gandhi et al., 2 Sep 2025)), and persistent memory across sessions.

Future work may address:

Integration of stateful user preferences and session profiles
Coordinated multi-agent adaptation for collaborative software engineering tasks
Fine-grained process monitoring and reward modeling to further optimize long-horizon trajectories

Live-SWE-agent’s on-the-fly scaffold evolution establishes a new baseline for adaptive, robust agentic architectures in autonomous software engineering, with empirical validations on challenging and diverse benchmarks (Xia et al., 17 Nov 2025).