Trae Agent: LLM-Based Automated Code Issue Resolution
- Trae Agent is a modular LLM framework for automated software repository issue resolution, employing distinct generation, pruning, and selection agents.
- It leverages repository-wide context and dynamic test-time scaling to optimize candidate patch selection, achieving near-linear improvements in success metrics.
- The framework integrates multiple LLMs in an ensemble approach, delivering state-of-the-art results on benchmarks like SWE-bench Verified.
Trae Agent is a modular, LLM-based agentic framework designed for automated software repository issue resolution. It approaches the task as an optimal solution search over an intractably large space of candidate patches, leveraging agent ensembles and repository-wide context understanding to surpass previous prompting-based methods. Trae Agent’s architecture is fundamentally built around three principal agent modules—generation, pruning, selection—each executing specialized sub-procedures and interacting through a precise orchestration protocol. The test-time scaling strategy allows dynamic control over ensemble size, facilitating cost-quality tradeoff at inference and exhibiting monotonic improvement in success metrics. Within the SWE-bench Verified benchmark, Trae Agent achieves state-of-the-art results and leads the leaderboard, substantiating the efficacy of agent-based ensemble reasoning and repository-level AI understanding (Team et al., 31 Jul 2025).
1. Problem Formalization
Trae Agent formalizes repository-level issue resolution as a search problem over the space $\mathcal{P}$ of syntactically valid patches to a codebase $C$. Each candidate patch $p \in \mathcal{P}$ is evaluated via an objective function $f: \mathcal{P} \to \{0, 1\}$, where
$$f(p) = \mathbb{1}\big[\mathrm{apply}(p, C)\ \text{passes}\ T_i\big],$$
with $T_i$ as the set of “golden” tests for a given issue $i$. The operational goal is to find $p^* = \arg\max_{p \in \mathcal{P}} f(p)$ under the constraint that $\mathrm{apply}(p, C)$ compiles and passes necessary regression tests. Due to the intractable size of $\mathcal{P}$, Trae Agent relies on ensemble generation—producing a finite candidate set $\mathcal{P}_N \subset \mathcal{P}$—and selection mechanisms for tractable, scalable patch optimization.
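The search-over-a-finite-ensemble idea can be illustrated with a toy sketch. This is not Trae Agent's implementation: `objective`, `search_over_ensemble`, and the string-based stand-ins for patches and tests are hypothetical simplifications of the formalization above (in practice the golden tests are hidden at inference time and patches are applied to a real codebase).

```python
from typing import Callable, List, Optional

def objective(patch: str, golden_tests: List[Callable[[str], bool]]) -> int:
    """f(p) = 1 iff the applied patch passes every golden test, else 0.
    'Applying' a patch is mocked here as handing the patch text to each test."""
    return int(all(test(patch) for test in golden_tests))

def search_over_ensemble(candidates: List[str],
                         golden_tests: List[Callable[[str], bool]]) -> Optional[str]:
    """Tractable surrogate for the argmax over the full patch space P:
    scan only the finite generated candidate set P_N."""
    for patch in candidates:
        if objective(patch, golden_tests) == 1:
            return patch
    return None
```

The point of the sketch is the restriction from the intractable space $\mathcal{P}$ to a small candidate list, which is exactly what makes ensemble generation plus selection computationally feasible.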
2. Agent Architecture
The Trae Agent system is organized into three modular agents:
- Generation Agent (“Coder Agent”): Analyzes the issue, identifies relevant files, reproduces bugs, diagnoses causes, and synthesizes high-diversity candidate patches through LLM-based code generation. Sampling utilizes round-robin inference across multiple LLMs (Gemini 2.5 Pro, Claude 3.7 Sonnet, GPT-4.1), with elevated temperature for solution diversity. Only syntactically valid patches are accepted into the ensemble.
- Pruning Agent: Efficiently reduces ensemble clutter by deduplication (via AST normalization using unidiff parsing) and regression testing. Deduplication ensures semantic uniqueness of patch candidates, while regression testing prunes patches that degrade repository stability on the existing regression test suite.
- Selection Agent: Implements repository-wide reasoning combining static (file analysis, import graph traversal) and dynamic evaluation (unit test synthesis, execution trace inspection) for each candidate. It executes up to 30 interaction rounds per patch. Majority voting over selector agent runs determines the winning patch, with early termination if majority consensus is achieved.
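The Pruning Agent's deduplication step can be approximated in a few lines. Trae Agent normalizes patches via AST analysis with unidiff parsing; the sketch below substitutes a much cruder text-level normalization (keeping only stripped added/removed diff lines), so `normalize_patch` and `deduplicate` are illustrative assumptions, not the framework's actual logic.

```python
def normalize_patch(diff_text: str) -> str:
    """Crude stand-in for AST-based normalization: keep only added/removed
    lines, stripped of surrounding whitespace, so patches differing only in
    context lines or hunk offsets collapse to the same key."""
    kept = []
    for line in diff_text.splitlines():
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---")):
            kept.append(line[0] + line[1:].strip())
    return "\n".join(kept)

def deduplicate(patches: list) -> list:
    """Keep the first patch for each normalized key, preserving order."""
    seen, unique = set(), []
    for patch in patches:
        key = normalize_patch(patch)
        if key not in seen:
            seen.add(key)
            unique.append(patch)
    return unique
```

Regression testing would then run on the survivors only, which is why deduplication comes first: it avoids paying the (expensive) test-execution cost twice for semantically identical patches.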
This structure enables full repository-level context aggregation, bypassing the limitations of single-file or prompt-based approaches.
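The three-agent orchestration described above reduces to a short pipeline. The function below is a minimal sketch under the assumption that each agent can be modeled as a callable; the names `resolve_issue`, `generate`, `prune`, and `select` are hypothetical, not the framework's actual API.

```python
def resolve_issue(issue, generate, prune, select):
    """Generation -> pruning -> selection, mirroring the three-agent split.

    generate: produces diverse candidate patches (multi-LLM sampling)
    prune:    removes duplicates and regression-breaking patches
    select:   repository-aware evaluation plus majority voting
    """
    candidates = generate(issue)
    survivors = prune(candidates)
    return select(issue, survivors)
```

Because each stage only consumes the previous stage's output, any agent can be swapped out independently, which is the basis of the drop-in extensibility noted later in the implementation details.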
3. Ensemble Reasoning and Repository-Level Understanding
Trae Agent employs ensemble reasoning at both generation and selection stages. The multi-agent generation protocol produces diverse patches by mixing outputs from different LLMs and stochastic sampling parameters. For selection, repository-level understanding is achieved not through graph neural networks, but by iterative agentic file ingestion, code region summarization, and clustering via LLM prompts. SequentialThinkingTool directs agentic exploration of cross-file dependencies and semantic code region clusters, constrained to 30 prompt rounds to avoid context window saturation.
Pruning ensures that only non-redundant, regression-safe patches enter the selection phase. Repository-wide dynamic unit test synthesis and execution trace inspection extend static code analysis, further distinguishing Trae Agent from prompt-limited approaches.
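Majority voting with early termination, as used in the selection stage, can be sketched as follows. This is an assumed implementation of the consensus rule described above (stop as soon as the leading candidate's margin cannot be overturned by the remaining selector runs), not Trae Agent's actual code.

```python
from collections import Counter

def majority_vote(selector_runs, num_runs):
    """Run independent selector agents; terminate early once one candidate
    holds more votes than the remaining runs could possibly overturn."""
    votes = Counter()
    for i, run in enumerate(selector_runs[:num_runs]):
        votes[run()] += 1
        leader, count = votes.most_common(1)[0]
        runner_up = votes.most_common(2)[1][1] if len(votes) > 1 else 0
        remaining = num_runs - (i + 1)
        if count > runner_up + remaining:  # consensus is now unbeatable
            return leader
    return votes.most_common(1)[0][0]
```

Early termination matters here because each selector run costs up to 30 LLM interaction rounds; skipping even one or two redundant runs per issue is a meaningful inference saving.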
4. Test-Time Scaling Strategy
“Test-time scaling” in Trae Agent is realized by on-the-fly adjustment of ensemble parameters during inference:
- Ensemble size $N$: Controls the number of candidate patches and selector votes, serving as the primary scaling dial.
- Sampling temperature: Elevates solution diversity at generation.
- Inference cost-quality tradeoff: Total computation scales linearly in $N$; empirical Pass@1 gains are near-linear at small ensemble sizes, with diminishing returns setting in at larger $N$ (e.g., for the Claude 3.7 Sonnet mixture).
This dynamic scaling allows deployment scenarios to tune cost and precision according to operational constraints and desired success rates.
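One practical consequence of linear cost scaling is that an operator can back out the largest affordable ensemble size from a budget. The helper below is a hypothetical utility under that linear-cost assumption (the function name, pricing inputs, and cap of 10, matching the tunable range mentioned in the implementation details, are all illustrative).

```python
def max_ensemble_size(budget_usd, cost_per_candidate_usd, n_max=10):
    """Largest ensemble size N affordable under a budget, assuming total
    inference cost scales linearly as N * cost_per_candidate, capped at n_max."""
    affordable = int(budget_usd // cost_per_candidate_usd)
    return max(1, min(affordable, n_max))
```

A deployment with a tight per-issue budget would thus run a small $N$ (fast, cheaper, lower Pass@1), while an accuracy-critical deployment would raise $N$ toward the cap, accepting the linear cost increase.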
5. Empirical Performance
Rigorous experimental validation on SWE-bench Verified demonstrates substantial performance gains:
| Method | Gemini 2.5 Pro | Claude 3.7 Sonnet | GPT-4.1 | Mixture |
|---|---|---|---|---|
| Trae Agent | 62.27%±0.12% | 66.40%±0.20% | 59.00%±0.20% | 65.67%±0.23% |
| Best baseline | 59.27%±0.31% | 64.33%±0.42% | 56.60%±0.53% | 61.07%±0.31% |
Average improvement over the best baselines (Δ_avg): +10.22%.
Trae Agent attains a state-of-the-art Pass@1 of 75.20% on SWE-bench Verified as of July 2025. The observed average improvement of approximately 10.22% over four competitive ensemble baselines is statistically significant.
6. Implementation Details and Extensibility
- Software Package: Open-source repository available at https://github.com/bytedance/trae-agent. The architecture is divided into `generation/`, `pruning/`, and `selection/` modules, with orchestration handled by `run_trae.py`.
- Dependencies: Python 3.10+, the Docker SWE-bench image, and LLM APIs (`openai`, `anthropic`, `google-ai`). Utilizes `unidiff` for patch parsing and standard testing infrastructure (pytest, bash).
- Default Hyperparameters: Ensemble size (`--N 3`, tunable up to 10), sampling temperature (`--temp 0.8`), selector rounds cap (30).
- Resource Requirements: Three LLM API keys, minimum 16 GB RAM, 8 CPU cores. Typical runtime per issue is approximately 2–3 minutes at the default N = 3.
- Extensibility: Supports drop-in agent substitutions, custom pruning logic, and alternative voting schemes.
7. Significance and Practical Considerations
Trae Agent establishes a rigorous, modular protocol for repository-level software issue resolution using LLM-driven ensemble reasoning. This framework overcomes limitations in context window, prompt space, and ensemble diversity inherent to previous prompting-based methods. By decoupling candidate patch generation, redundancy pruning, and repository-aware selection, Trae Agent demonstrates robust advances in reliability, reproducibility, and downstream codebase integration. The monotonic quality gains allowed by test-time scaling further substantiate its adaptive utility for practical deployment in research and industry settings (Team et al., 31 Jul 2025).
A plausible implication is that widespread adoption of agent-based, repository-level reasoning architectures like Trae Agent may become standard for large-scale software engineering automation, especially as the complexity and size of codebases continue to grow.