
Trae Agent: LLM-Based Automated Code Issue Resolution

Updated 6 January 2026
  • Trae Agent is a modular LLM framework for automated software repository issue resolution, employing distinct generation, pruning, and selection agents.
  • It leverages repository-wide context and dynamic test-time scaling to optimize candidate patch selection, achieving near-linear improvements in success metrics.
  • The framework integrates multiple LLMs in an ensemble approach, delivering state-of-the-art results on benchmarks like SWE-bench Verified.

Trae Agent is a modular, LLM-based agentic framework designed for automated software repository issue resolution. It approaches the task as an optimal solution search over an intractably large space of candidate patches, leveraging agent ensembles and repository-wide context understanding to surpass previous prompting-based methods. Trae Agent’s architecture is fundamentally built around three principal agent modules—generation, pruning, selection—each executing specialized sub-procedures and interacting through a precise orchestration protocol. The test-time scaling strategy allows dynamic control over ensemble size, facilitating cost-quality tradeoff at inference and exhibiting monotonic improvement in success metrics. Within the SWE-bench Verified benchmark, Trae Agent achieves state-of-the-art results and leads the leaderboard, substantiating the efficacy of agent-based ensemble reasoning and repository-level AI understanding (Team et al., 31 Jul 2025).

1. Problem Formalization

Trae Agent formalizes repository-level issue resolution as a search problem over the space $\mathcal{S}$ of syntactically valid patches to a codebase $C$. Each candidate patch $p \in \mathcal{S}$ is evaluated via an objective function $f(p)$, where

$$f\colon \mathcal{S}\to \mathbb{R},\qquad f(p) = \sum_{t\in T}\mathbf{1}\bigl[\text{apply}(C,p)\text{ passes test } t\bigr],$$

with $T$ the set of “golden” tests for a given issue $I$. The operational goal is to find $p^* = \arg\max_{p \in \mathcal{S}} f(p)$ under the constraint that $\text{apply}(C, p)$ compiles and passes the necessary regression tests. Because $\mathcal{S}$ is intractably large, Trae Agent relies on ensemble generation, producing a finite candidate set $P = \{p_1, \dots, p_N\}$, together with selection mechanisms for tractable, scalable patch optimization.
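The objective and argmax selection above can be written down directly. The following is a minimal sketch only; `apply_patch` and `passes` are hypothetical helpers standing in for real patch application and test execution, not part of the Trae Agent API.

```python
from typing import Callable, List

def objective(codebase: str, patch: str, golden_tests: List[str],
              apply_patch: Callable[[str, str], str],
              passes: Callable[[str, str], bool]) -> int:
    """f(p): the number of golden tests the patched codebase passes."""
    patched = apply_patch(codebase, patch)
    return sum(1 for t in golden_tests if passes(patched, t))

def select_best(codebase: str, candidates: List[str], golden_tests: List[str],
                apply_patch: Callable[[str, str], str],
                passes: Callable[[str, str], bool]) -> str:
    """argmax of f over a finite candidate ensemble P = {p_1, ..., p_N}."""
    return max(candidates,
               key=lambda p: objective(codebase, p, golden_tests,
                                       apply_patch, passes))
```

In practice the golden tests $T$ are hidden at inference time, which is why the selection agent must rely on synthesized tests and repository-level reasoning rather than evaluating $f$ directly.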

2. Agent Architecture

The Trae Agent system is organized into three modular agents:

  • Generation Agent (“Coder Agent”): Analyzes the issue, identifies relevant files, reproduces bugs, diagnoses causes, and synthesizes high-diversity candidate patches through LLM-based code generation. Sampling utilizes round-robin inference across multiple LLMs (Gemini 2.5 Pro, Claude 3.7 Sonnet, GPT-4.1), with elevated temperature for solution diversity. Only syntactically valid patches are accepted into the ensemble.
  • Pruning Agent: Efficiently reduces ensemble clutter by deduplication (via AST normalization using unidiff parsing) and regression testing. Deduplication ensures semantic uniqueness of patch candidates, while regression testing prunes patches that degrade repository stability on a regression subset $R \subseteq T_{\mathrm{orig}}$.
  • Selection Agent: Implements repository-wide reasoning combining static (file analysis, import graph traversal) and dynamic evaluation (unit test synthesis, execution trace inspection) for each candidate. It executes up to 30 interaction rounds per patch. Majority voting over selector agent runs determines the winning patch, with early termination if majority consensus is achieved.

This structure enables full repository-level context aggregation, bypassing the limitations of single-file or prompt-based approaches.
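The three-stage flow described above can be summarized in a short orchestration sketch. All class and method names here are illustrative assumptions for exposition, not the actual Trae Agent interfaces.

```python
class Pipeline:
    """Illustrative generation -> pruning -> selection orchestration."""

    def __init__(self, generator, pruner, selector, ensemble_size: int = 3):
        self.generator = generator
        self.pruner = pruner
        self.selector = selector
        self.ensemble_size = ensemble_size

    def resolve(self, issue, codebase):
        # 1. Generation: sample N diverse candidate patches
        #    (round-robin over LLMs, elevated temperature).
        candidates = [self.generator.generate(issue, codebase)
                      for _ in range(self.ensemble_size)]
        # 2. Pruning: drop duplicates and regression-breaking patches.
        candidates = self.pruner.prune(candidates, codebase)
        # 3. Selection: repository-aware evaluation and voting
        #    over the surviving candidates.
        return self.selector.select(candidates, issue, codebase)
```

The key design choice is that each stage only communicates through the candidate set, so any agent can be swapped out without touching the others.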

3. Ensemble Reasoning and Repository-Level Understanding

Trae Agent employs ensemble reasoning at both generation and selection stages. The multi-agent generation protocol produces diverse patches by mixing outputs from different LLMs and stochastic sampling parameters. For selection, repository-level understanding is achieved not through graph neural networks, but by iterative agentic file ingestion, code region summarization, and clustering via LLM prompts. SequentialThinkingTool directs agentic exploration of cross-file dependencies and semantic code region clusters, constrained to 30 prompt rounds to avoid context window saturation.
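The majority voting with early termination used by the selection stage can be sketched as follows; the vote representation (one chosen patch id per selector run) is an assumption for illustration.

```python
from collections import Counter

def majority_vote(votes_iter, total_runs: int):
    """Tally selector votes, stopping early once one patch id holds a
    strict majority of the planned runs. Returns (winner, runs_used)."""
    tally = Counter()
    needed = total_runs // 2 + 1  # strict-majority threshold
    runs_used = 0
    for vote in votes_iter:
        runs_used += 1
        tally[vote] += 1
        if tally[vote] >= needed:
            return vote, runs_used  # consensus reached; skip remaining runs
        if runs_used >= total_runs:
            break
    # no strict majority: fall back to the plurality winner
    return tally.most_common(1)[0][0], runs_used
```

Early termination is what makes the selector's cost sublinear in the vote budget when candidates are clearly separable.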

Pruning ensures that only non-redundant, regression-safe patches enter the selection phase. Repository-wide dynamic unit test synthesis and execution trace inspection extend static code analysis, further distinguishing Trae Agent from prompt-limited approaches.
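A simplified version of the deduplication step can be sketched with textual normalization and hashing. Note this is weaker than the actual pruning agent, which normalizes parsed ASTs via unidiff; the normalization rules below are illustrative assumptions.

```python
import hashlib

def normalize(patch_text: str) -> str:
    """Crude normalization: strip whitespace and drop hunk headers so
    cosmetically different but identical edits hash the same.
    (AST-level normalization, as used by the pruning agent, is stronger.)"""
    lines = [ln.strip() for ln in patch_text.splitlines()
             if ln.strip() and not ln.startswith("@@")]
    return "\n".join(lines)

def deduplicate(patches):
    """Keep the first representative of each normalized-equivalent patch."""
    seen, unique = set(), []
    for p in patches:
        digest = hashlib.sha256(normalize(p).encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(p)
    return unique
```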

4. Test-Time Scaling Strategy

“Test-time scaling” in Trae Agent is realized by on-the-fly adjustment of ensemble parameters during inference:

  • Ensemble size $N$: Controls the number of candidate patches and selector votes, serving as the primary scaling dial.
  • Sampling temperature: Elevates solution diversity at generation.
  • Inference cost-quality tradeoff: Total computation scales linearly in $N$, with empirical gains in Pass@1 showing near-linear improvement up to $N \approx 10$, followed by diminishing returns (e.g., $\mathrm{Pass@1}(N) \approx 57.9\% + 0.75\% \times N$ for $1 \le N \le 10$, Claude 3.7 Sonnet mixture).

This dynamic scaling allows deployment scenarios to tune cost and precision according to operational constraints and desired success rates.
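The reported linear fit makes the cost-quality tradeoff concrete. A minimal sketch, assuming the fitted coefficients from the Claude 3.7 Sonnet mixture above and nothing else:

```python
def pass1_estimate(n: int) -> float:
    """Empirical near-linear Pass@1 model (in percent) reported for the
    Claude 3.7 Sonnet mixture; the fit is only valid for 1 <= N <= 10."""
    if not 1 <= n <= 10:
        raise ValueError("model fitted only for 1 <= N <= 10")
    return 57.9 + 0.75 * n
```

For example, moving from $N=1$ to $N=10$ buys roughly 6.75 percentage points of Pass@1 at roughly 10x the inference cost.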

5. Empirical Performance

Rigorous experimental validation on SWE-bench Verified demonstrates substantial performance gains:

| Method | Gemini 2.5 Pro | Claude 3.7 Sonnet | GPT-4.1 | Mixture |
|---|---|---|---|---|
| Trae Agent | 62.27% ± 0.12% | 66.40% ± 0.20% | 59.00% ± 0.20% | 65.67% ± 0.23% |
| Best baseline | 59.27% ± 0.31% | 64.33% ± 0.42% | 56.60% ± 0.53% | 61.07% ± 0.31% |

Average improvement over baselines: $\Delta_{\mathrm{avg}} = +10.22\%$.

Trae Agent attains a state-of-the-art Pass@1 of 75.20% on SWE-bench Verified as of July 2025. The observed average improvement of approximately 10.22% over four competitive ensemble baselines is statistically significant ($p < 8\times10^{-6}$).

6. Implementation Details and Extensibility

  • Software Package: Open-source repository available at https://github.com/bytedance/trae-agent. The architecture is divided into generation/, pruning/, and selection/ modules, with orchestration handled by run_trae.py.
  • Dependencies: Compatibility with Python 3.10+, Docker SWE-bench image, and LLM APIs (openai, anthropic, google-ai). Utilizes unidiff for patch parsing and standard testing infrastructure (pytest, bash).
  • Default Hyperparameters: Ensemble size (--N 3, tunable up to 10), sampling temperature (--temp 0.8), selector rounds cap (30).
  • Resource Requirements: Three LLM API keys, minimum 16 GB RAM, 8 CPU cores. Typical runtime per issue is approximately 2–3 minutes at $N=3$.
  • Extensibility: Supports drop-in agent substitutions, custom pruning logic, and alternative voting schemes.

7. Significance and Practical Considerations

Trae Agent establishes a rigorous, modular protocol for repository-level software issue resolution using LLM-driven ensemble reasoning. This framework overcomes limitations in context window, prompt space, and ensemble diversity inherent to previous prompting-based methods. By decoupling candidate patch generation, redundancy pruning, and repository-aware selection, Trae Agent demonstrates robust advances in reliability, reproducibility, and downstream codebase integration. The monotonic quality gains allowed by test-time scaling further substantiate its adaptive utility for practical deployment in research and industry settings (Team et al., 31 Jul 2025).

A plausible implication is that widespread adoption of agent-based, repository-level reasoning architectures like Trae Agent may become standard for large-scale software engineering automation, especially as the complexity and size of codebases continue to grow.
