Trae Agent: Repository-Level Issue Resolution

Updated 2 July 2026

Trae Agent is an LLM-based, agent-oriented ensemble reasoning system that resolves repository-level software issues by searching for optimal code patches.
It employs a modular architecture with dedicated sub-agents for patch generation, hierarchical pruning, and rigorous selection to enhance accuracy and scalability.
Evaluations on the SWE-bench Verified benchmark show a Pass@1 improvement, with a mean uplift of 10.22 percentage points over baseline ensemble methods.

Trae Agent is an LLM-based, agent-oriented ensemble-reasoning system designed for repository-level software issue resolution. It frames the problem as an optimal solution search over candidate code patches, using modular sub-agents dedicated to patch candidate generation, hierarchical pruning, and rigorous selection. Trae Agent addresses two limitations of prior prompting-based ensemble methods: they do not scale ensemble search efficiently, and they lack deep, cross-file understanding at the repository level. It reports a new state of the art on the SWE-bench Verified benchmark, with a Pass@1 of 75.20% and a mean uplift of 10.22 percentage points over competing baselines (Team et al., 31 Jul 2025).

1. Formal Problem Setup and Solution Space

Trae Agent frames repository-level software issue resolution as an optimal solution search:

Given: a codebase $C$, a natural-language software issue $I$, and a test suite $T$.
Patch generation: $N$ candidate patches are generated as $P = \{p_1, \ldots, p_N\}$, with each $p_i \sim G(C, I)$.
Selection objective: Find $p^* \in P$ such that applying $p^*$ to $C$ maximizes the probability that $(C \oplus p^*)$ passes the test suite $T$.
Evaluation metric: $\text{Pass@1}$ is defined as

$$\text{Pass@1} = \mathbb{E}_{P}\left[\mathbb{1}\left[(C \oplus S(P)) \text{ passes } T\right]\right]$$

where $S$ is the selection function and the expectation is over sampled patch ensembles.

This solution space is typically restricted to the $N$ sampled candidates; scalability is managed at inference time via the ensemble size $N$.
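The metric above can be estimated empirically by averaging over sampled ensembles. The following is a minimal sketch, not the authors' evaluation code: the names `pass_at_1`, `select`, and `passes_tests` are hypothetical, and a patch is represented as a plain string purely for illustration.

```python
from typing import Callable, Sequence

Patch = str  # assumption: a patch is represented as unified-diff text


def pass_at_1(
    ensembles: Sequence[Sequence[Patch]],
    select: Callable[[Sequence[Patch]], Patch],
    passes_tests: Callable[[Patch], bool],
) -> float:
    """Fraction of sampled ensembles whose selected patch passes the test suite."""
    if not ensembles:
        raise ValueError("need at least one sampled ensemble")
    # One selected patch per ensemble; count how many of them pass.
    hits = sum(1 for ensemble in ensembles if passes_tests(select(ensemble)))
    return hits / len(ensembles)


# Toy usage with a selector that always picks the last candidate.
toy_ensembles = [["bad", "good"], ["bad", "bad"]]
toy_score = pass_at_1(
    toy_ensembles,
    select=lambda patches: patches[-1],
    passes_tests=lambda patch: patch == "good",
)
```

In this framing, only the single selected patch per ensemble is scored, so a better selection function raises the metric even when the generated candidates are unchanged.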

2. Modular Agent Architecture

Trae Agent comprises three interconnected modular sub-agents:

Generation Agent ("Coder Agent"):
- Inputs: $(C, I)$.
- Utilizes LLMs coupled with a toolset (file-edit, bash, sequential-thinking, done-signal) to sequentially (1) analyze the issue, (2) localize code, (3) replicate/reproduce the fault, (4) diagnose, (5) synthesize candidate patch, (6) re-test, and (7) compose commit summary.
- Operates at high sampling temperature and optionally round-robins across multiple LLM backbones (Gemini 2.5 Pro, Claude 3.7 Sonnet, GPT-4.1), yielding diverse solution candidates.
Pruning Agent:
- Deduplicates patches (strips whitespace and comments via unidiff, then removes syntactically or semantically equivalent forms).
- Applies regression testing by extracting existing passing regression tests, which are further refined via an LLM-based tester agent; discards any patch failing these tests.
- Delivers a reduced ensemble $P' \subseteq P$ (typically 30% smaller), which is empirically likely to retain the correct solution.
Selection Agent ("Selector Agent"):
- Consumes $P'$ and operates with repository-level read-only access and a test executor.
- Implements iterative repository context enrichment over up to 30 prompt-tool rounds: static review (import graphs, dependency mapping, context diffs) and dynamic verification (on-the-fly unit test generation/execution).
- Each candidate is voted for or against; early stopping is triggered once a majority is reached. Final selection is determined by majority consensus across multiple LLM voter runs.
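The voting step in the selection agent can be illustrated with a small sketch. It assumes a strict-majority threshold over the planned number of voter runs and a first-seen tie-break; both are assumptions of this example, not documented parameters, and `majority_vote` and `voter` are hypothetical names.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def majority_vote(
    candidates: Sequence[str],
    voter: Callable[[Sequence[str], int], str],
    max_rounds: int,
) -> Optional[str]:
    """Collect up to max_rounds votes, stopping once a strict majority is reached."""
    tally = Counter()
    threshold = max_rounds // 2 + 1  # assumption: strict majority of planned runs
    for round_index in range(max_rounds):
        choice = voter(candidates, round_index)  # one voter run picks one candidate
        tally[choice] += 1
        if tally[choice] >= threshold:
            return choice  # early stop: remaining runs cannot change the winner
    if not tally:
        return None
    # No strict majority: fall back to the most-voted candidate (first seen on ties).
    return tally.most_common(1)[0][0]


# Toy usage: a scripted voter standing in for repeated LLM voter runs.
script = ["a", "b", "a", "a", "b"]
calls = []


def scripted_voter(candidates: Sequence[str], round_index: int) -> str:
    calls.append(round_index)
    return script[round_index]


winner = majority_vote(["a", "b"], scripted_voter, max_rounds=5)
```

With five planned runs the threshold is three votes, so the scripted example stops after the fourth run and the fifth voter call is never made.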

This architecture allows cascading reductions in candidate space, precise fault localization, and robust selection in complex multi-file repositories.
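The pruning stage described above (deduplication followed by regression filtering) can be sketched as follows. The source mentions unidiff for normalization; this example swaps in a naive standard-library normalizer instead, and the names `normalize_patch`, `prune`, and `passes_regression` are hypothetical.

```python
import re
from typing import Callable, Iterable, List


def normalize_patch(diff_text: str) -> str:
    """Canonical form of a diff: changed lines only, without comments or whitespace."""
    kept = []
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file headers (naive: also skips changed lines starting with '--' or '++')
        if not line.startswith(("+", "-")):
            continue  # context lines and hunk headers
        sign, body = line[0], line[1:]
        body = re.sub(r"#.*$", "", body)  # naive: also strips '#' inside string literals
        body = re.sub(r"\s+", "", body)
        if body:
            kept.append(sign + body)
    return "\n".join(kept)


def prune(
    patches: Iterable[str],
    passes_regression: Callable[[str], bool],
) -> List[str]:
    """Drop duplicate patches, then drop patches that fail the regression tests."""
    seen = set()
    survivors = []
    for patch in patches:
        key = normalize_patch(patch)
        if key in seen:
            continue  # equivalent to an earlier candidate after normalization
        seen.add(key)
        if passes_regression(patch):
            survivors.append(patch)
    return survivors


header = "--- a/m.py\n+++ b/m.py\n@@ -1 +1 @@\n"
p1 = header + "-x = 1\n+x = 2\n"
p2 = header + "-x = 1\n+x  =  2  # fix\n"  # same change as p1, different formatting
p3 = header + "-x = 1\n+x = 3\n"           # pretend this one breaks a regression test
pruned = prune([p1, p2, p3], passes_regression=lambda patch: "x = 3" not in patch)
```

Deduplicating before running tests keeps the number of regression runs proportional to the number of distinct candidates rather than to the raw ensemble size.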

3. Test-Time Scaling and Resource Adjustability

Trae Agent enables test-time scaling without model retraining:

Ensemble size $N$ can be tuned, with accuracy reported to increase linearly as $N$ grows across the tested range.
Diversity is supplied by adjusting the generation agent's sampling temperature and interleaving different LLM backbones.
Beam-search variants can optionally be implemented, although the primary experiments use sampling-based diversity.
Empirically, prompting-only baselines show non-monotonic scaling, peaking partway through the tested range of $N$, while Trae Agent sustains monotonic accuracy gains with increasing $N$.

This test-time scaling property directly translates to controllable accuracy-cost trade-offs.
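One way to picture the scaling knobs is as a generation plan that spreads $N$ sampling jobs across backbones in round-robin order. This is a minimal sketch under stated assumptions: `SamplingJob`, `plan_generation`, and the `seed` field are hypothetical and do not describe the project's actual configuration format.

```python
from dataclasses import dataclass
from itertools import cycle, islice
from typing import List, Sequence


@dataclass(frozen=True)
class SamplingJob:
    backbone: str       # which LLM backbone generates this candidate
    temperature: float  # sampling temperature for the generation agent
    seed: int           # hypothetical field, used here only to tell jobs apart


def plan_generation(
    n: int, backbones: Sequence[str], temperature: float
) -> List[SamplingJob]:
    """Assign n candidate-generation jobs to backbones in round-robin order."""
    if n < 1 or not backbones:
        raise ValueError("need n >= 1 and at least one backbone")
    assigned = islice(cycle(backbones), n)
    return [SamplingJob(name, temperature, seed=i) for i, name in enumerate(assigned)]


plan = plan_generation(
    5, ["Gemini 2.5 Pro", "Claude 3.7 Sonnet", "GPT-4.1"], temperature=1.0
)
```

Raising `n` increases inference cost roughly in proportion to the number of jobs, which is the accuracy-cost trade-off the section refers to.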

4. Repository-Level Contextual Reasoning

Trae Agent’s selection agent achieves repository-level understanding through:

Cross-file static analysis using a File-Editing tool, traversing import chains and transitive dependency graphs (parsed via AST/imports).
Asynchronous summarization ("lakeview" summarizer) maintains a succinct, dynamically-updated LLM memory.
Dynamic analysis by generating and executing novel unit tests on each candidate, capturing behavioral traces.
Repeated, structured prompt-tool interactions (up to 30 rounds) check that each patch is semantically and behaviorally consistent with the target issue.

This capability is central to scaling issue resolution from file-level tasks to realistic, multi-file, dependency-rich repositories.
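The import-chain traversal mentioned above can be illustrated with Python's standard `ast` module. This is a sketch of the general technique, not the selection agent's implementation: it handles absolute imports only, takes module sources from an in-memory dictionary, and the names `direct_imports` and `transitive_dependencies` are hypothetical.

```python
import ast
from collections import deque
from typing import Dict, Set


def direct_imports(source: str) -> Set[str]:
    """Absolute module names imported anywhere in a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)  # relative imports (level > 0) are skipped
    return names


def transitive_dependencies(module: str, sources: Dict[str, str]) -> Set[str]:
    """Breadth-first walk of the import graph, limited to modules in `sources`."""
    seen = set()
    queue = deque([module])
    while queue:
        current = queue.popleft()
        for dep in direct_imports(sources[current]):
            if dep in sources and dep not in seen and dep != module:
                seen.add(dep)
                queue.append(dep)
    return seen


# Toy repository: module name -> source text.
repo = {
    "app": "import services\n",
    "services": "from models import User\nimport os\n",
    "models": "import json\nimport app\n",
    "unused": "",
}
deps = transitive_dependencies("app", repo)
```

Restricting the walk to modules present in the repository keeps standard-library and third-party imports out of the result, and the `seen` set prevents import cycles from looping.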

5. Experimental Results and Ablation Studies

Evaluations were conducted on the SWE-bench Verified benchmark (500 curated GitHub issues):

Backbones: Gemini 2.5 Pro, Claude 3.7 Sonnet, GPT-4.1; default reports use Claude 3.7 Sonnet.
Metric: Pass@1 for repository-level fixes.
Key Outcomes:
- Trae Agent (N=3) achieves 66.40% Pass@1, outperforming the best ensemble baseline (Augment + Pruning, 64.33%) by +2.07 points.
- Across all baselines (including DeiBase, with/without pruning), mean uplift is +10.22 pp.
- “Mixture” mode (round-robin 3 LLMs): 65.67% (oracle upper bound: 73.40%).
- Public leaderboard: 75.20% Pass@1 (first place).
Ablations (Claude 3.7): No pruning (–6.80 pp), no deduplication (–2.73 pp), no regression test filtering (–3.32 pp), prompt-only selector (–4.00 pp), no selector voting (–2.80 pp).
Monotonicity: Pass@1 increases with $N$ for Trae Agent, contrary to prompting baselines, which saturate or regress at higher $N$.
Correlation: After pruning, smaller ensembles (i.e., less redundant, more curated) are highly correlated with higher Pass@1 (Pearson correlation).
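The headline figures can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above (500 issues, 66.40%, 64.33%, 75.20%); the helper names are hypothetical, and values are rounded to two decimals to avoid floating-point noise.

```python
BENCHMARK_SIZE = 500  # SWE-bench Verified issue count quoted above


def resolved_count(pass_at_1_percent: float, total: int = BENCHMARK_SIZE) -> float:
    """Number of issues implied by a Pass@1 percentage on a benchmark of `total` issues."""
    return round(pass_at_1_percent / 100 * total, 2)


def uplift_pp(system_percent: float, baseline_percent: float) -> float:
    """Difference between two Pass@1 scores, in percentage points."""
    return round(system_percent - baseline_percent, 2)


trae_n3 = resolved_count(66.40)        # issues resolved at N=3
leaderboard = resolved_count(75.20)    # issues resolved on the public leaderboard
gain = uplift_pp(66.40, 64.33)         # versus the best ensemble baseline
```

Under these figures, 66.40% corresponds to 332 of 500 issues, 75.20% to 376, and the gap to the best ensemble baseline is 2.07 percentage points, matching the reported +2.07.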

6. Open-Source Availability and Reproducibility

Full implementation, including multi-LLM support, tool wrappers, agent prompts, and evaluation scripts, is open-sourced at https://github.com/bytedance/trae-agent.
Containerized (Docker) scripts support end-to-end experiment re-runs and leaderboard reproduction.
Modular codebase enables external adaptation and benchmarking.

7. Significance and Implications

Trae Agent is the first system to instantiate agent-based ensemble reasoning for end-to-end, repository-level software issue resolution. Its hierarchical agent decomposition—generation, pruning, selection—enables efficient search over large solution spaces, deep repository understanding, and a test-time scaling interface, resulting in robust improvements over prior ensemble prompting frameworks. Key implications are:

The hierarchy and diversity management enable sustained accuracy benefits as ensemble size increases.
Agent-based modularization facilitates extensibility (e.g., new pruning criteria, alternative static/dynamic analyses).
Trae Agent’s method provides an empirical foundation for further systematic investigation of agentic decomposition and ensemble reasoning in software engineering tasks.

A plausible implication is that similar agent-based, modular ensemble architectures could generalize to other complex, open-ended reasoning domains requiring both semantic synthesis and behavioral validation (Team et al., 31 Jul 2025).

References (1)

Trae Agent: An LLM-based Agent for Software Engineering with Test-time Scaling (2025)
