
SolAgent: Multi-Agent Smart Contract Generator

Updated 3 February 2026
  • SolAgent is a tool-augmented multi-agent framework for generating secure Solidity smart contracts by iteratively refining code using testing and static analysis.
  • It employs specialized agents in a dual-loop architecture, using Forge to check functional correctness and Slither to detect security vulnerabilities that the agents then mitigate.
  • Experiments on the SolEval+ benchmark show that SolAgent significantly improves Pass@1 scores and reduces vulnerabilities compared to baseline language models.

SolAgent is a specialized, tool-augmented multi-agent framework designed to generate secure and correct Solidity smart contracts, emulating the iterative code development workflow of human experts. By combining LLMs, program analysis tools, and file-system operations within a dual-loop architecture, SolAgent directly targets two persistent challenges in smart contract generation: functional correctness (passing all specified tests) and security (absence of vulnerabilities). Experiments on the SolEval+ benchmark demonstrate that SolAgent achieves superior performance in Pass@1 and vulnerability reduction compared to baseline LLMs, code assistants, and generic agent frameworks (Chen et al., 30 Jan 2026).

1. Multi-Agent Architecture and Core Components

SolAgent employs a division of labor across specialized agents:

  • Coding Agent: Receives natural language requirements $R$ and the project context, producing an initial Solidity source file $C_0$.
  • Refining Agent: Takes the latest code artifact $C_{t-1}$ and aggregated feedback $F_t$ (encompassing test and security results), then outputs a refined version $C_t$, correcting errors or mitigating detected vulnerabilities.
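The two roles can be pinned down as typed interfaces. The minimal Python sketch below is illustrative only: `Feedback`, `CodingAgent.generate`, `RefiningAgent.refine`, `render_feedback`, and the toy `EchoRefiner` are hypothetical names, not SolAgent's actual API, and a real agent would call an LLM where `EchoRefiner` merely appends the feedback as a Solidity comment.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Feedback:
    """Aggregated feedback F_t for one round (field names are hypothetical)."""
    pass_rate: float                                   # fraction of Forge tests passing
    failures: list[str] = field(default_factory=list)  # failing tests and reasons
    alerts: list[str] = field(default_factory=list)    # Slither alerts, e.g. "High: reentrancy-eth"


class CodingAgent(Protocol):
    def generate(self, requirements: str, context: str) -> str:
        """Return an initial Solidity source C_0 from R and the project context."""
        ...


class RefiningAgent(Protocol):
    def refine(self, code: str, feedback: Feedback) -> str:
        """Return C_t given C_{t-1} and F_t."""
        ...


def render_feedback(fb: Feedback) -> str:
    """Serialize F_t into plain text that could be placed in a refinement prompt."""
    lines = [f"Test pass rate: {fb.pass_rate:.2%}"]
    lines += [f"FAILED: {f}" for f in fb.failures]
    lines += [f"ALERT: {a}" for a in fb.alerts]
    return "\n".join(lines)


class EchoRefiner:
    """Toy stand-in for an LLM-backed refiner: appends the feedback as comments."""

    def refine(self, code: str, feedback: Feedback) -> str:
        comment = "\n".join("// " + line for line in render_feedback(feedback).splitlines())
        return code + "\n" + comment
```

Structural typing via `Protocol` lets any LLM wrapper with a matching `refine` method be swapped in without inheriting from a shared base class.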

A dual-loop refinement mechanism orchestrates the agent interactions:

  • Inner Correctness Loop (via Forge): Uses Forge, Foundry's build and test tool, to compile the contract and run the full test suite, returning feedback $F_\text{forge}$ such as the test pass rate and the specific assertion failures or stack traces.
  • Outer Security Loop (via Slither): Integrates the Slither static analyzer, which reports potential vulnerabilities with per-alert severity (Low/Medium/High), guiding security-related refactoring (e.g., enforcing checks-effects-interactions or adding access controls).
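A minimal Python sketch of how the two loops' raw tool output might be condensed into feedback. It assumes `forge test --json` prints a single JSON object keyed by test suite, each with a `test_results` map whose entries carry a `status` (`"Success"`, `"Failure"`, or `"Skipped"`) and an optional `reason`, and that `slither . --json -` prints a report whose `results.detectors` entries carry an `impact` field. These layouts reflect recent tool versions and should be checked against the installed ones; the helper names are hypothetical, and the sketch does not handle compilation errors, for which stdout may not be a test report.

```python
import json
import subprocess
from collections import Counter


def parse_forge_json(raw: str) -> tuple[float, list[str]]:
    """Return (pass rate, failing tests) from `forge test --json` output (assumed layout)."""
    suites = json.loads(raw)
    total, passed, failures = 0, 0, []
    for suite_name, suite in suites.items():
        for test_name, result in suite.get("test_results", {}).items():
            total += 1
            # Anything other than Success (e.g. Failure, Skipped) counts as not passing.
            if result.get("status") == "Success":
                passed += 1
            else:
                reason = result.get("reason") or "no reason reported"
                failures.append(f"{suite_name}::{test_name}: {reason}")
    return (passed / total if total else 0.0), failures


def parse_slither_json(raw: str) -> Counter:
    """Count Slither detector findings by impact (High, Medium, Low, ...)."""
    report = json.loads(raw)
    detectors = (report.get("results") or {}).get("detectors", [])
    return Counter(d.get("impact", "Unknown") for d in detectors)


def run_tools(project_dir: str) -> tuple[float, list[str], Counter]:
    """Run both tools in a Foundry project; requires `forge` and `slither` on PATH."""
    # Both tools may exit non-zero when tests fail or findings exist, so no check=True.
    forge = subprocess.run(["forge", "test", "--json"], cwd=project_dir,
                           capture_output=True, text=True)
    slither = subprocess.run(["slither", ".", "--json", "-"], cwd=project_dir,
                             capture_output=True, text=True)
    pass_rate, failures = parse_forge_json(forge.stdout)
    return pass_rate, failures, parse_slither_json(slither.stdout)
```

Keeping the parsers separate from `run_tools` lets them be exercised on saved tool output without Foundry or Slither installed.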

File-system tools (list_dir, read_file) empower the framework to reason across repository structures, load dependencies, and manage context, enabling multi-file smart contract synthesis beyond isolated code snippets.
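A sketch of what read-only `list_dir` and `read_file` tools might look like in Python. The source names only the two tools; the `ProjectFS` class, the path-confinement check, and the `max_chars` truncation are assumptions added here to keep an agent inside the repository and its context window bounded.

```python
from pathlib import Path


class ProjectFS:
    """Read-only file-system tools confined to one project root (hypothetical API)."""

    def __init__(self, root: str):
        self.root = Path(root).resolve()

    def _resolve(self, rel: str) -> Path:
        path = (self.root / rel).resolve()
        # Refuse paths that escape the project root, e.g. "../secrets.txt".
        if path != self.root and self.root not in path.parents:
            raise PermissionError(f"{rel!r} is outside the project root")
        return path

    def list_dir(self, rel: str = ".") -> list[str]:
        """Sorted entry names; a trailing '/' marks subdirectories."""
        entries = self._resolve(rel).iterdir()
        return sorted(p.name + ("/" if p.is_dir() else "") for p in entries)

    def read_file(self, rel: str, max_chars: int = 20_000) -> str:
        """File contents, truncated so a large file cannot flood the agent's context."""
        return self._resolve(rel).read_text(encoding="utf-8")[:max_chars]
```

Resolving paths before the containment check means that both `..` segments and symlinks pointing outside the root are rejected.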

2. Algorithmic Workflow and Stopping Strategies

The iterative workflow dynamically alternates between correctness and security refinement, employing both success and early-stopping mechanisms:

  • Dynamic Stopping Conditions:
  1. Success: All tests pass ($\text{pass rate} = 1.0$) and no unresolved High- or Medium-severity vulnerabilities remain.
  2. Stagnation: Neither the pass rate nor the vulnerability count improves for $N$ consecutive rounds (default $N=2$).
  3. Oscillation: The similarity between consecutive rounds' feedback exceeds a threshold $\tau$ (e.g., $\tau=0.9$, measured as a sequence-matching ratio), indicating that the agent is trapped in a repetitive cycle.
  • Feedback Aggregation: At each round, the outputs of Forge (functional correctness) and Slither (security inspection) are serialized and aggregated into $F_t$ for downstream decision-making.
  • Best Solution Selection: SolAgent tracks the best-performing code artifact $C_{\text{best}}$ and its score (weighted by pass rate and negative vulnerability count) across all refinement rounds.
  • Pseudocode outline:

# Pseudocode summary
Input: R (requirements), T (tests), MaxRounds, N (stagnation), τ (oscillation)
Output: C_best
C_0 = CodingAgent.generate(R)
C_best = C_0; Score_best = -inf
for t in 1..MaxRounds+1:                  # the last pass only evaluates C_MaxRounds
    # Evaluate the latest artifact C_{t-1} with both tools
    pass_rate, failures = RunForge(C_{t-1}, T)
    vulns = RunSlither(C_{t-1})
    F_t = aggregate(pass_rate, failures, vulns)
    # Score C_{t-1} on its own results (not on those of its predecessor)
    score = weight(pass_rate, -|vulns|)
    if score > Score_best: C_best = C_{t-1}; Score_best = score
    if (pass_rate == 1.0 and no high/med vulns) or stagnation(N) or oscillation(F_t, F_{t-1}, τ):
        break
    if t <= MaxRounds:
        C_t = RefiningAgent.refine(C_{t-1}, F_t)
return C_best
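A runnable Python sketch of the same loop, with the tools and agents injected as plain callables (`evaluate`, `refine`). The `Evaluation` record, the scoring rule in `score` (with a hypothetical `vuln_weight=0.01`), and the exact stagnation rule (no new best pass rate and no new minimum alert count) are assumptions of this sketch, not SolAgent's published code. Oscillation is detected with Python's `difflib.SequenceMatcher(...).ratio()`, which is one concrete reading of "sequence matching ratio".

```python
import difflib
from dataclasses import dataclass
from typing import Callable


@dataclass
class Evaluation:
    """Tool results for one artifact (field names are hypothetical)."""
    pass_rate: float   # Forge pass rate in [0, 1]
    high_med: int      # unresolved High/Medium Slither alerts
    total_vulns: int   # all Slither alerts
    feedback: str      # serialized Forge + Slither output, i.e. F_t


def score(ev: Evaluation, vuln_weight: float = 0.01) -> float:
    # Hypothetical weighting: pass rate dominates, each alert costs a little.
    return ev.pass_rate - vuln_weight * ev.total_vulns


def refine_loop(c0: str,
                evaluate: Callable[[str], Evaluation],
                refine: Callable[[str, str], str],
                max_rounds: int = 5, n_stagnation: int = 2, tau: float = 0.9) -> str:
    code = best_code = c0
    best_score = float("-inf")
    best_pass, min_vulns, stale = -1.0, float("inf"), 0
    prev_feedback = None
    for t in range(max_rounds + 1):                  # the final pass only evaluates
        ev = evaluate(code)
        s = score(ev)
        if s > best_score:                           # best-solution tracking
            best_code, best_score = code, s
        if ev.pass_rate == 1.0 and ev.high_med == 0:
            break                                    # success
        improved = ev.pass_rate > best_pass or ev.total_vulns < min_vulns
        stale = 0 if improved else stale + 1
        best_pass = max(best_pass, ev.pass_rate)
        min_vulns = min(min_vulns, ev.total_vulns)
        if stale >= n_stagnation:
            break                                    # stagnation
        if (prev_feedback is not None and
                difflib.SequenceMatcher(None, prev_feedback, ev.feedback).ratio() > tau):
            break                                    # oscillation
        if t == max_rounds:
            break                                    # round budget exhausted
        prev_feedback = ev.feedback
        code = refine(code, ev.feedback)
    return best_code
```

Because every artifact is scored on its own evaluation before any stopping check, `refine_loop` always returns the best artifact actually measured, even when it stops on stagnation or oscillation.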

3. Metrics and Experimental Results

SolAgent is evaluated using the SolEval+ benchmark, comprising 81 file-level Solidity tasks and 1,188 hand-written Forge tests assessing key correctness and security properties.

  • Evaluation Metrics:
    • Pass@k:

    $$\text{Pass@}k = \mathbb{E}\left[1 - \frac{C(n-c,\,k)}{C(n,\,k)}\right]$$

    where $n$ is the number of samples generated per task, $c$ the number of passing samples, and $C(\cdot,\cdot)$ the binomial coefficient; the expectation is taken over tasks.
    • Compile Rate:

    $$\text{Rate}_\text{compile} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}[\text{compile}(s_i)]$$

    where $s_i$ is the $i$-th of $N$ generated files and $\mathbb{1}[\cdot]$ equals 1 if the file compiles and 0 otherwise.
    • Vulnerability Reduction:

    $$\Delta V\% = \frac{V_\text{base} - V_\text{SolAgent}}{V_\text{base}} \times 100$$

    where $V_\text{base}$ and $V_\text{SolAgent}$ are the static-analysis alert counts of the baseline and of SolAgent.
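The three metrics translate directly into Python. `pass_at_k` evaluates the combinatorial Pass@k formula with `math.comb`, returning 1 when fewer than $k$ samples fail; the function names and the `(n, c)` per-task input format are choices made for this sketch.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Pass@k for one task: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:          # every size-k draw contains at least one passing sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Expectation over tasks; `results` holds (n, c) for each task."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


def compile_rate(compiled: list[bool]) -> float:
    """Fraction of generated files that compile."""
    return sum(compiled) / len(compiled)


def vuln_reduction(v_base: int, v_new: int) -> float:
    """Delta V% = (V_base - V_new) / V_base * 100."""
    return (v_base - v_new) / v_base * 100
```

For example, `vuln_reduction(293, 247)` evaluates to about 15.7, matching the alert reduction reported against the human baseline.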

Key experimental outcomes (Compile Rate, Pass@1):

Source / Model          Compile Rate   Pass@1
Human Baseline Repo     100.00%        100.00%
Vanilla LLM (Claude)    39.51%         25.59%
GitHub Copilot          32.10%         10.02%
DeepCode                37.04%         13.55%
MetaGPT                 35.80%         11.78%
Qwen-Agent              45.68%         28.37%
SolAgent (Claude)       95.06%         64.39%

SolAgent achieves a relative Pass@1 improvement of about 127% over the strongest baseline, Qwen-Agent (64.39% vs. 28.37%). In total static-analysis alerts, SolAgent produces 15.7% fewer than the human baseline (247 vs. 293 alerts on 77 shared files). With GPT-5-Mini as the base model, the largest reduction observed is 39.77% (from 259 to 156 alerts).

Per-file Pass@1 over correctly compiled files is also reported together with its spread (e.g., SolAgent (Claude): $0.7795 \pm 0.2941$).
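As a worked check, the relative figures follow from the table and the alert counts, with relative improvement computed as $(P_\text{SolAgent} - P_\text{baseline})/P_\text{baseline}$ and alert reduction as $\Delta V\%$:

$$
\frac{64.39 - 28.37}{28.37} \approx 1.270, \qquad
\frac{293 - 247}{293} \approx 0.157, \qquad
\frac{259 - 156}{259} \approx 0.398
$$

that is, roughly 127%, 15.7%, and 39.8% (39.77% to two decimals).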

4. Knowledge Distillation Procedure

SolAgent's high-quality trajectories are used as distillation data to train smaller open-source models employing instruction-following and demonstration learning paradigms:

  • Trajectory Collection:

    • Full-Context: Original requirements plus detailed comments, agent dialogue, and final artifact.
    • Compressed-Context: Summarized requirements and new agent-generated dialogues.
  • Supervised Training Objective:

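    $$L(\theta) = -\sum_i \log P_\theta\left(C^*_i \mid R,\, C^*_{<i}\right)$$

    where $C^*_i$ is the $i$-th token of the target (agent-produced) artifact $C^*$ and $R$ the requirements.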

  • Model Variants:
    • Base: Qwen3-8B; Enhanced: Qwen3-32B
    • Tracker variants (v1: forward truncation; v2: backward truncation; 4K-token limit).
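The objective is the usual token-level negative log-likelihood under teacher forcing. This framework-free Python sketch assumes the per-position logits have already been produced by some model conditioned on $R$ and the preceding target tokens; the function names and the `loss_mask` convention (excluding prompt tokens from the sum) are assumptions, and the tracker variants' truncation is not modeled.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    m = max(logits)                                   # subtract max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def distillation_nll(step_logits: list[list[float]], target_ids: list[int],
                     loss_mask: list[bool]) -> float:
    """L(theta) = -sum_i log P_theta(C*_i | R, C*_{<i}).

    step_logits[i] holds the model's logits at position i (teacher forcing);
    loss_mask[i] is False for prompt positions, so only target tokens count.
    """
    loss = 0.0
    for logits, tok, use in zip(step_logits, target_ids, loss_mask):
        if use:
            loss -= log_softmax(logits)[tok]
    return loss
```

With uniform logits over a vocabulary of size $V$, each counted token contributes $\log V$, which gives a quick sanity check on the implementation.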

Distillation results on the held-out test set:

Model              Compile Rate   Pass@1
Qwen3-8B (base)    5.88%          0.33%
Qwen3-32B          35.29%         1.31%
tracker-v2         17.65%         1.31%

Tracker-v2 roughly quadruples the base model's Pass@1 and matches the Pass@1 of Qwen3-32B, although its compile rate is only about half that of Qwen3-32B, suggesting that agent-generated trajectories are effective distillation data.
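The "quadruples" claim is a ratio of the Pass@1 column; the same arithmetic on the compile-rate column shows where tracker-v2 still trails Qwen3-32B:

$$
\frac{1.31}{0.33} \approx 3.97, \qquad \frac{17.65}{35.29} \approx 0.50
$$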

5. Ablation Studies and Key Insights

Ablation analyses highlight the critical contributions of SolAgent’s core components:

  • Removal of Inner Loop (Forge): Pass@1 drops from 64.39% to approximately 26% (Claude).
  • Removal of Outer Loop (Slither): Vulnerabilities increase by 25–35% (worst-case min-vuln round).
  • Exclusion of File-System Capabilities: Pass@1 drops by approximately 7–22%, underscoring the importance of project context and cross-file reasoning.
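A leave-one-out ablation like this is easy to drive from a small configuration object. The `SolAgentConfig` flag names and `ablation_variants` helper below are hypothetical and not taken from the SolAgent repository; the sketch only enumerates the variants, each of which would then be run through the same benchmark.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class SolAgentConfig:
    """Hypothetical switches for the three components ablated in the study."""
    use_forge: bool = True        # inner correctness loop
    use_slither: bool = True      # outer security loop
    use_filesystem: bool = True   # list_dir / read_file tools


def ablation_variants(full: SolAgentConfig) -> dict[str, SolAgentConfig]:
    """The full system plus one leave-one-out variant per component."""
    variants = {"full": full}
    for f in fields(full):
        name = "without_" + f.name.removeprefix("use_")
        variants[name] = replace(full, **{f.name: False})
    return variants
```

Deriving the variants from the dataclass fields keeps the ablation set in sync if another component switch is added later.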

This suggests that both functional and security feedback, as well as file-level context, are essential to high-quality smart contract generation in agent-based frameworks.

6. Limitations and Future Extensions

SolAgent currently targets single-contract files. Ongoing and future directions include extension to cross-contract systems, deeper integration of formal verification via SMT solvers or proof assistants (e.g., Coq, LiquidHaskell), and transfer of the tool-augmented multi-agent paradigm to other safety-critical domains, such as automotive and aerospace software.

7. System Schematic and Workflow Overview

The SolAgent pipeline can be represented as:

R → Coding Agent → C₀
                   ↓
    ┌─────────► Refining Loop ─────────┐
    │    [Forge → test feedback]       │
    │    [Slither → security feedback] │
    │    [FileSystem → context]        │
    └──────────── C_t ◄────────────────┘
                   ↓
    dynamic stopping & best-code tracking
                   ↓
                C_best

This workflow underpins SolAgent’s approach to robust, secure, and scalable smart contract generation (Chen et al., 30 Jan 2026). The open-source release is available at https://github.com/openpaperz/SolAgent.
