SolAgent: Multi-Agent Smart Contract Generator
- SolAgent is a tool-augmented multi-agent framework for generating secure Solidity smart contracts by iteratively refining code using testing and static analysis.
- It employs specialized agents in a dual-loop architecture, using Forge for functional correctness and Slither for detecting and mitigating security vulnerabilities.
- Experiments on the SolEval+ benchmark show that SolAgent significantly improves Pass@1 scores and reduces vulnerabilities compared to baseline language models.
SolAgent is a specialized, tool-augmented multi-agent framework designed to generate secure and correct Solidity smart contracts, emulating the iterative code development workflow of human experts. By combining LLMs, program analysis tools, and file-system operations within a dual-loop architecture, SolAgent directly targets two persistent challenges in smart contract generation: functional correctness (passing all specified tests) and security (absence of vulnerabilities). Experiments on the SolEval+ benchmark demonstrate that SolAgent achieves superior performance in Pass@1 and vulnerability reduction compared to baseline LLMs, code assistants, and generic agent frameworks (Chen et al., 30 Jan 2026).
1. Multi-Agent Architecture and Core Components
SolAgent employs a division of labor across specialized agents:
- Coding Agent: Receives natural language requirements and the project context, producing an initial Solidity source file $C_0$.
- Refining Agent: Takes the latest code artifact $C_{t-1}$ and the aggregated feedback $F_t$ (test and security results), then outputs a refined version $C_t$ that corrects errors or mitigates detected vulnerabilities.
A dual-loop refinement mechanism orchestrates the agent interactions:
- Inner Correctness Loop (via Forge): Uses the Foundry/Forge compiler and its test harness to run the test suites, returning feedback such as test pass rates and specific assertion failures or stack traces.
- Outer Security Loop (via Slither): Integrates the Slither static analyzer, which reports potential vulnerabilities with per-alert severity (Low/Medium/High), guiding security-related refactoring (e.g., enforcing checks-effects-interactions or adding access controls).
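How the outer loop might turn a Slither report into a security signal can be sketched as follows. This is a minimal illustration, not SolAgent's implementation: it assumes Slither's JSON report layout (a `results.detectors` list whose entries carry `check` and `impact` fields), and the helper names `count_by_severity` and `has_blocking_findings` are hypothetical.

```python
from collections import Counter

# Severities that would block the "success" condition (high and medium).
BLOCKING = {"High", "Medium"}


def count_by_severity(report: dict) -> Counter:
    """Count detector findings per impact level in a Slither-style JSON report."""
    detectors = (report.get("results") or {}).get("detectors") or []
    return Counter(d.get("impact", "Unknown") for d in detectors)


def has_blocking_findings(report: dict) -> bool:
    """Return True if any finding has High or Medium impact."""
    counts = count_by_severity(report)
    return any(counts[level] > 0 for level in BLOCKING)


# Hand-written sample in the assumed layout (not real tool output).
sample_report = {
    "success": True,
    "results": {
        "detectors": [
            {"check": "reentrancy-eth", "impact": "High"},
            {"check": "naming-convention", "impact": "Informational"},
            {"check": "low-level-calls", "impact": "Informational"},
        ]
    },
}

severity_counts = count_by_severity(sample_report)
blocking = has_blocking_findings(sample_report)
```

Filtering on impact level keeps informational alerts from forcing further refinement rounds while still surfacing them in the feedback.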
File-system tools (list_dir, read_file) empower the framework to reason across repository structures, load dependencies, and manage context, enabling multi-file smart contract synthesis beyond isolated code snippets.
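A minimal sketch of what such file-system tools could look like is given below. Only the tool names `list_dir` and `read_file` come from the text; the signatures, the project-root sandboxing, the `max_chars` truncation, and the helper `_resolve_inside` are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def _resolve_inside(root: Path, rel: str) -> Path:
    """Resolve rel against root and refuse paths that escape the project root."""
    root = root.resolve()
    target = (root / rel).resolve()
    if target != root and root not in target.parents:
        raise ValueError(f"path escapes project root: {rel}")
    return target


def list_dir(root: Path, rel: str = ".") -> list[str]:
    """List entries under root/rel, marking directories with a trailing slash."""
    target = _resolve_inside(root, rel)
    return sorted(p.name + ("/" if p.is_dir() else "") for p in target.iterdir())


def read_file(root: Path, rel: str, max_chars: int = 20_000) -> str:
    """Read a text file under root, truncated to keep the prompt context bounded."""
    return _resolve_inside(root, rel).read_text(encoding="utf-8")[:max_chars]


# Usage on a throwaway project tree.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "src").mkdir()
    (project / "src" / "Token.sol").write_text(
        "pragma solidity ^0.8.20;\n", encoding="utf-8"
    )
    entries = list_dir(project)
    source = read_file(project, "src/Token.sol")
    try:
        read_file(project, "../outside.txt")
        escape_blocked = False
    except ValueError:
        escape_blocked = True
```

Restricting both tools to the project root is a design choice of this sketch: it lets an agent explore dependencies freely without reading files outside the repository.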
2. Algorithmic Workflow and Stopping Strategies
The iterative workflow dynamically alternates between correctness and security refinement, employing both success and early-stopping mechanisms:
- Dynamic Stopping Conditions:
  - Success: All tests pass (pass rate $= 1.0$), and no unresolved high- or medium-severity vulnerabilities remain.
  - Stagnation: No progress in either pass rate or vulnerability count for $N$ consecutive rounds (a configurable stagnation window).
  - Oscillation: Feedback similarity between consecutive rounds exceeds a threshold $\tau$ (e.g., measured with a sequence-matching ratio), indicating the agent is trapped in a repetitive cycle.
- Feedback Aggregation: At each round, outputs from Forge (functionality correctness) and Slither (security inspection) are serialized and aggregated for downstream decision-making.
- Best Solution Selection: Track the best-performing code artifact and associated score (weighted by pass rate and negative vulnerability count) across all refinement rounds.
- Outline Pseudocode:
```
# Pseudocode summary
Input: R (requirements), T (tests), MaxRounds, N (stagnation), τ (oscillation)
Output: C_best
C_0 = CodingAgent.generate(R)
C_best = C_0; Score_best = -inf
for t in 1..MaxRounds:
    pass_rate, failures = RunForge(C_{t-1}, T)
    vulnerabilities = RunSlither(C_{t-1})
    F_t = aggregate(pass_rate, failures, vulnerabilities)
    if (pass_rate == 1.0 and no high/med vulns) or stagnation or oscillation:
        break
    C_t = RefiningAgent.refine(C_{t-1}, F_t)
    score_t = weight(pass_rate, -|vulns|)
    if score_t > Score_best: C_best = C_t; Score_best = score_t
return C_best
```
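The refinement loop and its three stopping conditions can be sketched in runnable Python as below. This is an illustration under stated assumptions, not the authors' code: the tools are injected as callables (`evaluate` stands in for Forge plus Slither, `refine` for the Refining Agent), the `score` weighting and the defaults `patience=3` and `tau=0.95` are hypothetical, stagnation is simplified to "no improvement in the best score", and the score is attributed to the candidate that was actually evaluated in that round.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Optional


@dataclass
class Feedback:
    pass_rate: float      # fraction of tests passing, in [0, 1]
    blocking_vulns: int   # number of high/medium findings
    text: str             # serialized tool output handed to the refiner


def score(fb: Feedback, vuln_weight: float = 0.1) -> float:
    """Assumed scoring rule: reward pass rate, penalize blocking findings."""
    return fb.pass_rate - vuln_weight * fb.blocking_vulns


def refine_loop(
    code: str,
    evaluate: Callable[[str], Feedback],
    refine: Callable[[str, Feedback], str],
    max_rounds: int = 10,
    patience: int = 3,
    tau: float = 0.95,
) -> str:
    best_code, best_score = code, float("-inf")
    stale_rounds = 0
    prev_text: Optional[str] = None
    for _ in range(max_rounds):
        fb = evaluate(code)
        current = score(fb)  # score belongs to the candidate just evaluated
        if current > best_score:
            best_code, best_score, stale_rounds = code, current, 0
        else:
            stale_rounds += 1
        success = fb.pass_rate == 1.0 and fb.blocking_vulns == 0
        stagnated = stale_rounds >= patience
        # Near-identical feedback in consecutive rounds signals a repetitive cycle.
        oscillating = (
            prev_text is not None
            and SequenceMatcher(None, prev_text, fb.text).ratio() > tau
        )
        if success or stagnated or oscillating:
            break
        prev_text = fb.text
        code = refine(code, fb)
    return best_code


# Stub tools standing in for Forge/Slither and the Refining Agent.
FAKE_RESULTS = {
    "v0": Feedback(0.5, 1, "FAIL test_withdraw; HIGH reentrancy-eth"),
    "v1": Feedback(1.0, 1, "PASS all; HIGH reentrancy-eth"),
    "v2": Feedback(1.0, 0, "PASS all; no findings"),
}
NEXT_VERSION = {"v0": "v1", "v1": "v2"}

best = refine_loop("v0", FAKE_RESULTS.__getitem__, lambda code, fb: NEXT_VERSION[code])
```

With these stubs the loop stops on the success condition in the third round and returns `"v2"`. Passing the tools in as callables keeps the control flow testable without Foundry or Slither installed.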
3. Metrics and Experimental Results
SolAgent is evaluated using the SolEval+ benchmark, comprising 81 file-level Solidity tasks and 1,188 hand-written Forge tests assessing key correctness and security properties.
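- Evaluation Metrics:
  - Pass@k: $\text{Pass@}k = \mathbb{E}_{\text{tasks}}\left[1 - \binom{n-c}{k} \big/ \binom{n}{k}\right]$, where $n$ is the number of samples, $c$ the number of passing samples, and $\binom{\cdot}{\cdot}$ denotes the binomial coefficient.
  - Compile Rate: the fraction of generated files that compile, $\text{CompileRate} = N_{\text{compiled}} / N_{\text{total}}$.
  - Vulnerability Reduction: the relative decrease in static-analysis alerts with respect to a reference, $(V_{\text{ref}} - V_{\text{gen}}) / V_{\text{ref}}$, where $V_{\text{ref}}$ and $V_{\text{gen}}$ are the alert counts of the reference and generated code.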
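The three metrics can be computed with a few lines of Python. This sketch uses the standard unbiased Pass@k estimator for a single task and the plain ratio definitions of the other two metrics; the function and parameter names are illustrative rather than taken from the benchmark's code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimate for one task: 1 - C(n-c, k) / C(n, k)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples fail, giving Pass@k = 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def compile_rate(compiled: int, total: int) -> float:
    """Fraction of generated files that compile."""
    return compiled / total


def vulnerability_reduction(reference_alerts: int, generated_alerts: int) -> float:
    """Relative decrease in alert count with respect to a reference."""
    return (reference_alerts - generated_alerts) / reference_alerts


# Toy example: 5 samples for one task, 2 of which pass all tests.
p1 = pass_at_k(n=5, c=2, k=1)
p2 = pass_at_k(n=5, c=2, k=2)
# 77 compiling files out of 81 tasks.
rate = compile_rate(77, 81)
```

The benchmark-level Pass@k is the mean of the per-task estimates. For $k = 1$ the estimator reduces to the fraction of passing samples, $c / n$.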
Key experimental outcomes (Pass@1, Compile Rate):
| Source / Model | CompileRate | Pass@1 |
|---|---|---|
| Human Baseline Repo | 100.00% | 100.00% |
| Vanilla LLM (Claude) | 39.51% | 25.59% |
| GitHub Copilot | 32.10% | 10.02% |
| DeepCode | 37.04% | 13.55% |
| MetaGPT | 35.80% | 11.78% |
| Qwen-Agent | 45.68% | 28.37% |
| SolAgent (Claude) | 95.06% | 64.39% |
SolAgent achieves a 127.1% relative Pass@1 improvement over the strongest baseline (Qwen-Agent: 28.37%). For vulnerability reduction, SolAgent produces 15.7% fewer static-analysis alerts than the human baseline on the 77 files shared between them (247 vs. 293 alerts). With a GPT-5-Mini base model, the largest observed reduction is 39.77% (from 259 to 156 alerts).
Statistical variance for Pass@1 per correctly compiled file is also reported.
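As a worked check, the reported percentages follow from the rounded figures quoted in this section (the relative Pass@1 gain comes out at about 127%, agreeing with the reported 127.1% up to rounding of the table values):

$$
\begin{aligned}
\text{Relative Pass@1 gain} &= \frac{64.39 - 28.37}{28.37} \approx 1.270 \\
\text{Alert reduction vs. human baseline} &= \frac{293 - 247}{293} \approx 0.157 \\
\text{Alert reduction (GPT-5-Mini)} &= \frac{259 - 156}{259} \approx 0.3977
\end{aligned}
$$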
4. Knowledge Distillation Procedure
SolAgent's high-quality trajectories are used as distillation data to train smaller open-source models employing instruction-following and demonstration learning paradigms:
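- Trajectory Collection:
  - Full-Context: Original requirements plus detailed comments, agent dialogue, and final artifact.
  - Compressed-Context: Summarized requirements and new agent-generated dialogues.
- Supervised Training Objective: The student model is fine-tuned on the collected trajectories with a supervised objective.
- Model Variants: The distilled model, tracker-v2, is compared against Qwen3-8B (base) and Qwen3-32B.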
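A common choice for a supervised training objective of this kind, assumed here for illustration rather than taken from the SolAgent paper, is the token-level negative log-likelihood of the target trajectory:

$$
\mathcal{L}_{\text{SFT}}(\theta) = -\sum_{(x,\, y) \in \mathcal{D}} \; \sum_{j=1}^{|y|} \log p_\theta\left(y_j \mid x,\, y_{<j}\right)
$$

Here $\mathcal{D}$ is the set of collected trajectories, $x$ is the prompt (requirements and context, in either the full or compressed form), $y$ is the target token sequence, and $\theta$ are the student model's parameters. All of these symbols are introduced for this sketch.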
Distillation results on the held-out test set:
| Model | CompileRate | Pass@1 |
|---|---|---|
| Qwen3-8B (base) | 5.88% | 0.33% |
| Qwen3-32B | 35.29% | 1.31% |
| tracker-v2 | 17.65% | 1.31% |
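The distilled tracker-v2 model roughly quadruples Pass@1 over the Qwen3-8B base model (0.33% to 1.31%) and matches Qwen3-32B on Pass@1, although its compile rate (17.65%) remains below that of Qwen3-32B (35.29%). This indicates that agent-generated distillation data is effective.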
5. Ablation Studies and Key Insights
Ablation analyses highlight the critical contributions of SolAgent’s core components:
- Removal of Inner Loop (Forge): Pass@1 drops from 64.39% to approximately 26% (Claude).
- Removal of Outer Loop (Slither): Increases vulnerabilities by 25–35% (worst-case min-vuln round).
- Exclusion of File-System Capabilities: Pass@1 drops by approximately 7–22%, underscoring the importance of project context and cross-file reasoning.
This suggests that both functional and security feedback, as well as file-level context, are essential to high-quality smart contract generation in agent-based frameworks.
6. Limitations and Future Extensions
SolAgent currently targets single-contract files. Ongoing and future directions include extension to cross-contract systems, deeper integration of formal verification via SMT solvers or proof assistants (e.g., Coq, LiquidHaskell), and transposition of the tool-augmented multi-agent paradigm to other safety-critical domains, such as automotive and aerospace software.
7. System Schematic and Workflow Overview
The SolAgent pipeline can be represented as:
```
R → Coding Agent → C₀
      └────────► Refining Loop ◄────────┘
           [Forge → test feedback]
           [Slither → security feedback]
           [FileSystem → context]
        dynamic stopping & best-code tracking
                      ↓
                   C_best
```
This workflow underpins SolAgent’s approach to robust, secure, and scalable smart contract generation (Chen et al., 30 Jan 2026). The open-source release is available at https://github.com/openpaperz/SolAgent.