YulToolkit: Bounded Game-Semantics Checker
- YulToolkit is a bounded game-semantics model checker for Yul that models smart contract execution as a two-player game between the contract and its environment.
- It employs a precise EVM-Yul interpreter with custom instrumentation, programmatic trace exploration, and resource bounds, so that every vulnerability reachable within those bounds is detected.
- The tool demonstrates practical scalability on large DeFi contracts by leveraging bespoke bounds, instrumentation, and meticulous adherence to EVM semantics to guarantee no false positives within limits.
YulToolkit is a bounded game-semantics model checker for Yul, the intermediate language employed by Solidity, enabling precise, bounded-complete exploration of smart contract behaviour by exhaustively enumerating all feasible traces within user-supplied resource limits. It targets vulnerabilities—principally those arising from contract-environment interaction, such as reentrancy—by modelling computation as a two-player game between the contract (Proponent) and its environment (Opponent). YulToolkit achieves practical tractability on large DeFi contracts via programmatic instrumentation, custom game bounds, and meticulous adherence to EVM-Yul semantics, guaranteeing “no false positives” within bounded parameters (Koutavas et al., 27 Dec 2025).
1. Game-Semantics Foundation
YulToolkit models a compiled Yul contract as a two-player game:
- Proponent: the contract code under analysis.
- Opponent: the environment, possibly adversarial, controlling external addresses and function inputs.
A configuration has three components:
- A stack of activation frames. Each frame is either a Proponent frame (a Yul object together with an EVM state) or an Opponent frame (an EVM state only).
- The partition of addresses into those controlled by Proponent and those controlled by Opponent.
- Opponent knowledge: type-indexed “known-value” sets (e.g., uint256, address).
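As a rough illustration, the three components of a configuration could be represented as plain data. This is a minimal sketch and not the tool's actual representation: the names `Frame`, `Configuration`, `evm_state`, and `yul_object` are hypothetical, and the EVM state is reduced to an opaque tuple.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Frame:
    """One activation frame (hypothetical model)."""
    evm_state: tuple                   # opaque placeholder for storage, memory, balances
    yul_object: Optional[str] = None   # None marks an Opponent frame

    @property
    def is_proponent(self) -> bool:
        # A Proponent frame carries a Yul object; an Opponent frame does not.
        return self.yul_object is not None


@dataclass
class Configuration:
    stack: list[Frame]                  # activation frames, top of stack last
    proponent_addrs: frozenset[int]     # addresses controlled by Proponent
    opponent_addrs: frozenset[int]      # addresses controlled by Opponent
    knowledge: dict[str, set[int]] = field(default_factory=dict)  # type name -> known values

    def __post_init__(self):
        # The two address sets form a partition, so they must not overlap.
        if self.proponent_addrs & self.opponent_addrs:
            raise ValueError("address controlled by both players")


cfg = Configuration(
    stack=[Frame(evm_state=())],
    proponent_addrs=frozenset({0x10}),
    opponent_addrs=frozenset({0xA0}),
    knowledge={"uint256": {0, 1}, "address": {0x10, 0xA0}},
)
```

Keeping Opponent knowledge as a map from type name to a set of values mirrors the "type-indexed" description above and makes it straightforward to add values as they are revealed.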
Moves are transitions between configurations.
Crucial move types include:
- deploy: constructor completion, after which the turn passes to Opponent
- o-call: Opponent invokes a Proponent address
- pp-call, po-call: calls from Proponent to Proponent objects or to Opponent
- internal: normal Yul/EVM reduction in a Proponent frame
- o-wait: Opponent advances block time
Inference-style small-step rules specify when and how these moves occur. For example, Opponent can call any Proponent address with arguments drawn from its known values.
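The o-call rule can be pictured as enumerating every combination of target address, function, and known argument values. The sketch below is an assumption-laden illustration: `enabled_o_calls`, the toy `abi` dictionary, and the knowledge sets are all hypothetical, and real ABI encoding is ignored.

```python
from itertools import product


def enabled_o_calls(proponent_addrs, abi, knowledge):
    """Yield (address, function, args) for each Opponent call built from known values.

    abi maps a function name to its list of argument type names (hypothetical format).
    knowledge maps a type name to the set of values the Opponent currently knows.
    """
    for addr in sorted(proponent_addrs):
        for fname, arg_types in abi.items():
            # One candidate domain per argument; unknown types give an empty domain.
            domains = [sorted(knowledge.get(t, ())) for t in arg_types]
            for args in product(*domains):
                yield (addr, fname, args)


abi = {"deposit": [], "withdraw": ["uint256"], "transfer": ["address", "uint256"]}
knowledge = {"uint256": {0, 1}, "address": {0xA0}}
calls = list(enabled_o_calls({0x10}, abi, knowledge))
```

Because arguments are drawn only from the known-value sets, enlarging those sets directly enlarges the set of enabled moves.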
Safety is enforced via an explicit ASSERT opcode. Any trace reaching an assertion failure is reported as a contract violation; implicitly, contracts are asserted never to send more ETH than held in their balance.
2. Bounded Completeness and Exploration
YulToolkit’s analysis is parameterized by a bound vector with four components:
- Call bound: max Opponent calls per Proponent function
- Depth bound: max Opponent-Proponent call stack depth
- Gas bound: max gas per deployment/call
- Wait bound: max total Opponent “wait” (time advances)
A configuration is considered “within bounds” if no move sequence exceeds these parameters.
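A bound vector and the corresponding "within bounds" check might look like the following sketch. The field names are hypothetical, and the per-function call counter described above is simplified to a single counter for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    max_calls: int   # Opponent calls (simplified: one global counter, not per function)
    max_depth: int   # Opponent-Proponent call stack depth
    max_gas: int     # gas per deployment/call
    max_wait: int    # total Opponent wait moves


@dataclass(frozen=True)
class Usage:
    calls: int = 0
    depth: int = 0
    gas: int = 0
    wait: int = 0


def within_bounds(used: Usage, bounds: Bounds) -> bool:
    """A trace prefix stays within bounds only if every counter respects its limit."""
    return (
        used.calls <= bounds.max_calls
        and used.depth <= bounds.max_depth
        and used.gas <= bounds.max_gas
        and used.wait <= bounds.max_wait
    )


limits = Bounds(max_calls=3, max_depth=2, max_gas=1_000_000, max_wait=1)
```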
Bounded Completeness Theorem: If all bounded game traces for a contract avoid assertion failures, then no attack contract or transaction sequence that stays within the same bounds can violate those assertions (Koutavas et al., 27 Dec 2025). The YulToolkit interpreter is trace-exact: every move explored is EVM-feasible. Exhaustive game exploration therefore precludes within-bound attacks.
3. EVM–Yul Interpreter Architecture and Precision
The interpreter comprises four layers:
| Layer | Function | Notes |
|---|---|---|
| Yul+ABI Parser | Parses Yul + ABI to AST and utility tables | |
| Game Semantics Driver | Nondeterministically enumerates moves, enforces the bound vector | Trace generator, move explorer |
| Yul Semantics Evaluator | Implements structured (AST-level) semantics | Hooks for custom opcodes |
| EVM Dialect | Models Cancun-era EVM opcodes, uint256 via Zarith | Storage, memory, gas, Keccak256, precise models |
Key properties include:
- Gas: Accurately modeled under Yul structured flow (may under-approximate bytecode cost; never underestimates exhaustion).
- Hashing: Keccak256 as a collision-free bijection, enforced by memoization.
- Call-stack, balances, storage, memory: Precise to EVM semantics.
- CALL, DELEGATECALL, CREATE, CREATE2, and custom opcodes (including ASSERT) mapped to distinct move kinds and trace semantics.
Operations like REVEAL_UINT, REVEAL_ADDR extend Opponent knowledge domains, facilitating controlled exploration of nondeterminism.
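One way to picture a memoized, collision-free hash model is a table that records every input-to-digest pair and rejects a digest that would map back to two different inputs. This is only a sketch of the idea, not the tool's implementation: `HashOracle` is a hypothetical name, and `hashlib.sha3_256` is used as a stand-in even though it is not the same function as the EVM's Keccak-256 (the padding differs).

```python
import hashlib


class HashOracle:
    """Memoized hash with an explicit injectivity check (illustrative only)."""

    def __init__(self):
        self._by_input = {}    # input bytes -> digest
        self._by_digest = {}   # digest -> input bytes

    def digest(self, data: bytes) -> bytes:
        if data in self._by_input:
            return self._by_input[data]
        # Stand-in hash: SHA3-256, NOT the EVM's Keccak-256.
        d = hashlib.sha3_256(data).digest()
        previous = self._by_digest.get(d)
        if previous is not None and previous != data:
            raise RuntimeError("hash collision would break the injectivity assumption")
        self._by_input[data] = d
        self._by_digest[d] = data
        return d

    def preimage(self, d: bytes):
        """Return the recorded input for a digest, or None if it was never hashed."""
        return self._by_digest.get(d)


oracle = HashOracle()
slot = oracle.digest(b"balances[0xA0]")
```

The reverse table is what makes the model usable for knowledge propagation: a digest that has been observed can be traced back to the input that produced it.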
4. Instrumentation and Domain Knowledge Propagation
YulToolkit supports instrumentation to express domain knowledge and control trace exploration:
- Solidity-level hooks: User-defined functions like

```solidity
function __yult__assert(bool cond) pure internal { }
function __yult__printHex(bytes32) pure internal { }
```

  These are recognized at preprocessing.
- Yul-level injection: Hook calls are replaced by custom opcodes:
  - ASSERT for __yult__assert
  - PRINT, PRINT_hex
  - REVEAL_UINT, REVEAL_ADDR for domain enrichment
Domain knowledge propagation is exemplified by hard-coding specific hash values as known to the Opponent. For instance, using PRINT_hex K after keccak256 preprocessing allows the analysis to seed the Opponent's known values accordingly, which is essential for benchmarks like ModifierReentrancy.
Developers can use custom Deployer contracts to set up bespoke on-chain states, and instrumented asserts to control vulnerability reporting.
5. Trace Enumeration Algorithm and Practicality
YulToolkit implements depth-first backtracking over all enabled move sequences, truncated by the bound vector. At each step:
- Compute the moves enabled in the current configuration.
- For each enabled move, transition to the successor configuration and recurse.
- Abort a branch upon exceeding any resource bound or revisiting an EVM configuration within the same call stack (repetition check).
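The steps above can be sketched on a toy reentrancy example. Everything here is a simplified assumption rather than the tool's code: the `State` tuple stands in for a full EVM configuration, `withdraw` and `return` stand in for o-call and return moves, and the only bounds are a nesting depth and a total move count. The implicit assertion modelled is that the contract never sends more ETH than it holds.

```python
from typing import NamedTuple


class State(NamedTuple):
    vault: int    # ETH held by the toy contract
    credit: int   # balance the contract has recorded for the Opponent
    depth: int    # withdraw frames still waiting for the Opponent callback to return


def enabled_moves(s, max_depth, safe):
    """Return (label, successor) pairs; a successor of None marks an assertion failure."""
    out = []
    if s.credit > 0 and s.depth < max_depth:   # require(credit > 0) passes, depth bound holds
        if s.vault < s.credit:                 # would send more ETH than held
            out.append(("withdraw", None))
        else:
            # safe=True zeroes the credit before sending (checks-effects-interactions).
            new_credit = 0 if safe else s.credit
            out.append(("withdraw", State(s.vault - s.credit, new_credit, s.depth + 1)))
    if s.depth > 0:                            # callback returns, late update zeroes the credit
        out.append(("return", State(s.vault, 0, s.depth - 1)))
    return out


def explore(s, max_depth, max_moves, safe=False, trace=(), seen=frozenset()):
    """Depth-first backtracking; returns the first violating trace, or None."""
    if s in seen or len(trace) >= max_moves:   # repetition check and move bound
        return None
    for label, nxt in enabled_moves(s, max_depth, safe):
        if nxt is None:
            return trace + (label,)
        found = explore(nxt, max_depth, max_moves, safe, trace + (label,), seen | {s})
        if found is not None:
            return found
    return None


start = State(vault=1, credit=1, depth=0)
attack = explore(start, max_depth=2, max_moves=10)
```

With a depth bound of 1 the reentrant call is never enabled and no violation is reported, which illustrates why results hold only relative to the chosen bounds.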
The worst-case number of branches grows exponentially with:
- The call bound
- The number of Opponent addresses
- The domain size for nondeterministic values
- Number of ABI functions exposed
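A back-of-the-envelope estimate shows how these factors multiply. The counting model below is an assumption made for illustration (every function argument ranges over one shared domain, and every Opponent turn has the same number of choices); it is not a formula taken from the tool, and the numbers are hypothetical.

```python
def o_call_choices(n_opponent_addrs, arg_counts, domain_size):
    """Opponent call choices per turn under the simplified counting model.

    arg_counts lists the number of arguments of each exposed ABI function.
    """
    return n_opponent_addrs * sum(domain_size ** n for n in arg_counts)


def max_length_traces(choices, call_bound):
    """Number of call sequences of exactly call_bound Opponent calls."""
    return choices ** call_bound


# Hypothetical contract: 20 two-argument functions versus a pruned ABI of 3.
full = o_call_choices(n_opponent_addrs=2, arg_counts=[2] * 20, domain_size=3)
pruned = o_call_choices(n_opponent_addrs=2, arg_counts=[2] * 3, domain_size=3)
ratio = max_length_traces(full, 3) / max_length_traces(pruned, 3)
```

Under this model the per-turn choice count is linear in the number of functions, but the trace count is that quantity raised to the call bound, so trimming the ABI pays off disproportionately.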
Tractability mechanisms include:
- Restricting Opponent calls to ABI subsets—critical for large contracts with extensive interfaces
- Seeding and pruning Opponent domains to only actual/explored values
- Using orchestration contracts that aggregate multiple low-level moves
- Disabling view/pure functions for minimization of trace space
- Coarse-grained Opponent wait moves (one or two per trace, if applicable)
These strategies enable exploration of contracts comprising tens of thousands of lines within practical time.
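Two of the mechanisms above, restricting the Opponent to an ABI subset and disabling view/pure functions, amount to filtering the contract's ABI before exploration. The sketch below assumes the standard Solidity JSON ABI layout (`type`, `name`, `stateMutability`); the helper `prune_abi` and the sample entries are hypothetical.

```python
def prune_abi(abi, allow=None, drop_read_only=True):
    """Keep only the functions the Opponent is allowed to call.

    abi            list of JSON ABI entries (dicts)
    allow          optional set of function names to keep; None keeps all
    drop_read_only when True, view and pure functions are removed
    """
    kept = []
    for entry in abi:
        if entry.get("type") != "function":
            continue   # skip events, errors, constructor
        if drop_read_only and entry.get("stateMutability") in ("view", "pure"):
            continue
        if allow is not None and entry.get("name") not in allow:
            continue
        kept.append(entry)
    return kept


sample_abi = [
    {"type": "function", "name": "deposit", "stateMutability": "payable"},
    {"type": "function", "name": "withdraw", "stateMutability": "nonpayable"},
    {"type": "function", "name": "balanceOf", "stateMutability": "view"},
    {"type": "function", "name": "setOwner", "stateMutability": "nonpayable"},
    {"type": "event", "name": "Withdrawn"},
]
minimal = prune_abi(sample_abi, allow={"deposit", "withdraw"})
```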
6. Empirical Evaluation and Comparative Analysis
Experiments were performed on an Intel i7, 32GB RAM, Ubuntu 24.04.2 system, targeting three major DeFi incidents:
| Contract | Yul LoC | Functions | Bounds | Time to Discovery |
|---|---|---|---|---|
| DAO | 12,372 | 21 | | Exploit: 0.5s, secondary: 10s |
| Lendf.Me | 12,615 | 29 | | Attack: 4s |
| PredyPool | 87,913 | 90 | ETH | Exploit (using harness): 15s |
Further, the tool demonstrated 100% recall (23/23) on the Gigahorse reentrancy benchmarks, with average detection time well below one second. Only minimal additional seeding (e.g., missing deposit functions or a printed hash) was needed. By contrast, Manticore detected 8/23, Mythril ~19/23, and Symvalic 21/23 cases in the same benchmarks.
Comparison with Gigahorse highlighted ABI size as a critical factor: the “spank_chain_payment” benchmark required 22 minutes with the full ABI but only 0.3s after pruning to a minimal ABI.
7. Scalability, Limitations, and Future Directions
YulToolkit demonstrates scalability to large, real-world DeFi codebases, requiring only modest developer effort for harness construction and ABI selection, and leveraging aggressive pruning and domain control. The bounded completeness theorem ensures that no attack within the bound vector is missed, while the trace-exact interpreter ensures the absence of false positives, up to interpreter precision.
Identified limitations:
- Completeness is limited by the bound vector; unbounded or infinite traces (e.g., unlimited gas or time) are not addressed.
- EIP-1153 transient storage is unsupported.
- Gas accounting under-approximates real bytecode costs (Yul’s structured flow yields fewer opcodes).
- Instrumentation (harnessing, state setup, domain seeding) requires developer expertise, at a level comparable to writing targeted test harnesses.
Planned directions include:
- EVM upgrades and transient storage support.
- Integrating symbolic game semantics to enable higher bounds more efficiently.
- Automatic harness generation from metadata.
- Extension beyond reentrancy to other bug classes: access-control, arithmetic overflows, ERC-777 hooks, etc.
YulToolkit brings a precise, bounded, and semantics-driven approach that complements prior symbolic and fuzzing-based tools, offering strong guarantees on the feasibility and exhaustiveness of detected contract vulnerabilities within user-specified resource models (Koutavas et al., 27 Dec 2025).