
YulToolkit: Bounded Game-Semantics Checker

Updated 3 January 2026
  • YulToolkit is a bounded game-semantics model checker for Yul that models smart contract execution as a two-player game between the contract and its environment.
  • It employs a precise EVM-Yul interpreter with custom instrumentation, programmatic trace exploration, and resource bounds to detect vulnerabilities exhaustively within those bounds.
  • The tool demonstrates practical scalability on large DeFi contracts by leveraging bespoke bounds, instrumentation, and meticulous adherence to EVM semantics to guarantee no false positives within limits.

YulToolkit is a bounded game-semantics model checker for Yul, the intermediate language employed by Solidity, enabling precise, bounded-complete exploration of smart contract behaviour by exhaustively enumerating all feasible traces within user-supplied resource limits. It targets vulnerabilities—principally those arising from contract-environment interaction, such as reentrancy—by modelling computation as a two-player game between the contract (Proponent) and its environment (Opponent). YulToolkit achieves practical tractability on large DeFi contracts via programmatic instrumentation, custom game bounds, and meticulous adherence to EVM-Yul semantics, guaranteeing “no false positives” within bounded parameters (Koutavas et al., 27 Dec 2025).

1. Game-Semantics Foundation

YulToolkit models a compiled Yul contract as a two-player game:

  • Proponent: the contract code under analysis.
  • Opponent: the environment, possibly adversarial, controlling external addresses and function inputs.

A configuration is

$$C = (\mathit{stk}, \mathit{Addr}, \mathit{Odom})$$

where:

  • $\mathit{stk}$: a stack of activation frames. Each frame is either a Proponent frame $\langle P, E \rangle$ (Yul object $P$ with EVM state $E$) or an Opponent frame $\langle E \rangle$ (EVM state only).
  • $\mathit{Addr} = (A_P, A_O)$: the partition of addresses controlled by Proponent and Opponent.
  • $\mathit{Odom}$: Opponent knowledge, as type-indexed “known-value” sets (e.g., uint256, address).

Moves are transitions:

$$C \xrightarrow{m} C'$$

Crucial move types include:

  • deploy: the constructor completes and the turn passes to the Opponent
  • o-call: Opponent invokes a Proponent address
  • pp-call, po-call: calls made by the Proponent to another Proponent object (pp-call) or to an Opponent address (po-call)
  • internal: normal Yul/EVM reduction in Proponent frame
  • o-wait: Opponent advances block time

Inference-style small-step rules specify when and how these moves occur. For example, the Opponent can call any Proponent address, with argument values drawn from its known-value sets:

$$\frac{\mathit{stk} = [\langle E\rangle], \quad \mathit{addr} \in A_P}{([\langle E\rangle], A_P, A_O, \mathit{Odom}) \xrightarrow{\text{o-call}(\mathit{addr}, f, \vec v)} ([\langle P_{\mathit{addr}}, E_{\mathit{call}}(\vec v)\rangle], A_P, A_O, \mathit{Odom}')}$$
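The premise and conclusion of the o-call rule can be sketched in Python as an enumeration of enabled moves. This is a minimal illustration, not YulToolkit's implementation: the `Config` class, the tuple encoding of frames (`("O", state)` for an Opponent frame), and the `abi` mapping from function names to argument types are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    """Hypothetical encoding of a game configuration (stk, Addr, Odom)."""
    stack: tuple      # activation frames, innermost last; ("O", state) or ("P", obj, state)
    a_p: frozenset    # Proponent-controlled addresses
    a_o: frozenset    # Opponent-controlled addresses
    odom: tuple       # Opponent knowledge as (type, value) pairs


def known(cfg, ty):
    """Values of type `ty` currently known to the Opponent."""
    return [v for (t, v) in cfg.odom if t == ty]


def o_call_moves(cfg, abi):
    """Enumerate o-call moves; `abi` maps function name -> tuple of argument types."""
    # Premise of the rule: the stack holds exactly one Opponent frame.
    if len(cfg.stack) != 1 or cfg.stack[0][0] != "O":
        return []
    moves = []
    for addr in sorted(cfg.a_p):
        for fn, arg_types in sorted(abi.items()):
            # Arguments range over every combination of known values of the right type.
            for args in product(*(known(cfg, t) for t in arg_types)):
                moves.append(("o-call", addr, fn, args))
    return moves
```

With one Proponent address, a zero-argument function, a one-argument function, and two known uint256 values, this yields three moves: one for the zero-argument function and two for the one-argument function.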

Safety is enforced via an explicit ASSERT opcode. Any trace reaching an assertion failure is reported as a contract violation; implicitly, contracts are asserted never to send more ETH than held in their balance.

2. Bounded Completeness and Exploration

YulToolkit’s analysis is parameterized by a bound vector $B = (k_{\mathit{call}}, k_{\mathit{stack}}, G, W)$:

  • $k_{\mathit{call}}$: max Opponent calls per Proponent function
  • $k_{\mathit{stack}}$: max Opponent-Proponent call stack depth
  • $G$: gas bound per deployment/call
  • $W$: max total Opponent “wait” (time advances)

A configuration is considered “within bounds” if no move sequence exceeds these parameters.
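As a rough illustration of the bound vector, the check below tests resource usage against each component of $B$. The `Bounds` class and the `within_bounds` signature are hypothetical, and tracking Opponent calls in a per-function counter is an assumption about how $k_{\mathit{call}}$ is measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Hypothetical container for the bound vector B = (k_call, k_stack, G, W)."""
    k_call: int   # max Opponent calls per Proponent function
    k_stack: int  # max Opponent-Proponent call-stack depth
    gas: int      # gas bound per deployment/call
    wait: int     # max total Opponent wait (time advances)


def within_bounds(b, calls_per_function, stack_depth, gas_used, total_wait):
    """Return True if the given usage exceeds none of the four bounds."""
    return (
        all(n <= b.k_call for n in calls_per_function.values())
        and stack_depth <= b.k_stack
        and gas_used <= b.gas
        and total_wait <= b.wait
    )
```

For example, with `Bounds(k_call=2, k_stack=3, gas=30000, wait=1)`, a trace that calls `withdraw` three times falls outside the bounds even if every other resource stays within its limit.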

Bounded Completeness Theorem: If all bounded game traces $\mathit{Tr}_B(P)$ for contract $P$ avoid assertion failures, then no attack contract or transaction sequence satisfying $B$ can violate the same assertions (Koutavas et al., 27 Dec 2025). The YulToolkit interpreter is trace-exact: every move explored is EVM-feasible; thus, exhaustive game exploration precludes within-bound attacks.

3. EVM–Yul Interpreter Architecture and Precision

The interpreter comprises four layers:

| Layer | Function | Notes |
|---|---|---|
| Yul+ABI Parser | Parses Yul + ABI to AST and utility tables | |
| Game Semantics Driver | Nondeterministically enumerates moves, enforces $B$ | Trace generator, move explorer |
| Yul Semantics Evaluator | Implements structured (AST-level) semantics | Hooks for custom opcodes |
| EVM Dialect | Models Cancun-era EVM opcodes, uint256 via Zarith | Storage, memory, gas, Keccak256, precise models |

Key properties include:

  • Gas: Accurately modeled under Yul structured flow (may under-approximate bytecode cost; never underestimates exhaustion).
  • Hashing: Keccak256 is treated as a collision-free (injective) function, enforced by memoization.
  • Call-stack, balances, storage, memory: Precise to EVM semantics.
  • CALL, DELEGATECALL, CREATE, CREATE2, and custom opcodes (including ASSERT) mapped to distinct move kinds and trace semantics.
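One way to read the memoized, collision-free hashing model is as a table that records every input/digest pair and rejects any second input that maps to an already-recorded digest. The sketch below is an assumption about that mechanism, not the tool's code. `HashOracle` is a hypothetical name, and `hashlib.sha3_256` is only a stand-in: Ethereum's Keccak-256 uses different padding and produces different digests.

```python
import hashlib


class HashOracle:
    """Memoized hash model that checks injectivity on every inserted pair."""

    def __init__(self):
        self._by_input = {}   # input bytes -> digest
        self._by_digest = {}  # digest -> input bytes

    def digest(self, data: bytes) -> bytes:
        if data in self._by_input:
            return self._by_input[data]
        # Stand-in hash: NIST SHA3-256, NOT Ethereum's Keccak-256.
        d = hashlib.sha3_256(data).digest()
        prior = self._by_digest.get(d)
        if prior is not None and prior != data:
            raise AssertionError("collision would break the injectivity assumption")
        self._by_input[data] = d
        self._by_digest[d] = data
        return d

    def preimage(self, d: bytes):
        """Return the recorded input for a digest, or None if it was never hashed."""
        return self._by_digest.get(d)
```

Keeping the reverse table also lets an analysis recognise a 32-byte value as the hash of a previously seen input, which is convenient when reasoning about mapping slots.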

Operations like REVEAL_UINT, REVEAL_ADDR extend Opponent knowledge domains, facilitating controlled exploration of nondeterminism.

4. Instrumentation and Domain Knowledge Propagation

YulToolkit supports instrumentation to express domain knowledge and control trace exploration:

  • Solidity-level hooks: User-defined functions like
    function __yult__assert(bool cond) pure internal { }
    function __yult__printHex(bytes32) pure internal { }
    These are recognized at preprocessing.
  • Yul-level injection: Hook calls are replaced by custom opcodes:
    • ASSERT for __yult__assert
    • PRINT, PRINT_hex
    • REVEAL_UINT, REVEAL_ADDR for domain enrichment

Domain knowledge propagation is exemplified by hard-coding specific hash values as known to the Opponent. For instance, using PRINT_hex K after keccak256 preprocessing allows the analysis to seed $\mathit{Odom}$ accordingly, essential for benchmarks like ModifierReentrancy.
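The effect of seeding can be pictured with a small model of type-indexed known-value sets. The `OpponentDomain` class and its `reveal` method are hypothetical names; they only illustrate how a reveal-style operation might grow the set of values the Opponent can pass as arguments.

```python
from collections import defaultdict


class OpponentDomain:
    """Hypothetical model of Odom: one set of known values per type name."""

    def __init__(self):
        self._known = defaultdict(set)

    def reveal(self, ty, value):
        """Add `value` to the known set for `ty`; return True if it was new."""
        before = len(self._known[ty])
        self._known[ty].add(value)
        return len(self._known[ty]) > before

    def values(self, ty):
        """Known values of type `ty`, in a deterministic order."""
        return sorted(self._known[ty])
```

Under this model, seeding a hash value once makes it available to every later Opponent move, while revealing the same value again changes nothing, so the domain grows only with genuinely new information.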

Developers can use custom Deployer contracts to set up bespoke on-chain states, and instrumented asserts to control vulnerability reporting.

5. Trace Enumeration Algorithm and Practicality

YulToolkit implements depth-first backtracking over all enabled move sequences, truncated by the bound vector $B$. At each step:

  1. Compute enabled moves $\mathit{Moves}(C)$ from configuration $C$.
  2. For each move $m \in \mathit{Moves}(C)$, transition to $C'$ and recurse.
  3. Abort a branch upon exceeding any resource bound or revisiting an EVM configuration within the same call stack (repetition check).
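The three steps above can be sketched as a generic depth-first search. This is an illustrative simplification under stated assumptions: a single `max_depth` stands in for the full bound vector, the repetition check covers configurations on the current path, and the `moves`, `step`, and `is_violation` callbacks are hypothetical parameters rather than YulToolkit interfaces.

```python
def explore(config, moves, step, is_violation, max_depth, path=(), seen=frozenset()):
    """Return every move sequence that reaches a violation within max_depth moves."""
    if is_violation(config):
        return [path]
    # Abort the branch when the bound is hit or a configuration repeats on this path.
    if len(path) >= max_depth or config in seen:
        return []
    found = []
    for m in moves(config):              # step 1: enabled moves
        nxt = step(config, m)            # step 2: transition and recurse
        found += explore(nxt, moves, step, is_violation,
                         max_depth, path + (m,), seen | {config})
    return found


# Toy transition system: the configuration is an integer counter.
def toy_moves(_config):
    return ["inc", "dbl", "nop"]


def toy_step(config, move):
    if move == "inc":
        return config + 1
    if move == "dbl":
        return config * 2
    return config                        # "nop" leaves the configuration unchanged
```

Calling `explore(1, toy_moves, toy_step, lambda c: c == 6, 3)` returns the two three-move paths that reach 6. No returned path contains `nop`, because that move reproduces a configuration already on the path and is pruned by the repetition check.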

The worst-case number of branches grows exponentially with:

  • $k_{\mathit{call}}$ (call bound)
  • $|A_O|$ (number of Opponent addresses)
  • $|\mathit{Odom}|$ (domain size for nondeterministic values)
  • Number of ABI functions exposed
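A back-of-the-envelope count shows why these factors compound. The model below is a deliberate simplification and not the tool's cost model: it assumes every argument ranges over one uniform domain of `domain_size` values, that each Opponent address can act as the caller, and that traces are plain sequences of up to `k_call` Opponent calls. Both function names are hypothetical.

```python
def o_call_branching(n_callers, n_targets, abi_arities, domain_size):
    """Distinct o-call moves per Opponent turn under the uniform-domain assumption."""
    # A function with k arguments contributes domain_size ** k argument tuples.
    per_target = sum(domain_size ** k for k in abi_arities)
    return n_callers * n_targets * per_target


def trace_upper_bound(branching, k_call):
    """Upper bound on call sequences of length 0..k_call with the given branching."""
    return sum(branching ** i for i in range(k_call + 1))
```

With two Opponent addresses, one target, a domain of three values, and functions taking 0, 1, and 2 arguments, each turn has 26 possible calls and there are at most 703 sequences of up to two calls. Restricting the ABI to the single one-argument function reduces these figures to 6 and 43.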

Tractability mechanisms include:

  • Restricting Opponent calls to ABI subsets—critical for large contracts with extensive interfaces
  • Seeding and pruning Opponent domains to only actual/explored values
  • Using orchestration contracts that aggregate multiple low-level moves
  • Disabling view/pure functions for minimization of trace space
  • Coarse-grained Opponent wait moves (one or two per trace, if applicable)

These strategies enable exploration of contracts comprising tens of thousands of lines within practical time.

6. Empirical Evaluation and Comparative Analysis

Experiments were performed on an Intel i7, 32GB RAM, Ubuntu 24.04.2 system, targeting three major DeFi incidents:

| Contract | Yul LoC | Functions | Bounds | Time to Discovery |
|---|---|---|---|---|
| DAO | 12,372 | 21 | $k_{\mathit{call}} = 2$ | Exploit: 0.5s, secondary: 10s |
| Lendf.Me | 12,615 | 29 | $k_{\mathit{stack}} = 3$ | Attack: 4s |
| PredyPool | 87,913 | 90 | $G = 30{,}000$ ETH | Exploit (using harness): 15s |

Further, the tool demonstrated 100% recall (23/23) on the Gigahorse reentrancy benchmarks, with average detection time well below one second. Only minimal additional seeding (e.g., missing deposit functions or a printed hash) was needed. By contrast, Manticore detected 8/23, Mythril ~19/23, and Symvalic 21/23 cases in the same benchmarks.

Comparison with Gigahorse highlighted ABI size as a critical factor: the “spank_chain_payment” benchmark required 22 minutes with the full ABI but only 0.3s after pruning to a minimal ABI.

7. Scalability, Limitations, and Future Directions

YulToolkit demonstrates scalability to large, real-world DeFi codebases, requiring only modest developer effort for harness construction and ABI selection, and leveraging aggressive pruning and domain control. Because the interpreter is trace-exact, every reported violation is EVM-feasible (no false positives, up to interpreter precision), and the bounded completeness theorem ensures that no attack within the bound vector $B$ is missed.

Identified limitations:

  • Completeness is bounded by $B$; unbounded/infinite traces, e.g., unlimited gas/time, are not addressed.
  • EIP-1153 transient storage is unsupported.
  • Gas accounting under-approximates real bytecode costs (Yul’s structured flow yields fewer opcodes).
  • Instrumentation (harnessing, state setup, domain seeding) requires developer expertise, at a level comparable to writing targeted test harnesses.

Planned directions include:

  • EVM upgrades and transient storage support.
  • Integrating symbolic game semantics to enable higher bounds more efficiently.
  • Automatic harness generation from metadata.
  • Extension beyond reentrancy to other bug classes: access-control, arithmetic overflows, ERC-777 hooks, etc.

YulToolkit brings a precise, bounded, and semantics-driven approach that complements prior symbolic and fuzzing-based tools, offering strong guarantees on the feasibility and exhaustiveness of detected contract vulnerabilities within user-specified resource models (Koutavas et al., 27 Dec 2025).
