YulToolkit: Bounded Game-Semantics Checker

Updated 3 January 2026

YulToolkit is a bounded game-semantics model checker for Yul that models smart contract execution as a two-player game between the contract and its environment.
It combines a precise EVM-Yul interpreter with custom instrumentation, programmatic trace exploration, and resource bounds, so that every feasible trace within those bounds is explored.
The tool scales to large DeFi contracts through bespoke bounds, instrumentation, and close adherence to EVM semantics, and it reports no false positives within those bounds.

YulToolkit is a bounded game-semantics model checker for Yul, the intermediate language used by the Solidity compiler. It explores smart contract behaviour precisely and bounded-completely by enumerating every feasible trace within user-supplied resource limits. It targets vulnerabilities, principally those arising from contract-environment interaction such as reentrancy, by modelling computation as a two-player game between the contract (Proponent) and its environment (Opponent). YulToolkit remains tractable on large DeFi contracts through programmatic instrumentation, custom game bounds, and close adherence to EVM-Yul semantics, and it guarantees “no false positives” within the chosen bounds (Koutavas et al., 27 Dec 2025).

1. Game-Semantics Foundation

YulToolkit models a compiled Yul contract as a two-player game:

Proponent: the contract code under analysis.
Opponent: the environment, possibly adversarial, controlling external addresses and function inputs.

A configuration is

$C = ( \mathit{stk}, \mathit{Addr}, \mathit{Odom} )$

where:

$\mathit{stk}$: a stack of activation frames. Each frame is either a Proponent frame $\langle P, E \rangle$ (Yul object $P$ with EVM state $E$) or an Opponent frame $\langle E \rangle$ (EVM state only).
$\mathit{Addr} = (A_P, A_O)$: the partition of addresses controlled by Proponent and Opponent.
$\mathit{Odom}$: Opponent knowledge, given as type-indexed “known-value” sets (e.g., uint256, address).
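The configuration triple can be pictured as a small data structure. The following is only a sketch under stated assumptions: the names `Frame` and `Config`, the use of strings as placeholders for $P$ and $E$, and the convention that the top of the stack is the last list element are all illustrative choices, not YulToolkit's actual representation.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Frame:
    """One activation frame; `program` is None for an Opponent frame <E>."""
    evm_state: str                 # placeholder for the EVM state E
    program: Optional[str] = None  # placeholder for the Yul object P

    @property
    def is_proponent(self) -> bool:
        return self.program is not None


@dataclass
class Config:
    """C = (stk, Addr, Odom); the top of the stack is the last list element."""
    stk: List[Frame]
    proponent_addrs: FrozenSet[int]  # A_P
    opponent_addrs: FrozenSet[int]   # A_O
    odom: Dict[str, Set[int]] = field(default_factory=dict)  # type name -> known values

    def opponent_to_move(self) -> bool:
        # The Opponent moves when the top frame carries no Yul object.
        return bool(self.stk) and not self.stk[-1].is_proponent


# Hypothetical initial configuration: one Opponent frame, one address per player.
c = Config(
    stk=[Frame("E0")],
    proponent_addrs=frozenset({0x10}),
    opponent_addrs=frozenset({0xA0}),
    odom={"uint256": {0, 1}, "address": {0xA0}},
)
```

Keeping $\mathit{Odom}$ as a map from type name to a set mirrors the “type-indexed” description above and makes it cheap to look up the candidate values for each argument type.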

Moves are transitions:

$C \xrightarrow{m} C'$

Crucial move types include:

deploy: constructor completion to Opponent turn
o-call: Opponent invokes a Proponent address
pp-call, po-call: calls from a Proponent frame to another Proponent object (pp-call) or to an Opponent address (po-call)
internal: normal Yul/EVM reduction in Proponent frame
o-wait: Opponent advances block time

Inference-style small-step rules specify when and how these moves occur. For example, the Opponent can call any Proponent address, passing arguments drawn from its known values:

$\frac{\mathit{stk}=[\langle E\rangle],\ \mathit{addr}\in A_P} { ([\langle E\rangle],(A_P,A_O),\mathit{Odom}) \xrightarrow{\text{o-call}(\mathit{addr},f,\vec v)} ([\langle P_{\mathit{addr}}, E_{\mathit{call}}(\vec v)\rangle],(A_P,A_O),\mathit{Odom}') }$
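To make the rule concrete, the sketch below enumerates the o-call moves it enables. It assumes a hypothetical ABI table mapping each Proponent address to its functions and their argument types; the function name `enabled_o_calls` and the table layout are illustrative, not taken from the tool.

```python
from itertools import product


def enabled_o_calls(proponent_addrs, abi, odom):
    """Yield (addr, function, args) for every o-call the Opponent may make.

    abi:  {address: {function name: [argument type, ...]}}  (assumed layout)
    odom: {type name: set of values known to the Opponent}
    """
    for addr in sorted(proponent_addrs):
        for fname, arg_types in sorted(abi[addr].items()):
            # Each argument ranges over the known values of its type.
            domains = [sorted(odom.get(t, ())) for t in arg_types]
            for args in product(*domains):
                yield (addr, fname, args)


abi = {0x10: {"deposit": [], "withdraw": ["uint256"]}}
odom = {"uint256": {0, 1}}
moves = list(enabled_o_calls({0x10}, abi, odom))
```

Here `moves` holds three candidates: one `deposit` call with no arguments and two `withdraw` calls, one per known `uint256` value. A function whose argument type has no known values yields no move at all, which is why seeding $\mathit{Odom}$ matters.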

Safety is enforced via an explicit ASSERT opcode. Any trace reaching an assertion failure is reported as a contract violation; implicitly, contracts are asserted never to send more ETH than held in their balance.

2. Bounded Completeness and Exploration

YulToolkit’s analysis is parameterized by a bound vector $B = (k_{\mathit{call}}, k_{\mathit{stack}}, G, W)$:

$k_{\mathit{call}}$: maximum number of Opponent calls per Proponent function
$k_{\mathit{stack}}$: maximum Opponent-Proponent call stack depth
$G$: gas bound per deployment/call
$W$: maximum total Opponent “wait” (time advances)

A configuration is considered “within bounds” if the move sequence leading to it exceeds none of these parameters.
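A bound check of this kind can be sketched as a comparison between accumulated resource usage and the bound vector. The `Usage` record and its field names are assumptions made for illustration; the source does not say how the tool tracks consumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """The bound vector B = (k_call, k_stack, G, W)."""
    k_call: int   # Opponent calls per Proponent function
    k_stack: int  # Opponent-Proponent call stack depth
    gas: int      # gas per deployment/call
    wait: int     # total Opponent wait


@dataclass(frozen=True)
class Usage:
    """Resources consumed along the current trace (hypothetical bookkeeping)."""
    calls_in_function: int
    stack_depth: int
    gas_used: int
    waited: int


def within_bounds(u: Usage, b: Bounds) -> bool:
    # Every component must stay at or below its bound.
    return (
        u.calls_in_function <= b.k_call
        and u.stack_depth <= b.k_stack
        and u.gas_used <= b.gas
        and u.waited <= b.wait
    )


b = Bounds(k_call=2, k_stack=3, gas=30_000, wait=1)
ok = within_bounds(Usage(2, 3, 21_000, 0), b)
too_deep = within_bounds(Usage(1, 4, 21_000, 0), b)
```

With these values `ok` is true and `too_deep` is false, because the second usage record exceeds only the stack-depth component.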

Bounded Completeness Theorem: If all bounded game traces $\mathit{Tr}_B(P)$ for contract $P$ avoid assertion failures, then no attack contract or transaction sequence satisfying $B$ can violate the same assertions (Koutavas et al., 27 Dec 2025). The YulToolkit interpreter is trace-exact: every move explored is EVM-feasible. Exhaustive game exploration therefore rules out attacks within the bounds.

3. EVM–Yul Interpreter Architecture and Precision

The interpreter comprises four layers:

Layer	Function	Notes
Yul+ABI Parser	Parses Yul + ABI to AST and utility tables
Game Semantics Driver	Nondeterministically enumerates moves, enforces $B$	Trace generator, move explorer
Yul Semantics Evaluator	Implements structured (AST-level) semantics	Hooks for custom opcodes
EVM Dialect	Models Cancun-era EVM opcodes, uint256 via Zarith	Storage, memory, gas, Keccak256, precise models

Key properties include:

Gas: Accurately modeled under Yul structured flow (may under-approximate bytecode cost; never underestimates exhaustion).
Hashing: Keccak256 as a collision-free bijection, enforced by memoization.
Call-stack, balances, storage, memory: Precise to EVM semantics.
CALL, DELEGATECALL, CREATE, CREATE2, and custom opcodes (including ASSERT) mapped to distinct move kinds and trace semantics.
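Two of these properties lend themselves to a short illustration: wrap-around uint256 arithmetic and a memoized hash that treats any collision as an error. This is a sketch under assumptions. The tool uses Zarith, whereas the code below relies on Python's built-in big integers; `hashlib.sha3_256` is standard SHA3-256 and only a stand-in, since it is not the Keccak-256 variant used by the EVM; and the collision check is one possible reading of “enforced by memoization”.

```python
import hashlib

UINT256_MOD = 1 << 256


def add(x: int, y: int) -> int:
    """EVM-style ADD: wraps modulo 2**256."""
    return (x + y) % UINT256_MOD


def sub(x: int, y: int) -> int:
    """EVM-style SUB: wraps modulo 2**256."""
    return (x - y) % UINT256_MOD


class MemoHash:
    """Memoized hash that refuses to return one digest for two inputs."""

    def __init__(self):
        self._by_input = {}
        self._by_digest = {}

    def digest(self, data: bytes) -> bytes:
        if data in self._by_input:
            return self._by_input[data]
        d = hashlib.sha3_256(data).digest()  # stand-in, NOT EVM Keccak-256
        prior = self._by_digest.setdefault(d, data)
        if prior != data:
            raise RuntimeError("hash collision observed")
        self._by_input[data] = d
        return d


h = MemoHash()
first = h.digest(b"slot-0")
second = h.digest(b"slot-0")
```

The second lookup returns the memoized digest without rehashing, and the reverse table lets an analysis map a digest back to the input that produced it.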

Operations like REVEAL_UINT, REVEAL_ADDR extend Opponent knowledge domains, facilitating controlled exploration of nondeterminism.
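The effect of a reveal operation on Opponent knowledge can be sketched as a set insertion. The function name `reveal` and the dictionary representation of $\mathit{Odom}$ are assumptions for illustration only.

```python
def reveal(odom, type_name, value):
    """Return a new Odom in which `value` is known for `type_name`.

    The input is left untouched so that a backtracking explorer can
    keep using the old knowledge on sibling branches.
    """
    updated = {t: set(vs) for t, vs in odom.items()}
    updated.setdefault(type_name, set()).add(value)
    return updated


before = {"uint256": {0}}
after = reveal(before, "uint256", 42)           # models REVEAL_UINT 42
after = reveal(after, "address", 0xA0)          # models REVEAL_ADDR 0xA0
```

Returning a fresh mapping rather than mutating in place is a design choice of this sketch; it keeps each branch's knowledge independent.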

4. Instrumentation and Domain Knowledge Propagation

YulToolkit supports instrumentation to express domain knowledge and control trace exploration:

Solidity-level hooks: User-defined functions like

    function __yult__assert(bool cond) pure internal { }
    function __yult__printHex(bytes32) pure internal { }

These are recognized at preprocessing.

Yul-level injection: Hook calls are replaced by custom opcodes:
- ASSERT for __yult__assert
- PRINT, PRINT_hex
- REVEAL_UINT, REVEAL_ADDR for domain enrichment
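The injection step can be pictured as a textual rewrite of hook calls into opcode calls. The sketch below is a toy: it assumes hook calls appear in the Yul text under their literal Solidity names, and the pairing of `__yult__printHex` with `PRINT_hex` is inferred from the names rather than stated in the source. A real pass would work on the AST and handle compiler-mangled function names.

```python
import re

# Hook name -> custom opcode. The second pairing is an assumption.
HOOK_TO_OPCODE = {
    "__yult__assert": "ASSERT",
    "__yult__printHex": "PRINT_hex",
}

# Match a hook name as a whole identifier, followed by an opening parenthesis.
_HOOK_CALL = re.compile(
    r"\b(" + "|".join(map(re.escape, HOOK_TO_OPCODE)) + r")\s*\("
)


def inject_opcodes(yul_source: str) -> str:
    """Replace each hook call with a call to its custom opcode."""
    return _HOOK_CALL.sub(lambda m: HOOK_TO_OPCODE[m.group(1)] + "(", yul_source)


rewritten = inject_opcodes("{ __yult__assert(lt(x, 10)) __yult__printHex(k) }")
```

After the rewrite, `rewritten` reads `{ ASSERT(lt(x, 10)) PRINT_hex(k) }`. Identifiers that merely contain a hook name as a suffix are left alone, because the pattern requires a word boundary before the name.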

Domain knowledge propagation is exemplified by hard-coding specific hash values as known to the Opponent. For instance, using PRINT_hex K after keccak256 preprocessing allows the analysis to seed $\mathit{Odom}$ accordingly, essential for benchmarks like ModifierReentrancy.

Developers can use custom Deployer contracts to set up bespoke on-chain states, and instrumented asserts to control vulnerability reporting.

5. Trace Enumeration Algorithm and Practicality

YulToolkit implements depth-first backtracking over all enabled move sequences, truncated by the bound vector $B$. At each step:

Compute enabled moves $\mathit{Moves}(C)$ from configuration $C$.
For each move $m\in\mathit{Moves}(C)$, transition to $C'$ and recurse.
Abort a branch upon exceeding any resource bound or revisiting an EVM configuration within the same call stack (repetition check).
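The three steps above can be sketched as a recursive search over an abstract game. Everything here is illustrative: the `explore` signature, the toy vault model, and the choice to stop at the first violating trace are assumptions of the sketch, and the repetition check is simplified to "configuration already on the current path".

```python
def explore(config, moves, violates, exceeds, path=(), on_path=frozenset()):
    """Depth-first search; return the first violating trace, or None."""
    # Abort on a bound violation or a repeated configuration.
    if exceeds(config) or config in on_path:
        return None
    if violates(config):
        return path
    on_path = on_path | {config}
    for label, nxt in moves(config):
        found = explore(nxt, moves, violates, exceeds, path + (label,), on_path)
        if found is not None:
            return found
    return None


# Toy vault, config = (balance, credit, depth). withdraw pays out `credit`
# BEFORE zeroing it, handing control to the Opponent in between.
def vault_moves(config):
    balance, credit, depth = config
    if credit > 0:
        yield "o-call withdraw", (balance - credit, credit, depth + 1)
    if depth > 0:
        yield "return", (balance, 0, depth - 1)


def overdrawn(config):
    return config[0] < 0  # contract sent more ETH than it held


def depth_limit(k_stack):
    return lambda config: config[2] > k_stack


start = (3, 2, 0)
attack = explore(start, vault_moves, overdrawn, depth_limit(2))
no_attack = explore(start, vault_moves, overdrawn, depth_limit(1))
```

With a stack bound of 2 the search finds the reentrant trace of two nested `withdraw` calls; with a bound of 1 the nested call is cut off and no violation is found, which illustrates that the guarantee holds only relative to the chosen bounds.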

The worst-case number of branches grows exponentially with:

$k_{\mathit{call}}$ (call bound)
$|A_O|$ (number of Opponent addresses)
$|\mathit{Odom}|$ (domain size for nondeterministic values)
Number of ABI functions exposed
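A back-of-the-envelope count shows how these factors multiply. The formulas below are a rough model assumed for illustration, not a bound stated in the source: each Opponent turn offers one candidate per sender address, function, and combination of known argument values, and a trace contains at most `k_call` such turns.

```python
from math import prod


def o_call_branching(abi, odom_sizes, n_opponent_addrs):
    """Candidate o-calls per Opponent turn under the rough model above.

    abi:        {function name: [argument type, ...]}
    odom_sizes: {type name: number of known values}
    """
    per_sender = sum(
        prod(odom_sizes[t] for t in arg_types) for arg_types in abi.values()
    )
    return n_opponent_addrs * per_sender


def trace_upper_bound(branching, k_call):
    """Number of call sequences of length 0..k_call with fixed branching."""
    return sum(branching ** i for i in range(k_call + 1))


full_abi = {
    "deposit": [],
    "withdraw": ["uint256"],
    "transfer": ["address", "uint256"],
}
sizes = {"uint256": 4, "address": 3}

full = o_call_branching(full_abi, sizes, n_opponent_addrs=2)
pruned = o_call_branching({"withdraw": ["uint256"]}, sizes, n_opponent_addrs=2)
full_traces = trace_upper_bound(full, k_call=2)
pruned_traces = trace_upper_bound(pruned, k_call=2)
```

In this hypothetical setting the full ABI gives 34 candidates per turn and 1191 sequences at `k_call = 2`, while the single-function ABI gives 8 candidates and 73 sequences. The gap widens rapidly as the call bound grows.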

Tractability mechanisms include:

Restricting Opponent calls to ABI subsets—critical for large contracts with extensive interfaces
Seeding and pruning Opponent domains to only actual/explored values
Using orchestration contracts that aggregate multiple low-level moves
Disabling view/pure functions for minimization of trace space
Coarse-grained Opponent wait moves (one or two per trace, if applicable)

These strategies enable exploration of contracts comprising tens of thousands of lines within practical time.

6. Empirical Evaluation and Comparative Analysis

Experiments were performed on an Intel i7, 32GB RAM, Ubuntu 24.04.2 system, targeting three major DeFi incidents:

Contract	Yul LoC	Functions	Bounds	Time to Discovery
DAO	12,372	21	$k_{\mathit{call}}=2$	Exploit: 0.5s, secondary: 10s
Lendf.Me	12,615	29	$k_{\mathit{stack}}=3$	Attack: 4s
PredyPool	87,913	90	$G=30,000$ ETH	Exploit (using harness): 15s

Further, the tool demonstrated 100% recall (23/23) on the Gigahorse reentrancy benchmarks, with average detection time well below one second. Only minimal additional seeding (e.g., missing deposit functions or a printed hash) was needed. By contrast, Manticore detected 8/23, Mythril ~19/23, and Symvalic 21/23 cases in the same benchmarks.

Results on the Gigahorse benchmarks highlighted ABI size as a critical factor: the “spank_chain_payment” benchmark required 22 minutes with the full ABI but only 0.3s after pruning to a minimal ABI.

7. Scalability, Limitations, and Future Directions

YulToolkit scales to large, real-world DeFi codebases with only modest developer effort for harness construction and ABI selection, relying on aggressive pruning and domain control. Because the interpreter is trace-exact, reported violations are not false positives up to interpreter precision, and the bounded completeness theorem ensures that no violation within the bound vector is missed.

Identified limitations:

Completeness is bounded by $B$; unbounded or infinite traces (e.g., unlimited gas or time) are not addressed.
EIP-1153 transient storage is unsupported.
Gas accounting under-approximates real bytecode costs (Yul’s structured flow yields fewer opcodes).
Instrumentation (harnessing, state setup, domain seeding) requires developer expertise, at a level comparable to writing targeted test harnesses.

Planned directions include:

EVM upgrades and transient storage support.
Integrating symbolic game semantics to enable higher bounds more efficiently.
Automatic harness generation from metadata.
Extension beyond reentrancy to other bug classes: access-control, arithmetic overflows, ERC-777 hooks, etc.

YulToolkit brings a precise, bounded, and semantics-driven approach that complements prior symbolic and fuzzing-based tools, offering strong guarantees on the feasibility and exhaustiveness of detected contract vulnerabilities within user-specified resource models (Koutavas et al., 27 Dec 2025).

References (1)

A Bounded Game Semantics Checker for Precise Smart Contract Analysis (2025)
