EVM-Yul Interpreter: Semantics & Analysis
- An EVM-Yul interpreter is a software engine that executes the Yul intermediate language according to EVM bytecode semantics, modeling both Yul's structured constructs and the EVM's low-level execution details.
- It combines conventional EVM execution with game semantics to explore contract behaviors and expose vulnerabilities such as reentrancy within user-defined bounds.
- Its layered design and bounded completeness support formally defined state transitions, sound optimizations, and integration with Solidity toolchains for smart contract verification.
An EVM-Yul interpreter is a software engine that executes the Yul intermediate language by translating it into Ethereum Virtual Machine (EVM) bytecode semantics, faithfully modeling both the structured source constructs of Yul and the low-level execution model of the EVM. Modern EVM-Yul interpreters, such as those integrated into tools like YulToolkit, couple conventional EVM bytecode execution with higher-level operational semantics and game-semantic models to support precise, bounded-complete smart contract analysis. This dual-layer approach enables the exhaustive exploration of contract-environment interactions up to user-defined limits, capturing subtle vulnerabilities (e.g., reentrancy) while providing soundness guarantees within the explored bounds (Koutavas et al., 27 Dec 2025, Koutavas et al., 2024).
1. Formal State and Transition Systems
The state of an EVM-Yul interpreter consists of a tuple
$\Cfg = (\Stk, \AddrEnv, \ODom),$
where $\Stk$ is a stack of frames representing active execution contexts, $\AddrEnv = (\Addr_P, \Addr_O)$ partitions the universe of addresses into proponent (analyzed contract) and opponent (environment) domains, and $\ODom$ maintains the Opponent's dynamic knowledge sets, such as known values and addresses. Each frame encapsulates a sub-configuration of the EVM, defined as
$E = (\pc,\,\stack,\,\mem,\,\stor,\,\gas,\,\bal,\,\code),$
incorporating program counter, value stack, memory, persistent storage, gas, balances, and executable code.
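The configuration and frame tuples above can be pictured as plain data records. The following is a minimal sketch, not the tools' actual data model: the class and field names, the integer address type, and the `(opcode, operand)` code encoding are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


@dataclass
class Frame:
    """One EVM sub-configuration E = (pc, stack, mem, stor, gas, bal, code)."""
    pc: int = 0
    stack: List[int] = field(default_factory=list)
    mem: bytearray = field(default_factory=bytearray)
    stor: Dict[int, int] = field(default_factory=dict)
    gas: int = 0
    bal: Dict[int, int] = field(default_factory=dict)
    # Hypothetical encoding: a sequence of (opcode, operand) pairs.
    code: Tuple[Tuple[str, int], ...] = ()


@dataclass(frozen=True)
class AddrEnv:
    """Partition of addresses into Proponent and Opponent domains."""
    proponent: FrozenSet[int]
    opponent: FrozenSet[int]

    def __post_init__(self) -> None:
        # A partition requires the two domains to be disjoint.
        if self.proponent & self.opponent:
            raise ValueError("proponent and opponent addresses must be disjoint")


@dataclass
class ODom:
    """The Opponent's dynamic knowledge: values and addresses seen so far."""
    known_values: Set[int] = field(default_factory=set)
    known_addrs: Set[int] = field(default_factory=set)


@dataclass
class Cfg:
    """Interpreter configuration Cfg = (Stk, AddrEnv, ODom)."""
    stk: List[Frame]
    addr_env: AddrEnv
    odom: ODom

    def active_frame(self) -> Optional[Frame]:
        # The last frame in the list is the currently executing context.
        return self.stk[-1] if self.stk else None
```

Keeping the address partition immutable while frames and Opponent knowledge stay mutable mirrors the text: the partition is fixed for an analysis run, whereas the stack and knowledge sets evolve with each transition.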
State transitions are formalized as a labeled transition system over configurations $\Cfg$. Each transition label is an action: either an internal step or a structured “game move,” such as a cross-contract call or return. Admissible execution traces are sequences of such labels conforming to bounds and protocol constraints (Koutavas et al., 27 Dec 2025).
2. Operational Semantics: EVM and Yul Levels
Execution proceeds in small steps, tracking EVM semantics at the opcode level and Yul at the construct level. For EVM, the semantics follow standard inference rules; for instance:
- PUSH Instruction:
$\inferrule { E.\pc = i \quad \code[i] = \kw{PUSH}\;n \quad \stack' = n : E.\stack \quad \gas' = E.\gas - \cost(\kw{PUSH})} {(\pc=i, E.\stack, \dots, E.\gas) \bigstep{\kw{PUSH}\;n} (\pc=i+1, \stack', \dots, \gas')}$
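- SSTORE Instruction: pops a key and a value from the stack and writes the value under that key in the contract's persistent storage $\stor$, charging the corresponding gas cost.
- CALL Instruction: triggers context switching by pushing a new frame onto $\Stk$ when an inter-contract call is encountered, with the resulting control-flow transition determined by the $\AddrEnv$ partition.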
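The PUSH and SSTORE rules above can be read as a small-step function on machine states. The sketch below is illustrative only: the gas costs are placeholders rather than the real EVM gas schedule, the `(opcode, operand)` instruction encoding is hypothetical, and CALL is omitted because it needs the frame stack.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

# Placeholder gas costs for illustration only; not the real EVM gas schedule.
COST = {"PUSH": 3, "SSTORE": 100}

Instr = Tuple[str, Optional[int]]  # hypothetical (opcode, operand) encoding


class OutOfGas(Exception):
    """Raised when the remaining gas cannot pay for the next instruction."""


class State(NamedTuple):
    pc: int
    stack: Tuple[int, ...]   # index 0 is the top, mirroring n : stack
    stor: Dict[int, int]
    gas: int


def step(code: List[Instr], s: State) -> Optional[State]:
    """Perform one small step; return None once the machine has halted."""
    if s.pc >= len(code):
        return None
    op, arg = code[s.pc]
    if op == "STOP":
        return None
    if op not in COST:
        raise ValueError(f"unsupported opcode: {op}")
    if s.gas < COST[op]:
        raise OutOfGas(op)
    gas = s.gas - COST[op]
    if op == "PUSH":
        # Premises of the PUSH rule: stack' = n : stack, gas' = gas - cost.
        return State(s.pc + 1, (arg,) + s.stack, s.stor, gas)
    # SSTORE: the key is on top of the stack, the value just below it.
    key, value, *rest = s.stack
    stor = dict(s.stor)          # copy so earlier states stay unchanged
    stor[key] = value
    return State(s.pc + 1, tuple(rest), stor, gas)


def run(code: List[Instr], s: State) -> State:
    """Iterate step until the machine halts and return the final state."""
    while True:
        nxt = step(code, s)
        if nxt is None:
            return s
        s = nxt
```

For instance, running `[("PUSH", 42), ("PUSH", 0), ("SSTORE", None), ("STOP", None)]` from an empty state stores 42 in slot 0. Because every step deducts gas before taking effect, any execution is finite for a finite initial gas, which is the property the bounded analysis relies on.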
Yul-level semantics are defined in both small-step and big-step variants, with the interpreter typically linearizing Yul constructs into instruction vectors and maintaining explicit environment and store mappings. Key features include CEK-style reduction, local variable environments, and a global store encompassing EVM memory, storage, and gas. The interpreter assumes well-typed input programs and handles control constructs (loops, blocks, returns) per the formal operational rules (Koutavas et al., 2024).
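To make the split between a local variable environment and a global store concrete, here is a small big-step evaluator for a Yul-like fragment. It is a sketch under several assumptions: the tuple-based AST encoding and the handful of builtins are hypothetical, only storage is modeled in the store, gas is ignored, and it is a direct recursive evaluator rather than the CEK-style machine described above.

```python
from typing import Any, Dict

WORD = 2 ** 256  # EVM words are 256 bits wide


class Break(Exception):
    """Signals a `break` out of the innermost loop."""


def eval_expr(e: tuple, env: Dict[str, int], store: Dict[int, int]) -> Any:
    kind = e[0]
    if kind == "lit":
        return e[1]
    if kind == "var":
        return env[e[1]]
    if kind == "call":
        _, fn, args = e
        # Yul evaluates call arguments from right to left.
        vals = [eval_expr(a, env, store) for a in reversed(args)]
        vals.reverse()
        if fn == "add":
            return (vals[0] + vals[1]) % WORD
        if fn == "lt":
            return int(vals[0] < vals[1])
        if fn == "sload":
            return store.get(vals[0], 0)
        if fn == "sstore":
            store[vals[0]] = vals[1]
            return None
    raise ValueError(f"unsupported expression: {e!r}")


def exec_stmt(s: tuple, env: Dict[str, int], store: Dict[int, int]) -> None:
    kind = s[0]
    if kind == "block":
        outer = set(env)
        try:
            for t in s[1]:
                exec_stmt(t, env, store)
        finally:
            # Variables declared in the block go out of scope at its end.
            for name in set(env) - outer:
                del env[name]
    elif kind == "let":
        env[s[1]] = eval_expr(s[2], env, store)
    elif kind == "assign":
        if s[1] not in env:
            raise NameError(s[1])
        env[s[1]] = eval_expr(s[2], env, store)
    elif kind == "expr":
        eval_expr(s[1], env, store)
    elif kind == "if":
        if eval_expr(s[1], env, store) != 0:
            exec_stmt(s[2], env, store)
    elif kind == "for":
        _, init, cond, post, body = s
        outer = set(env)
        try:
            # Variables declared in init stay visible in cond, post, and body.
            for t in init[1]:
                exec_stmt(t, env, store)
            while eval_expr(cond, env, store) != 0:
                try:
                    exec_stmt(body, env, store)
                except Break:
                    break
                exec_stmt(post, env, store)
        finally:
            for name in set(env) - outer:
                del env[name]
    elif kind == "break":
        raise Break()
    else:
        raise ValueError(f"unsupported statement: {s!r}")
```

The environment is threaded through and trimmed at block exit, while the store persists across scopes, so a loop such as `for { let i := 0 } lt(i, 3) { i := add(i, 1) } { sstore(i, add(i, 10)) }` leaves three storage entries behind and an empty environment.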
3. Game-Semantic Interaction and Environment Model
A distinguishing aspect of advanced EVM-Yul interpreters is their explicit game-semantics protocol. Computation is modeled as an interactive game between the Proponent (the contract of interest) and the Opponent (arbitrary environment). Moves include:
- $\kw{deploy}$: Initialization and deployment of contract state.
- $\kw{o\mbox{-}call}(a \to b, f, \vec v)$: Opponent-controlled calls to a contract.
- $\kw{pp\mbox{-}call}, \kw{po\mbox{-}call}$: Proponent-to-Proponent and Proponent-to-Opponent call transitions.
- Return and time-advance “wait” moves.
This protocol enables exact reasoning about the contract’s behavior in the presence of arbitrary, possibly adversarial, environments, while ensuring that only feasible, well-formed traces are explored. By explicitly bounding the number of such moves and structuring the entire state space, the semantics avoid over-approximation and facilitate the discovery of concrete exploit traces (Koutavas et al., 27 Dec 2025).
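One way to see what "feasible, well-formed traces" means is to check a trace against a simplified protocol. The sketch below is an assumption-laden abstraction, not the published protocol: it collapses the move kinds into `call`, `ret`, and `wait`, tags each move with the side making it, and assumes the Opponent holds control initially and may only wait when no call is pending.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Move:
    kind: str       # "call", "ret", or "wait" (simplified move kinds)
    src: str        # side making the move: "O" (Opponent) or "P" (Proponent)
    dst: str = ""   # side receiving control; unused for "wait"


def well_formed(trace: Iterable[Move]) -> bool:
    """Check that moves are made by the side in control and are well bracketed."""
    control = "O"                         # assumption: the Opponent moves first
    pending: List[Tuple[str, str]] = []   # unanswered calls as (caller, callee)
    for m in trace:
        if m.src != control:
            return False                  # only the side in control may move
        if m.kind == "call":
            pending.append((m.src, m.dst))
            control = m.dst
        elif m.kind == "ret":
            if not pending:
                return False              # nothing to return from
            caller, callee = pending.pop()
            if m.src != callee or m.dst != caller:
                return False              # must answer the most recent call
            control = caller
        elif m.kind == "wait":
            if m.src != "O" or pending:
                return False              # assumption: waits only at top level
        else:
            return False
    return True
```

A reentrancy-shaped trace, in which the Opponent calls the Proponent, is called back, and then re-enters the Proponent before the first call has returned, passes this check. That is the point of the game model: such interleavings are legitimate behaviors of the environment rather than spurious paths.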
4. Bounded Completeness and Parameterization
The exploration algorithm operates within a tuple of user-specified bounds, which cap:
- Maximum Opponent calls per function,
- Stack depth,
- Initial gas per call/execution,
- Total simulated time,
- Time granularity per $\kw{o\mbox{-}wait}$ move.
Every explored trace $\tr$ is checked so that the number of Opponent calls, stack depth, and simulated time do not exceed these limits. Gas metering at strict EVM opcode granularity ensures that no loop or recursion can run past the bounds. This yields bounded completeness: the absence of an attack trace within the bounds implies that no exploit exists within those parameters (Koutavas et al., 27 Dec 2025).
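The per-trace check can be sketched as a single pass over the trace labels. Everything named here is hypothetical: the field names of `Bounds`, the tuple encoding of labels, and the convention that any label ending in `-call` or `-ret` changes the stack depth. Gas is carried in the bounds but not checked here, since it is enforced by the interpreter during execution.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Bounds:
    """Hypothetical names for the five user-specified bounds."""
    max_o_calls: int   # maximum Opponent calls per function
    max_depth: int     # maximum stack depth
    gas: int           # initial gas per call/execution (enforced elsewhere)
    max_time: int      # total simulated time
    time_step: int     # time advanced by each o-wait move


def within_bounds(trace: Iterable[Tuple[str, ...]], b: Bounds) -> bool:
    """Return True if the trace respects the call, depth, and time bounds."""
    o_calls: Counter = Counter()
    depth = 0
    elapsed = 0
    for label in trace:
        kind = label[0]
        if kind == "o-call":
            fn = label[1]
            o_calls[fn] += 1
            if o_calls[fn] > b.max_o_calls:
                return False
        if kind.endswith("-call"):
            depth += 1
        elif kind.endswith("-ret"):
            depth -= 1
        elif kind == "o-wait":
            elapsed += b.time_step
        if depth > b.max_depth or elapsed > b.max_time:
            return False
    return True
```

Rejecting a trace as soon as any counter exceeds its limit lets a search procedure prune the offending branch immediately instead of extending it further.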
5. Interpreter Architecture and Layered Implementation
Modular interpreter organization is realized through strongly separated layers:
- Parser Module: Constructs an abstract syntax tree (AST) from Yul source using menhir/ocamllex; ABI signatures are parsed with yojson.
- Game Semantics Module: Defines the main transition system and maintains search frontiers, orchestrating exploration by breadth- or depth-first strategies under the configured bounds.
- Yul Semantics Module: Translates structured Yul programs into EVM sub-instructions via big-step reductions, returning control to the global scheduler on “stuck” steps (e.g., external calls).
- EVM Dialect Module: Implements the small-step interpreter for EVM, including arithmetic, storage, memory expansion, gas metering, and analysis hooks.
Integration with the standard Solidity toolchain is achieved by injecting custom opcodes post-Yul compilation and providing a CLI for bounds configuration and ABI pruning. A dialect-parametric design allows reuse for symbolic execution, verification, and integration into analysis frameworks. This advantage is exploited in tools such as YulTracer, which supports both concrete and SMT-backed symbolic execution models (Koutavas et al., 27 Dec 2025, Koutavas et al., 2024).
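The frontier-based exploration performed by the game-semantics layer can be illustrated with a generic bounded search. This is a sketch only: the `explore` function, its parameters, and the toy transition system in the usage note are hypothetical, and the bound is reduced to a single cap on trace length.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Optional, Tuple

Label = Hashable
Trace = Tuple[Label, ...]


def explore(
    initial,
    successors: Callable[[object], Iterable[Tuple[Label, object]]],
    is_violation: Callable[[object], bool],
    max_moves: int,
    strategy: str = "bfs",
) -> Optional[Trace]:
    """Search all traces of at most max_moves labels for a violating state.

    Returns a witness trace if one exists within the bound, otherwise None.
    """
    frontier = deque([(initial, ())])
    while frontier:
        # A FIFO frontier gives breadth-first search, a LIFO one depth-first.
        state, trace = frontier.popleft() if strategy == "bfs" else frontier.pop()
        if is_violation(state):
            return trace
        if len(trace) >= max_moves:
            continue                      # bound reached: prune this branch
        for label, nxt in successors(state):
            frontier.append((nxt, trace + (label,)))
    return None
```

As a toy usage, take integer states with moves `inc` (add one) and `dbl` (double), starting from 1 and treating the state 10 as the violation. With `max_moves=4` the search returns a four-move witness, and with `max_moves=3` it returns `None`. Because every trace up to the bound is visited, a `None` result means no violation exists within that bound, which is the bounded-completeness reading in miniature.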
6. Concrete Execution Example and Trace
A minimal Yul contract object is linearized and executed stepwise as follows. The object is first deployed at a given address. The Opponent then issues an $\kw{o\mbox{-}call}$ to that address with concrete arguments, which corresponds to a concrete EVM stack update and storage store. Execution culminates in a $\kw{pp\mbox{-}ret}$ and an $\kw{o\mbox{-}ret}$ as control returns. Each step updates the machine state $(\pc, \stack, \stor, \gas)$, with annotated trace labels denoting boundary-crossing moves. This exposes precisely how contract storage and return values are manipulated in the presence of adversarial inputs and demonstrates the feasibility of trace-based vulnerability detection in bounded-complete settings (Koutavas et al., 27 Dec 2025, Koutavas et al., 2024).
7. Optimizations, Correctness, and Ecosystem Integration
EVM-Yul interpreters benefit from provably sound optimizations. Block-scoping and break-handler context reductions are implemented based on the “redundant frames” lemma: nested block contexts or break-handler contexts with identical variable domains can be collapsed or bypassed, yielding more efficient execution without changing evaluation outcomes. The optimization is justified by induction on small-step reductions and preserves correctness (Koutavas et al., 2024).
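The shape of this optimization can be pictured on a stack of evaluation contexts. The sketch below is a loose illustration and not the lemma's precise statement: it assumes each context is recorded as a hypothetical `(kind, variable domain)` pair and simply merges adjacent contexts of the same kind whose domains coincide.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: (context kind, set of variable names in scope).
Context = Tuple[str, FrozenSet[str]]


def collapse_redundant(contexts: List[Context]) -> List[Context]:
    """Drop a context that repeats its immediate predecessor.

    Two adjacent contexts of the same kind with the same variable domain
    restore the same scope on exit, so one of them is treated as redundant.
    """
    collapsed: List[Context] = []
    for ctx in contexts:
        if collapsed and collapsed[-1] == ctx:
            continue
        collapsed.append(ctx)
    return collapsed
```

Fewer contexts on the stack means fewer scope-restoration steps when a block or loop exits. The claim that results are unchanged rests on the cited proof, which this sketch does not reproduce.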
The interpreters have been validated on both official Solidity Yul tests and bespoke test suites, achieving full correctness on all supported language features and passing performance metrics consistent with tractable symbolic analysis and larger contract executions. Because of their parameterized design, such interpreters are readily portable into EVM toolchains, SMT-based symbolic engines, and even static analysis frameworks, substantially extending their utility in Ethereum contract safety and formal analysis (Koutavas et al., 27 Dec 2025, Koutavas et al., 2024).