Three-Address Code (TAC)

Updated 1 July 2025
  • Three-Address Code (TAC) is an intermediate representation that expresses computations using at most three operands, ensuring explicit data and control dependencies.
  • It enables clear control flow and data flow analysis, supporting formal verification through proof-producing transpilers and robust static program analysis.
  • TAC serves as a bridge in machine learning decompilation pipelines by transforming low-level binary code into semantically accurate, human-readable source code.

Three-Address Code (TAC) is a form of intermediate representation widely adopted in modern compilers and binary analysis platforms to bridge the semantic gap between low-level machine instructions and high-level source languages. By encoding computations as sequences of operations involving at most three operands, TAC facilitates program analysis, optimization, and translation across heterogeneous architectures and environments. Recently, TAC has played a central role both in rigorous, formal verification pipelines and in large-scale machine-learning–driven decompilation, as evidenced by its application in proof-producing transpilers from binary code and its use as an intermediate target when decompiling smart contract bytecode to readable, high-level code.

1. Structural Characteristics and Formal Properties of Three-Address Code

Three-Address Code consists of instruction sequences in which each statement typically has the form $\text{target} = \text{operand}_1\ \text{operator}\ \text{operand}_2$. This structure enforces explicitness in data and control dependencies. TAC instructions encompass arithmetic operations, assignments, conditional and unconditional jumps, and function calls. The explicit representation allows the construction of control flow graphs with labeled jump targets and explicit conditional branches, making control and data flow analysis tractable. The explicit binding of intermediate results to named temporaries removes ambiguity present in stack- or register-based code and supports efficient liveness analysis and optimization.
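
As a concrete illustration, the following minimal Python sketch models TAC instructions and lowers the expression d = (a + b) * (a - c) into a three-address sequence; the class and field names are illustrative rather than taken from any particular compiler.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TacInstr:
    """One three-address instruction: target = left op right (op/right omitted for copies)."""
    target: str
    left: str
    op: Optional[str] = None
    right: Optional[str] = None

    def __str__(self) -> str:
        if self.op is None:
            return f"{self.target} = {self.left}"
        return f"{self.target} = {self.left} {self.op} {self.right}"

# Lowering of the source expression  d = (a + b) * (a - c):
# every intermediate value is bound to an explicit temporary.
program = [
    TacInstr("t1", "a", "+", "b"),
    TacInstr("t2", "a", "-", "c"),
    TacInstr("d", "t1", "*", "t2"),
]

for instr in program:
    print(instr)    # t1 = a + b / t2 = a - c / d = t1 * t2
```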

Mathematically, given a sequence $\{I_1, I_2, \ldots, I_n\}$ of TAC instructions, the semantics are typically formalized as a state transition system, where each instruction maps an input state to an output state deterministically. This property is leveraged in formal verification frameworks to define precise simulation relations between the states of low-level and intermediate representations.
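
Under this view, executing a straight-line TAC sequence amounts to folding a deterministic step function over program states. The sketch below, which reuses the illustrative TacInstr structure above and models states as variable-to-value maps, is a toy rendering of that idea rather than any particular framework's formal semantics.

```python
from typing import Dict, List

State = Dict[str, int]

OPS = {"+": lambda x, y: x + y,
       "-": lambda x, y: x - y,
       "*": lambda x, y: x * y}

def value_of(state: State, operand: str) -> int:
    """Resolve an operand: either a variable bound in the state or an integer literal."""
    return state[operand] if operand in state else int(operand)

def step(state: State, instr: TacInstr) -> State:
    """Deterministic transition: evaluate one TAC instruction in the given state."""
    if instr.op is None:                                   # copy: target = left
        value = value_of(state, instr.left)
    else:                                                  # binary op: target = left op right
        value = OPS[instr.op](value_of(state, instr.left), value_of(state, instr.right))
    new_state = dict(state)                                # states are immutable snapshots
    new_state[instr.target] = value
    return new_state

def run(program: List[TacInstr], state: State) -> State:
    """Straight-line execution: fold the step function over the instruction sequence."""
    for instr in program:
        state = step(state, instr)
    return state

print(run(program, {"a": 4, "b": 2, "c": 1}))
# {'a': 4, 'b': 2, 'c': 1, 't1': 6, 't2': 3, 'd': 18}
```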

2. TAC in Proof-Producing Binary-to-IR Transpilation

The rigorous translation of binary code to machine-independent intermediate languages often proceeds by first parsing binary instructions into their respective instruction set architecture representations, then “lifting” them into a TAC-like intermediate language. In proof-producing transpilers, such as those formally modeled in HOL4, every translation step from binary to TAC is instrumented to generate formal proof objects. The core correctness property is formalized as a simulation theorem:

$$\forall \sigma, \sigma', \sigma_1.\ \mathcal{R}(\sigma, \sigma') \land \llbracket P \rrbracket(\sigma) \rightarrow \sigma_1 \implies \exists \sigma_1'.\ \llbracket P' \rrbracket(\sigma') \rightarrow \sigma_1' \land \mathcal{R}(\sigma_1, \sigma_1')$$

where $P$ is the binary program, $P'$ the TAC/IR translation, $\sigma$ and $\sigma'$ their respective states, and $\mathcal{R}$ a formally defined state relation. The simulation theorem ensures that each computation step in the binary semantics is soundly simulated (possibly in multiple steps) by the TAC program.
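
The shape of this obligation can be illustrated with a toy check: given step functions for a small “binary” machine and a small “IR” machine, plus a candidate relation R, one can exhaustively verify that related states remain related after stepping. Real proof-producing transpilers discharge this as a HOL4 theorem rather than by enumeration; everything in the sketch (state encodings, step functions, the relation) is invented for illustration.

```python
from itertools import product

# Toy "binary" semantics: a 3-bit counter stepped modulo 8.
def bin_step(s: int) -> int:
    return (s + 1) % 8

# Toy "IR" semantics: the same counter, represented as (low bit, high bits).
def ir_step(s: tuple) -> tuple:
    lo, hi = s
    v = (hi * 2 + lo + 1) % 8
    return (v % 2, v // 2)

# Candidate relation R: the IR state encodes the same counter value as the binary state.
def related(b: int, i: tuple) -> bool:
    lo, hi = i
    return b == hi * 2 + lo

def check_simulation() -> bool:
    """Exhaustively check: whenever R(b, i) holds, it still holds after one step on each side."""
    bin_states = range(8)
    ir_states = [(lo, hi) for lo in (0, 1) for hi in range(4)]
    for b, i in product(bin_states, ir_states):
        if related(b, i) and not related(bin_step(b), ir_step(i)):
            return False            # a related pair of states diverges after stepping
    return True

print(check_simulation())           # True: every binary step is matched by an IR step
```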

Proof-producing transpilers thus return not only the TAC representation but a machine-checkable certificate asserting semantic equivalence. This underpins the transfer of verified properties and high trustworthiness for downstream analyses and transformations.

3. Static Program Analysis for Bytecode-to-TAC Conversion

Static program analysis is integral to the conversion of stack-based or bytecode programs into TAC. The process generally includes:

  • Control Flow Recovery: Identifying function boundaries, basic blocks, and reconstructing the program’s control flow graph by tracking jump destinations.
  • Stack Deconstruction: Simulating stack operations to resolve transient values into explicit variables, remapping stack slots into TAC variables.
  • Type and Signature Inference: Analyzing data usage patterns (reads, writes, call targets) to infer candidate types, which are crucial for meaningful variable representations and subsequent translation.
  • Data Flow Analysis: Making explicit all value dependencies and their lifetimes, so that assignments, uses, and modifications are accurately mirrored in TAC form.

This intensive analysis phase yields a normalized, stack-free, and semantically faithful TAC sequence, on which both rule-based and neural program translation can operate reliably.
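
To make the stack-deconstruction step concrete, the following hedged sketch symbolically executes a straight-line stack program and introduces a TAC temporary for every value that would otherwise exist only on the operand stack. The opcode names loosely echo EVM conventions, but the instruction subset, operand ordering, and output syntax are simplified for illustration.

```python
# Symbolic stack deconstruction: straight-line stack bytecode -> TAC.
# The opcode subset and operand ordering are illustrative, not a faithful EVM model.
BINOPS = {"ADD": "+", "SUB": "-", "MUL": "*"}

def stack_to_tac(bytecode):
    stack, tac, fresh = [], [], 0

    def new_temp():
        nonlocal fresh
        fresh += 1
        return f"t{fresh}"

    for op, *args in bytecode:
        if op == "PUSH":                       # literal goes onto the symbolic stack
            stack.append(str(args[0]))
        elif op in BINOPS:                     # pop two symbols, bind the result to a fresh temp
            rhs, lhs = stack.pop(), stack.pop()
            temp = new_temp()
            tac.append(f"{temp} = {lhs} {BINOPS[op]} {rhs}")
            stack.append(temp)
        elif op == "MSTORE":                   # offset on top of the stack, value beneath it
            addr, value = stack.pop(), stack.pop()
            tac.append(f"mem[{addr}] = {value}")
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return tac

# (3 + 4) * 2, stored at memory offset 0
example = [("PUSH", 3), ("PUSH", 4), ("ADD",), ("PUSH", 2), ("MUL",), ("PUSH", 0), ("MSTORE",)]
print("\n".join(stack_to_tac(example)))
# t1 = 3 + 4
# t2 = t1 * 2
# mem[0] = t2
```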

4. TAC as a Bridge to High-Level Source Code via Machine Learning

In neural decompilation pipelines, TAC serves as a robust intermediate layer between opaque bytecode and human-readable source code. For instance, when decompiling smart contract bytecode (such as Ethereum Virtual Machine code) to Solidity, TAC is generated via static analysis as described above. Subsequently, LLMs—such as a Llama-3.2-3B variant fine-tuned on over 238,000 TAC-to-Solidity pairs—take TAC as input and emit semantically faithful, idiomatic source code.
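
A hedged sketch of the inference step is given below. The checkpoint name, prompt template, TAC snippet, and generation parameters are placeholders rather than details of the cited system; the Hugging Face transformers API is assumed purely for illustration.

```python
# Illustrative TAC -> Solidity inference step via the Hugging Face transformers API.
# The checkpoint name, prompt format, and TAC snippet are hypothetical placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "your-org/llama-3.2-3b-tac-to-solidity"    # hypothetical fine-tuned checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

tac_snippet = """\
t1 = caller
t2 = storage[0]
require t1 == t2
storage[1] = arg0
"""

# Prompt template is an assumption; a real pipeline would mirror the fine-tuning format.
prompt = f"### TAC\n{tac_snippet}\n### Solidity\n"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```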

The fine-tuned LLM leverages extensive supervision to recover:

  • High-level control flow and structured constructs from explicit TAC jumps and branches,
  • Meaningful variable names and types based on context and usage,
  • Intricate patterns and conventions specific to the target language (e.g., Solidity standards),
  • Detailed function signatures.

The result, grounded by the TAC structure, is source code with an average semantic similarity of 0.82 compared to the original verified contracts, benefiting manual auditability and enabling downstream static and semantic analysis (Decompiling Smart Contracts with a Large Language Model, 24 Jun 2025).

5. Empirical Evaluation and Readability of TAC-Based Reconstructions

The readability and faithfulness of code generated from TAC have been empirically evaluated using multiple metrics:

  • Semantic Similarity: Outputs based on the TAC representation achieve an average similarity of 0.82 to the original source, significantly surpassing traditional decompilers that operate directly on bytecode, which typically reach only 0.40–0.50 on the same scale (one way such a score can be computed is sketched after this list).
  • Code Structure: Reconstructed code preserves original structuring keywords (such as 'function', 'require') and idiomatic patterns within a 2% deviation of the gold standard, indicating high structural and syntactic fidelity.
  • Variable Recovery: The approach frequently recovers meaningful variable names, occasionally improving clarity over the original, and robustly preserves key assertions and error checks.
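
One common way to operationalize a semantic-similarity score of this kind is to embed the original and reconstructed code and compare the embeddings by cosine similarity. The sketch below assumes the sentence-transformers library and a generic embedding model; neither is specified by the cited work, so this illustrates the metric's shape rather than the paper's evaluation protocol.

```python
# Illustrative semantic-similarity scoring between original and reconstructed source.
# The embedding model and library choice are assumptions, not the cited paper's metric.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")   # generic embedding model, for illustration

original = "function transfer(address to, uint256 amount) public { require(balance[msg.sender] >= amount); balance[to] += amount; }"
reconstructed = "function transfer(address recipient, uint256 value) public { require(balance[msg.sender] >= value); balance[recipient] += value; }"

embeddings = model.encode([original, reconstructed], convert_to_tensor=True)
score = util.cos_sim(embeddings[0], embeddings[1]).item()   # 1.0 would mean identical under the embedding
print(f"semantic similarity: {score:.2f}")
```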

These properties make TAC-based reconstructions suitable for high-stakes applications in security auditing, incident response, and contract verification.

6. TAC in Analysis, Verification, and Practical Tooling

TAC’s explicitness enables its use in a variety of practical tools and methodologies:

Step               | Methodology/Details
------------------ | ------------------------------------------------------------
Bytecode → TAC     | Static analysis, stack deconstruction, control flow recovery
TAC Representation | Three-operand instructions, explicit data/control flows
TAC → Source       | Fine-tuned LLM, prompt engineering, context preservation

Proof-producing binary lifters generate TAC accompanied by formal proofs, enabling property transfer to binaries. Neural decompilation pipelines, as implemented at https://evmdecompiler.com, use TAC as a basis for LLM-based semantic lifting to high-level code, supporting practical deployment and public accessibility.

In summary, Three-Address Code constitutes a critical intermediate layer in both formal and data-driven analysis pipelines. Its adoption ensures transparency, analyzability, and support for formal correctness guarantees and has facilitated the emergence of advanced tools for program verification, decompilation, and cross-architecture software reasoning (Sound Transpilation from Binary to Machine-Independent Code, 2018, Decompiling Smart Contracts with a Large Language Model, 24 Jun 2025).