Three-Address Code (TAC)
Three-Address Code (TAC) is a canonical low-level intermediate representation in which each instruction contains at most three operands, typically two source operands and one result. Widely adopted in compiler infrastructure and static analysis, TAC provides a structured, explicit encoding of a program's data and control dependencies. It serves as a key intermediate form between unstructured or stack-based machine code and higher-level representations, and is amenable to further abstraction, optimization, or translation by both conventional and machine learning-based techniques.
1. Foundations and Definition
TAC is defined by its instruction format: each operation performs at most a single computation over up to three explicit addresses, typically of the form `result = operand1 op operand2`. A minimal valid TAC snippet might appear as follows:
```
temp1 = a + b
temp2 = c - d
result = temp1 * temp2
```
2. Construction and Static Program Analysis
TAC is commonly derived from either high-level source code (e.g., during compilation) or from lower-level representations such as stack-based bytecode. In decompilation pipelines or static analysis frameworks, the construction of TAC generally involves the following steps:
- Function boundary identification: Control flow analysis is employed to infer function start and end points, especially where explicit function boundaries are missing (e.g., Ethereum Virtual Machine (EVM) bytecode).
- Stack to variable lifting: Stack-based computations are converted into explicit variable assignments. Each stack operation is replaced with a corresponding TAC statement where the involved operands are mapped to distinct variables.
- Control flow structuring: The graph of jumps, calls, and branches is transformed into a structured set of labeled TAC instructions, replacing low-level jump offsets with explicit labels and conditionals.
- Type inference (where necessary): When type information is not preserved (as in EVM bytecode), context-sensitive analysis infers types based on operand usage throughout a contract or function.
For example, the process of decompiling EVM bytecode first applies static program analysis to recover data and control dependencies, then emits a structured TAC representation, as detailed in modern smart contract decompilation systems (David et al., 24 Jun 2025).
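The stack-to-variable lifting step above can be sketched concretely. The following Python snippet is an illustrative toy, not a production decompiler: it symbolically interprets a small hypothetical stack-based instruction set (`PUSH`, `ADD`, `SUB`, `MUL`) and emits one TAC statement per binary operation, mapping stack slots to fresh temporaries.

```python
# Sketch of "stack to variable lifting": walk a stack-based instruction
# list symbolically, keeping variable NAMES (not runtime values) on the
# stack, and emit one TAC statement per binary operation.
def lift_to_tac(instructions):
    stack = []    # symbolic stack of variable names
    tac = []      # emitted three-address statements
    counter = 0
    ops = {"ADD": "+", "SUB": "-", "MUL": "*"}
    for instr in instructions:
        op, args = instr[0], instr[1:]
        if op == "PUSH":
            stack.append(str(args[0]))
        elif op in ops:
            rhs = stack.pop()          # operands come off the stack...
            lhs = stack.pop()
            temp = f"t{counter}"       # ...and get an explicit name
            counter += 1
            tac.append(f"{temp} = {lhs} {ops[op]} {rhs}")
            stack.append(temp)         # result becomes a named value
    return tac

# (a + b) * c in stack form:
prog = [("PUSH", "a"), ("PUSH", "b"), ("ADD",), ("PUSH", "c"), ("MUL",)]
print(lift_to_tac(prog))  # ['t0 = a + b', 't1 = t0 * c']
```

Note how the implicit data dependency between `ADD` and `MUL` (the product consumes the sum left on the stack) becomes the explicit name `t0` in the TAC output.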
3. Role as an Intermediate Representation
TAC occupies a pivotal role as an intermediate representation in both compilation and decompilation pipelines. Within such workflows, its primary functions include:
- Bridging low-level and high-level representations: TAC renders stack-based or unstructured code analyzable and transformable, which is critical when translating between bytecode (such as EVM) and high-level source code (such as Solidity).
- Facilitating machine learning-based translation: As demonstrated in smart contract decompilation (David et al., 24 Jun 2025), an LLM is fine-tuned on aligned TAC-to-Solidity pairs, leveraging TAC's explicitness to recover high-level constructs, variable names, and data/control flow for code generation.
- Abstracting semantic structure: TAC enables exposure of data and control dependencies—facilitating subsequent analyses, optimizations, or conversions that require insight into the semantics of computation rather than syntactic details alone.
A typical pipeline utilizing TAC in decompilation is as follows:
```
EVM Bytecode
     |
     v
[Static Analysis + Control Flow Recovery]
     |
     v
Three-Address Code (TAC)
     |
     v
[Fine-tuned LLM (e.g., Llama-3.2-3B)]
     |
     v
Readable Solidity
```
4. Comparative Context: TAC versus Alternative Representations
TAC is contrasted with both lower-level and higher-level IRs:
- Stack-based bytecode: Data dependencies in stack-based representations are implicit, encoded via stack manipulation. TAC makes dependencies explicit by naming all intermediate values, greatly aiding analysis and translation.
- High-level IR (e.g., SSA-based LLVM IR): More expressive IRs such as Static Single Assignment (SSA) form expose even finer granularity in data dependencies and facilitate advanced transformations (as in inst2vec (Ben-Nun et al., 2018)). Nevertheless, TAC is often favored for its balance between simplicity and analyzability.
- Tree or graph-based semantic representations: These offer richer contextualization—e.g., the inst2vec Contextual Flow Graph (XFG) explicitly encodes both data and control flow relationships beyond sequential adjacency. TAC's context is primarily implicit, and analysis must reconstruct broader relations from the sequential TAC or control flow graph.
A summary of distinctions:
| Representation | Data Flow Explicitness | Control Flow Structure | ML Readiness |
|---|---|---|---|
| Stack-based bytecode | Implicit | Flat/unstructured | Poor |
| TAC (Three-Address) | Explicit | Linear/labels | Moderate |
| SSA-based IR/XFG | Explicit (SSA/XFG) | Graph-structured | High (embeddings) |
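To make the contrast with SSA concrete, the following Python sketch (an illustrative toy, assuming straight-line TAC with no branches, so no phi nodes are required) renames each redefinition of a variable, turning ordinary TAC into an SSA-like form in which every name is assigned exactly once.

```python
# Toy SSA renaming for straight-line TAC: each variable gets a version
# number, uses are rewritten to the current version, and every new
# assignment bumps the destination's version.
import re

def to_ssa(tac_lines):
    versions = {}   # variable -> current version number
    out = []
    for line in tac_lines:
        dest, expr = [s.strip() for s in line.split("=", 1)]

        def rename(m):
            name = m.group(0)
            # rewrite uses to their latest SSA version, if defined here
            return f"{name}_{versions[name]}" if name in versions else name

        expr = re.sub(r"[A-Za-z_]\w*", rename, expr)
        versions[dest] = versions.get(dest, -1) + 1   # new definition
        out.append(f"{dest}_{versions[dest]} = {expr}")
    return out

print(to_ssa(["x = a + b", "x = x * c", "y = x + x"]))
# ['x_0 = a + b', 'x_1 = x_0 * c', 'y_0 = x_1 + x_1']
```

After renaming, every use unambiguously identifies its defining statement, which is exactly the extra precision SSA-based IRs provide over plain TAC; a full implementation would also need phi nodes at control-flow joins.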
5. Strengths, Limitations, and Practical Implications
Strengths:
- Explicitness. Offers clear variable-level view of computations and control transitions.
- Amenability to analysis. Well-suited for data flow, control flow, and type analyses; facilitates code transformation.
- Bridging capability. Serves as an effective interface between stack-based computation and source-like high-level code.
Limitations:
- Semantic abstraction ceiling. Certain higher-level semantics (e.g., idiomatic code structure, variable naming, deep type hierarchies) are not fully recoverable from TAC without augmentation.
- Context limitation. Sequential or block-local context; broader semantic relationships, such as those captured by XFG in inst2vec (Ben-Nun et al., 2018 ), are not encoded.
- Dependency on static analysis. When derived from highly optimized or obfuscated bytecode, TAC may inherit ambiguity or lose higher-level intent.
As an illustrative example, smart contract decompilers enhanced with TAC can recover explicit control and data flows, ultimately achieving high agreement in semantics and readability with original source—measured, for instance, by an average semantic similarity of 0.82 and strong preservation of key constructs (David et al., 24 Jun 2025 ). A plausible implication is that the explicit TAC form is essential for accurate LLM-guided decompilation, enabling state-of-the-art performance on source code reconstruction from opaque bytecode.
6. Real-world Applications and Empirical Findings
TAC-based pipelines are deployed in several critical analysis and synthesis domains:
- Source code recovery for security analysis: For the majority of deployed Ethereum contracts lacking verified source, TAC enables static lifting of bytecode to an analyzable form. Subsequently, LLMs reconstruct human-readable source, facilitating vulnerability discovery and incident response (David et al., 24 Jun 2025 ).
- Machine learning for code understanding: TAC has been used as input to learned representations and LLMs trained on paired TAC-to-source datasets, providing the model with a level of abstraction that is neither too granular (as with bytecode) nor too artifact-dependent (as with raw source).
- Auditing and compliance: Auditors can reconstruct, scrutinize, and validate the mechanics of unverified on-chain contracts by analyzing the TAC-derived and LLM-translated source.
- Legacy applications: TAC remains a standard IR in optimizing compilers and code generation, supporting classical analyses such as liveness, reaching definitions, and alias analysis.
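As a concrete instance of the classical analyses mentioned above, the sketch below (illustrative only, and restricted to straight-line TAC rather than a full control-flow graph) computes backward liveness: a variable is live before a statement if it may be used later before being redefined.

```python
# Backward liveness over straight-line TAC: traverse statements in
# reverse, killing each statement's definition and adding its uses.
import re

def liveness(tac_lines):
    """Return, for each statement, the set of variables live on entry."""
    live = set()
    live_in = [None] * len(tac_lines)
    for i in range(len(tac_lines) - 1, -1, -1):
        dest, expr = [s.strip() for s in tac_lines[i].split("=", 1)]
        uses = set(re.findall(r"[A-Za-z_]\w*", expr))
        live = (live - {dest}) | uses   # kill definition, add uses
        live_in[i] = set(live)
    return live_in

prog = ["t1 = a + b", "t2 = t1 * c", "r = t2 - a"]
for i, s in enumerate(liveness(prog)):
    print(i, sorted(s))
# 0 ['a', 'b', 'c']
# 1 ['a', 'c', 't1']
# 2 ['a', 't2']
```

Because TAC names every intermediate value explicitly, the use and definition sets fall directly out of each statement; the same analysis over stack-based bytecode would first require the lifting step described in Section 2.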
Empirical studies indicate that with a robust TAC extraction pipeline and LLM fine-tuning, generated code achieves high similarity and readability, sharply outperforming traditional decompilers: e.g., 78% of functions exceed 0.8 semantic similarity, and functional constructs such as `require`, `msg.sender`, and array indexing are recovered with fidelity.
7. Summary Table: TAC in Decompilation and Analysis
| Aspect | Details/Findings |
|---|---|
| Structure | At most three operands per instruction, explicit data flow, labeled control transfers |
| Typical pipeline role | Intermediary between bytecode and source or high-level IR |
| Analytic strengths | Enables data/control flow analysis, code transformation, and LLM-assisted semantic code synthesis |
| Core applications | Compilation, decompilation, security auditing, code comprehension, ML-based code generation |
| Empirical results | ~0.82 semantic similarity with original source in decompiled contracts (David et al., 24 Jun 2025) |
TAC remains a foundational representation bridging efficiency in automated analysis and richness in human interpretability, with demonstrated impact in both classic and contemporary program comprehension tasks.