- The paper presents a hybrid decompilation pipeline that combines static analysis and an LLM fine-tuned on TAC-to-Solidity pairs to achieve high semantic fidelity.
- It demonstrates strong empirical results with a mean semantic similarity of 0.82 and reduced edit distance compared to traditional decompilers.
- The released dataset and public decompilation service offer valuable resources for enhancing blockchain security and automated smart contract audits.
Decompiling Smart Contracts with an LLM: An Expert Overview
This paper presents a comprehensive and technically rigorous approach to the decompilation of Ethereum smart contracts, introducing a hybrid pipeline that leverages both static program analysis and large language models (LLMs) to translate EVM bytecode into human-readable, semantically faithful Solidity code. The work addresses a critical gap in blockchain security and transparency: less than 1% of deployed Ethereum contracts are open source, leaving the vast majority of on-chain logic opaque and challenging to audit.
Technical Contributions
The core contributions of the paper are as follows:
- Hybrid Decompilation Pipeline: The system first applies static analysis to convert EVM bytecode into a structured three-address code (TAC) intermediate representation. This step is crucial for bridging the semantic gap between low-level bytecode and high-level source code, enabling more effective downstream processing.
- LLM-based Code Generation: A Llama-3.2-3B model, fine-tuned on a dataset of 238,446 TAC-to-Solidity function pairs, is used to generate Solidity code from the TAC. The model is adapted using Low-Rank Adaptation (LoRA), which allows efficient fine-tuning with a relatively small number of additional parameters (a configuration sketch follows this list).
- Empirical Evaluation: The system is evaluated on a held-out test set of 9,731 smart contract functions, demonstrating a mean semantic similarity of 0.82 to the original source code and outperforming traditional decompilers in both readability and semantic preservation.
- Public Dataset and System: The authors release both their dataset and a public-facing decompilation service, facilitating further research and practical adoption.
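The paper's exact LoRA hyperparameters are not reproduced in this overview; the sketch below shows what such an adaptation typically looks like with the Hugging Face transformers and peft libraries, where the rank, scaling factor, and target modules are illustrative assumptions rather than the authors' settings.

```python
# Minimal LoRA adaptation sketch using Hugging Face transformers + peft.
# The rank, scaling factor, and target modules are illustrative assumptions,
# not the paper's reported configuration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")

lora_config = LoraConfig(
    r=16,                      # low-rank dimension (assumed)
    lora_alpha=32,             # LoRA scaling factor (assumed)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
```

The practical appeal of LoRA here is that only the low-rank adapter matrices are trained, so a 3B-parameter model can be specialized at a fraction of the memory cost of full fine-tuning.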
Methodological Details
The pipeline is structured as follows:
- Bytecode to TAC Conversion: Static analysis is used to recover control flow, function boundaries, and data flow, producing a TAC representation that is more amenable to neural processing than raw bytecode.
- Dataset Construction: Verified contracts with available source code are used to create aligned TAC-Solidity pairs, with careful normalization and filtering to ensure data quality and coverage of diverse Solidity idioms.
- Model Training: The Llama-3.2-3B model is fine-tuned using LoRA, targeting key transformer components. Training employs gradient checkpointing and sequence-length management to handle long smart contract functions (see the data-preparation sketch after this list).
- Code Generation and Post-processing: The model generates Solidity code, which is then validated for syntactic correctness and semantic plausibility.
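Since the paper does not spell out its prompt template or context budget, the following sketch shows one plausible way to format an aligned TAC-Solidity pair as a causal-LM training example; the template and the 4096-token limit are assumptions, not the authors' choices.

```python
# Sketch of preparing one aligned TAC/Solidity pair as a causal-LM training
# example. The prompt template and the 4096-token budget are assumptions;
# the paper does not specify its exact formatting.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B")

def make_example(tac: str, solidity: str, max_length: int = 4096) -> dict:
    """Concatenate source (TAC) and target (Solidity) into one training text."""
    text = f"### TAC:\n{tac}\n\n### Solidity:\n{solidity}{tokenizer.eos_token}"
    return tokenizer(text, truncation=True, max_length=max_length)

# For memory headroom on long functions, gradient checkpointing can be
# enabled on the model: model.gradient_checkpointing_enable()
```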
Empirical Results
The evaluation is multi-faceted, focusing on semantic preservation, code structure, and practical usability:
- Semantic Similarity: 78.3% of decompiled functions achieve a semantic similarity above 0.8, with 45.2% exceeding 0.9. This is a substantial improvement over traditional decompilers, which typically achieve such scores for only 40–50% of functions.
- Edit Distance: 82.5% of functions have a normalized edit distance below 0.4, indicating close syntactic alignment with the original code (both headline metrics are sketched after this list).
- Token-Level Analysis: The frequency of key Solidity constructs (e.g., `require`, `msg.sender`, type specifiers) is preserved within 2% of the original, demonstrating the model's ability to maintain security-critical patterns.
- Case Studies: The system accurately reconstructs complex control flow and memory management in NFT enumeration functions, while limitations are observed in highly specialized DeFi reward calculations, particularly those involving intricate fixed-point arithmetic and nested storage patterns.
- Ablation Study: Fine-tuning is shown to be essential; the base Llama-3.2-3B model exhibits a 45% drop in semantic similarity and fails to capture domain-specific patterns without adaptation.
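For readers who want to sanity-check the headline metrics, the sketch below gives a minimal implementation of normalized edit distance (character-level Levenshtein divided by the longer length) and of cosine similarity over precomputed code embeddings; the paper's exact metric implementations may differ.

```python
# Minimal implementations of the two headline metrics. The paper's exact
# formulations are not reproduced here; these are standard definitions.
import math

def normalized_edit_distance(a: str, b: str) -> float:
    """Levenshtein distance divided by the longer string's length (0 = identical)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if chars match)
            ))
        prev = curr
    return prev[-1] / max(len(a), len(b), 1)

def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two (precomputed) embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

print(normalized_edit_distance("uint a = 1;", "uint256 a = 1;"))  # ~0.21
```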
Security and Practical Applications
The system's practical utility is demonstrated through real-world case studies:
- Vulnerability Discovery: The decompiler exposes a critical reentrancy vulnerability in the unverified Dx Protocol contract, which could have led to repeated unauthorized withdrawals (a naive textual heuristic of this kind is sketched after this list).
- MEV Bot Analysis: The tool successfully reconstructs the logic of a proprietary MEV bot, revealing arbitrary external call and unprotected token transfer vulnerabilities that were exploited in the wild.
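As a rough illustration of how decompiled output can feed an audit workflow, the sketch below flags the classic reentrancy shape, an external call preceding a storage write, using purely textual pattern matching over decompiled Solidity. It is a deliberately naive heuristic for exposition, not the paper's tooling.

```python
# Naive textual heuristic over decompiled Solidity: flag functions where an
# external call appears before a storage write (the classic reentrancy shape).
# Illustrative assumption about an audit workflow, not the paper's tooling.
import re

EXTERNAL_CALL = re.compile(r"\.(call|delegatecall|send|transfer)\s*[({]")
# Crude storage-write pattern; as a naive heuristic it can also match comparisons.
STORAGE_WRITE = re.compile(r"^\s*\w+(\[[^\]]+\])?\s*(=|-=|\+=)", re.MULTILINE)

def may_reenter(function_body: str) -> bool:
    """Return True if an external call precedes the first storage write."""
    call = EXTERNAL_CALL.search(function_body)
    write = STORAGE_WRITE.search(function_body)
    return bool(call) and (write is None or call.start() < write.start())

body = """
(bool ok, ) = msg.sender.call{value: amount}("");
balances[msg.sender] -= amount;
"""
print(may_reenter(body))  # True: the call happens before the balance update
```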
These applications underscore the system's value for security auditing, incident response, and automated verification in the context of opaque, unverified smart contracts.
Theoretical and Practical Implications
The paper's findings have several important implications:
- Hybrid Approaches: The combination of static analysis and LLMs is shown to be more effective than either approach alone, particularly for tasks requiring both semantic fidelity and human readability.
- Model Size vs. Specialization: The success of a 3B-parameter model, when properly fine-tuned, challenges the assumption that only very large models are suitable for complex code understanding tasks. Domain-specific adaptation is shown to be more critical than raw model scale.
- Intermediate Representations: The use of TAC as an intermediate step is validated as a general strategy for bridging low-level and high-level code representations, with potential applicability to other domains beyond EVM decompilation.
- Entropy Analysis: The paper provides a quantitative analysis of the entropy of Solidity, TAC, and EVM bytecode, highlighting the information loss and redundancy at each stage. This informs both the challenges and opportunities in decompilation and code translation tasks.
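The paper's precise entropy measure is not restated here; a standard way to quantify the information density being compared is Shannon entropy over an empirical token distribution, as in the sketch below (the whitespace tokenization is an assumption).

```python
# Shannon entropy over an empirical token distribution, in bits per token.
# The tokenization used by the paper is an assumption here.
import math
from collections import Counter

def shannon_entropy(tokens: list[str]) -> float:
    """H = -sum(p * log2 p) over the observed token frequencies."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

solidity_src = "function transfer ( address to , uint256 amount )".split()
print(f"{shannon_entropy(solidity_src):.2f} bits/token")
```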
Limitations and Future Directions
While the system achieves strong results, several limitations are acknowledged:
- Complex DeFi Patterns: The model struggles with highly specialized financial logic, particularly where fixed-point arithmetic and complex storage patterns are involved.
- Inline Assembly: Functions containing inline assembly or unusual compiler optimizations are more likely to be decompiled into verbose or less idiomatic code.
- Function Length: Very long functions (>1,000 characters) exhibit increased variance in decompilation quality, primarily in variable naming and control flow structuring.
Future research directions include:
- Extending to Other VM Architectures: The hybrid approach could be adapted to other blockchain VMs or even traditional binary decompilation tasks.
- Improved Type and Structure Recovery: Enhancing the recovery of complex types, inheritance hierarchies, and storage patterns remains an open challenge.
- Integration with Automated Auditing Tools: Combining high-fidelity decompilation with automated vulnerability detection could further streamline security workflows.
Conclusion
This work establishes a new technical standard for smart contract decompilation, demonstrating that the integration of static analysis and LLMs can yield outputs that are both semantically accurate and highly readable. The approach has immediate practical relevance for blockchain security, transparency, and maintainability, and its methodological innovations are likely to influence future research in program analysis, code translation, and AI-assisted software engineering. The public release of both the dataset and the decompilation service further amplifies its impact, providing valuable resources for the research and practitioner communities.