Decompiling Smart Contracts with an LLM: An Expert Overview
This paper presents a comprehensive and technically rigorous approach to the decompilation of Ethereum smart contracts, introducing a hybrid pipeline that leverages both static program analysis and LLMs to translate EVM bytecode into human-readable, semantically faithful Solidity code. The work addresses a critical gap in blockchain security and transparency, given that less than 1% of deployed Ethereum contracts are open source, leaving the vast majority of on-chain logic opaque and challenging to audit.
Technical Contributions
The core innovation lies in a multi-stage decompilation pipeline, sketched in code after the list:
- Static Analysis and Intermediate Representation: EVM bytecode is first converted into a structured three-address code (TAC) representation. This step employs advanced control flow and data flow analysis to recover function boundaries, variable usage, and high-level control structures, mitigating the semantic loss inherent in stack-based bytecode.
- LLM-based Code Generation: A Llama-3.2-3B model, fine-tuned via LoRA on a dataset of 238,446 TAC-to-Solidity function pairs, translates the TAC into Solidity. The model is specifically adapted to the domain, enabling it to recover meaningful variable names, function signatures, and idiomatic Solidity constructs.
- Post-processing and Validation: The generated Solidity is validated for syntactic correctness and semantic alignment with the original bytecode, ensuring practical usability for auditing and maintenance.
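A minimal sketch of how these three stages compose, with every name below an illustrative placeholder rather than the paper's actual API:

```python
"""Hypothetical skeleton of the decompilation pipeline described above."""

def lift_to_tac(bytecode: bytes) -> list[str]:
    """Stage 1 (stub): static control-flow and data-flow analysis lifting
    EVM bytecode into per-function three-address code units."""
    raise NotImplementedError("plug in a real EVM-to-TAC lifter")

def llm_translate(tac_function: str) -> str:
    """Stage 2 (stub): the LoRA-tuned Llama-3.2-3B maps one TAC function
    to readable Solidity."""
    raise NotImplementedError("plug in the fine-tuned model")

def validate(solidity: str) -> bool:
    """Stage 3 (stub): check the output at least compiles and aligns with
    the original bytecode's behavior."""
    raise NotImplementedError("e.g. shell out to solc and compare")

def decompile(bytecode: bytes) -> str:
    tac_functions = lift_to_tac(bytecode)                # bytecode -> TAC
    sources = [llm_translate(f) for f in tac_functions]  # TAC -> Solidity
    contract = "\n\n".join(sources)
    if not validate(contract):                           # reject bad output
        raise ValueError("decompiled source failed validation")
    return contract
```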
The system is implemented as a publicly accessible tool (https://evmdecompiler.com), providing immediate utility for security researchers and auditors.
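To make the LoRA adaptation step described above concrete, a fine-tune of Llama-3.2-3B via the Hugging Face peft library might be wired as follows; the hyperparameters and prompt format are illustrative assumptions, not the paper's reported settings:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE = "meta-llama/Llama-3.2-3B"
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# Illustrative adapter configuration (rank, alpha, and targets are guesses).
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapter weights are trained

# Fine-tuning then runs over the 238,446 TAC -> Solidity pairs, e.g. with
# transformers.Trainer on prompts like "### TAC:\n{tac}\n### Solidity:\n".
```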
Empirical Evaluation
The evaluation is methodologically robust, utilizing a held-out test set of 9,731 smart contract functions spanning a wide range of complexity and application domains. Key findings include:
- Semantic Similarity: The system achieves an average semantic similarity of 0.82 with original source code, with 78.3% of functions exceeding 0.8 and 45.2% exceeding 0.9. This is a substantial improvement over traditional decompilers, which typically achieve such scores for only 40–50% of functions.
- Edit Distance: 82.5% of decompiled functions have a normalized edit distance below 0.4, indicating strong syntactic preservation.
- Token-Level Fidelity: The frequency of security-critical tokens (e.g., `require`, `msg.sender`) in decompiled code matches the original to within 2%, demonstrating robust preservation of safety patterns (a metric sketch follows this list).
- Case Studies: The system accurately reconstructs complex patterns such as NFT enumeration and standard interface integration. However, it exhibits limitations in highly optimized DeFi contracts, particularly those involving intricate fixed-point arithmetic and temporal logic.
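The edit-distance and token-frequency metrics above are straightforward to reproduce; a sketch under standard definitions (the paper's exact formulations may differ):

```python
import re
from collections import Counter

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def normalized_edit_distance(original: str, decompiled: str) -> float:
    """Edit distance scaled to [0, 1]; the paper reports 82.5% of functions
    below 0.4 under its (possibly different) normalization."""
    return levenshtein(original, decompiled) / max(len(original), len(decompiled), 1)

# Security-critical tokens whose frequencies the evaluation compares.
SECURITY_TOKENS = ("require", "msg.sender")

def security_token_counts(source: str) -> Counter:
    words = re.findall(r"[A-Za-z_][\w.]*", source)
    return Counter(w for w in words if w in SECURITY_TOKENS)
```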
Ablation and Limitations
An ablation study comparing the fine-tuned model to the base Llama-3.2-3B demonstrates a 45% drop in semantic similarity without domain-specific adaptation, underscoring the necessity of specialized training for smart contract decompilation. Performance also degrades for very long functions (>1,000 characters) and for contracts with complex inheritance or inline assembly, though semantic similarity remains above 0.7 in these cases.
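For context, the semantic-similarity scores reported throughout the evaluation are plausibly embedding-based; a common formulation, with the encoder choice purely an assumption of this sketch rather than the paper's scorer, is cosine similarity between code embeddings:

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative stand-in encoder, not the paper's similarity metric.
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def semantic_similarity(original: str, decompiled: str) -> float:
    """Cosine similarity in [-1, 1] between the two functions' embeddings."""
    emb = encoder.encode([original, decompiled], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()
```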
Security and Practical Applications
The practical impact is demonstrated through real-world case studies:
- Vulnerability Discovery: The decompiler exposes critical flaws in unverified contracts, such as the Dx Protocol, where a state update bug could have led to repeated unauthorized withdrawals.
- Incident Response: In MEV bot exploits, the tool enables post-mortem analysis by reconstructing vulnerable callback functions, revealing vulnerabilities such as arbitrary external calls and unprotected token transfers.
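As an illustration of how decompiled output can feed such triage, a toy scan (not the paper's tooling) that flags decompiled functions making external calls without an obvious access-control guard might look like this:

```python
import re

# Heuristic patterns; real analyses parse the AST rather than grep text.
EXTERNAL_CALL = re.compile(r"\.(call|delegatecall|transfer|transferFrom)\s*\(")
ACCESS_GUARD = re.compile(r"require\s*\(|onlyOwner|msg\.sender\s*==")

def flag_unguarded_calls(decompiled_function: str) -> bool:
    """True if the function body makes an external call or token transfer
    with no visible guard, a pattern worth manual review."""
    return bool(EXTERNAL_CALL.search(decompiled_function)
                and not ACCESS_GUARD.search(decompiled_function))
```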
These applications highlight the system's value in security auditing, incident response, and automated contract verification, directly addressing the opacity that adversarial actors exploit in DeFi and MEV contexts.
Theoretical and Practical Implications
The paper provides a detailed entropy analysis of smart contract representations: Solidity source averages 4.22 bits/token, EVM bytecode 6.30 bits/opcode, and TAC 5.78 bits/instruction. The rise in per-symbol entropy from source to bytecode reflects the compiler stripping away predictable high-level structure, which is exactly what decompilation must recover, and TAC's intermediate entropy positions it as an effective stepping stone. This analysis justifies the pipeline's design and suggests that similar hybrid approaches could generalize to other domains where high-level semantics must be recovered from low-level representations.
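Assuming these figures are empirical Shannon entropies over each representation's symbol distribution, they can be reproduced with a few lines:

```python
from collections import Counter
from math import log2

def bits_per_symbol(symbols: list[str]) -> float:
    """Empirical Shannon entropy H = -sum(p * log2 p), in bits per symbol."""
    counts = Counter(symbols)
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Per the paper, over its corpus this yields roughly:
#   Solidity tokens  -> 4.22 bits/token
#   TAC instructions -> 5.78 bits/instruction
#   EVM opcodes      -> 6.30 bits/opcode
```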
The results challenge the assumption that only large, general-purpose LLMs are suitable for complex code understanding tasks. Instead, the findings support the efficacy of smaller, domain-specialized models, provided they are trained on high-quality, task-specific data and paired with structured intermediate representations.
Future Directions
Several avenues for further research and development are evident:
- Enhanced Type and Structure Recovery: Improving the recovery of complex data structures, inheritance hierarchies, and precise type information remains an open challenge, particularly for contracts employing aggressive compiler optimizations or inline assembly.
- Cross-VM Generalization: The hybrid approach could be extended to other blockchain VMs (e.g., WASM-based chains) or even to traditional binary decompilation tasks, leveraging domain-specific intermediate representations.
- Automated Vulnerability Detection: Integrating the decompiler with automated vulnerability scanners could further streamline security workflows, enabling real-time analysis of unverified contracts.
- Model Scaling and Efficiency: Exploring the trade-offs between model size, inference latency, and decompilation quality will be critical for large-scale deployment and integration into continuous monitoring systems.
Conclusion
This work establishes a new technical standard for smart contract decompilation, demonstrating that the combination of static analysis and LLMs can yield outputs that are both semantically accurate and highly readable. The approach significantly enhances the transparency and auditability of blockchain ecosystems, with direct implications for security, compliance, and the broader adoption of decentralized technologies. The public release of both the tool and the training dataset further catalyzes research in neural decompilation and program understanding, providing a foundation for future advances in AI-driven code analysis.