- The paper presents a hybrid decompilation pipeline that combines static analysis and an LLM fine-tuned on TAC-to-Solidity pairs to achieve high semantic fidelity.
- It demonstrates strong empirical results with a mean semantic similarity of 0.82 and reduced edit distance compared to traditional decompilers.
- The released dataset and public decompilation service offer valuable resources for enhancing blockchain security and automated smart contract audits.
Decompiling Smart Contracts with an LLM: An Expert Overview
This paper presents a comprehensive and technically rigorous approach to the decompilation of Ethereum smart contracts, introducing a hybrid pipeline that leverages both static program analysis and large language models (LLMs) to translate EVM bytecode into human-readable, semantically faithful Solidity code. The work addresses a critical gap in blockchain security and transparency: less than 1% of deployed Ethereum contracts are open source, leaving the vast majority of on-chain logic opaque and challenging to audit.
Technical Contributions
The core contributions of the paper are as follows:
- Hybrid Decompilation Pipeline: The system first applies static analysis to convert EVM bytecode into a structured three-address code (TAC) intermediate representation. This step is crucial for bridging the semantic gap between low-level bytecode and high-level source code, enabling more effective downstream processing.
- LLM-based Code Generation: A Llama-3.2-3B model, fine-tuned on a dataset of 238,446 TAC-to-Solidity function pairs, is used to generate Solidity code from the TAC. The model is adapted using Low-Rank Adaptation (LoRA), which allows efficient fine-tuning with a relatively small number of additional parameters (a configuration sketch follows this list).
- Empirical Evaluation: The system is evaluated on a held-out test set of 9,731 smart contract functions, demonstrating a mean semantic similarity of 0.82 to the original source code and outperforming traditional decompilers in both readability and semantic preservation.
- Public Dataset and System: The authors release both their dataset and a public-facing decompilation service, facilitating further research and practical adoption.
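The paper's exact LoRA hyperparameters are not reproduced in this overview; the sketch below shows what such an adaptation typically looks like with the Hugging Face transformers and peft libraries, where the rank, scaling factor, and target modules are illustrative assumptions rather than the authors' settings.

```python
# Minimal LoRA adaptation sketch using Hugging Face transformers + peft.
# The rank, scaling factor, and target modules are illustrative assumptions,
# not the paper's reported configuration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")

lora_config = LoraConfig(
    r=16,                      # low-rank dimension (assumed)
    lora_alpha=32,             # LoRA scaling factor (assumed)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
```

The practical appeal of LoRA here is that only the low-rank adapter matrices are trained, so a 3B-parameter model can be specialized at a fraction of the memory cost of full fine-tuning.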
Methodological Details
The pipeline is structured as follows:
- Bytecode to TAC Conversion: Static analysis is used to recover control flow, function boundaries, and data flow, producing a TAC representation that is more amenable to neural processing than raw bytecode.
- Dataset Construction: Verified contracts with available source code are used to create aligned TAC-Solidity pairs, with careful normalization and filtering to ensure data quality and coverage of diverse Solidity idioms.
- Model Training: The Llama-3.2-3B model is fine-tuned using LoRA, targeting key transformer components. Training employs gradient checkpointing and sequence-length management to handle long smart contract functions (see the data-preparation sketch after this list).
- Code Generation and Post-processing: The model generates Solidity code, which is then validated for syntactic correctness and semantic plausibility.
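Since the paper does not spell out its prompt template or context budget, the following sketch shows one plausible way to format an aligned TAC-Solidity pair as a causal-LM training example; the template and the 4096-token limit are assumptions, not the authors' choices.

```python
# Sketch of preparing one aligned TAC/Solidity pair as a causal-LM training
# example. The prompt template and the 4096-token budget are assumptions;
# the paper does not specify its exact formatting.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B")

def make_example(tac: str, solidity: str, max_length: int = 4096) -> dict:
    """Concatenate source (TAC) and target (Solidity) into one training text."""
    text = f"### TAC:\n{tac}\n\n### Solidity:\n{solidity}{tokenizer.eos_token}"
    return tokenizer(text, truncation=True, max_length=max_length)

# For memory headroom on long functions, gradient checkpointing can be
# enabled on the model: model.gradient_checkpointing_enable()
```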
Empirical Results
The evaluation is multi-faceted, focusing on semantic preservation, code structure, and practical usability:
- Semantic Similarity: 78.3% of decompiled functions achieve a semantic similarity above 0.8, with 45.2% exceeding 0.9. This is a substantial improvement over traditional decompilers, which typically achieve such scores for only 40–50% of functions.
- Edit Distance: 82.5% of functions have a normalized edit distance below 0.4, indicating close syntactic alignment with the original code (both headline metrics are sketched after this list).
- Token-Level Analysis: The frequency of key Solidity constructs (e.g., `require`, `msg.sender`, type specifiers) is preserved within 2% of the original, demonstrating the model's ability to maintain security-critical patterns.
- Case Studies: The system accurately reconstructs complex control flow and memory management in NFT enumeration functions, while limitations are observed in highly specialized DeFi reward calculations, particularly those involving intricate fixed-point arithmetic and nested storage patterns.
- Ablation Study: Fine-tuning is shown to be essential; the base Llama-3.2-3B model exhibits a 45% drop in semantic similarity and fails to capture domain-specific patterns without adaptation.
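For readers who want to sanity-check the headline metrics, the sketch below gives a minimal implementation of normalized edit distance (character-level Levenshtein divided by the longer length) and of cosine similarity over precomputed code embeddings; the paper's exact metric implementations may differ.

```python
# Minimal implementations of the two headline metrics. The paper's exact
# formulations are not reproduced here; these are standard definitions.
import math

def normalized_edit_distance(a: str, b: str) -> float:
    """Levenshtein distance divided by the longer string's length (0 = identical)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if chars match)
            ))
        prev = curr
    return prev[-1] / max(len(a), len(b), 1)

def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two (precomputed) embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

print(normalized_edit_distance("uint a = 1;", "uint256 a = 1;"))  # ~0.21
```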
Security and Practical Applications
The system's practical utility is demonstrated through real-world case studies:
- Vulnerability Discovery: The decompiler exposes a critical reentrancy vulnerability in the unverified Dx Protocol contract, which could have led to repeated unauthorized withdrawals (a naive textual heuristic of this kind is sketched after this list).
- MEV Bot Analysis: The tool successfully reconstructs the logic of a proprietary MEV bot, revealing arbitrary external call and unprotected token transfer vulnerabilities that were exploited in the wild.
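As a rough illustration of how decompiled output can feed an audit workflow, the sketch below flags the classic reentrancy shape, an external call preceding a storage write, using purely textual pattern matching over decompiled Solidity. It is a deliberately naive heuristic for exposition, not the paper's tooling.

```python
# Naive textual heuristic over decompiled Solidity: flag functions where an
# external call appears before a storage write (the classic reentrancy shape).
# Illustrative assumption about an audit workflow, not the paper's tooling.
import re

EXTERNAL_CALL = re.compile(r"\.(call|delegatecall|send|transfer)\s*[({]")
# Crude storage-write pattern; as a naive heuristic it can also match comparisons.
STORAGE_WRITE = re.compile(r"^\s*\w+(\[[^\]]+\])?\s*(=|-=|\+=)", re.MULTILINE)

def may_reenter(function_body: str) -> bool:
    """Return True if an external call precedes the first storage write."""
    call = EXTERNAL_CALL.search(function_body)
    write = STORAGE_WRITE.search(function_body)
    return bool(call) and (write is None or call.start() < write.start())

body = """
(bool ok, ) = msg.sender.call{value: amount}("");
balances[msg.sender] -= amount;
"""
print(may_reenter(body))  # True: the call happens before the balance update
```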
These applications underscore the system's value for security auditing, incident response, and automated verification in the context of opaque, unverified smart contracts.
Theoretical and Practical Implications
The paper's findings have several important implications:
- Hybrid Approaches: The combination of static analysis and LLMs is shown to be more effective than either approach alone, particularly for tasks requiring both semantic fidelity and human readability.
- Model Size vs. Specialization: The success of a 3B-parameter model, when properly fine-tuned, challenges the assumption that only very large models are suitable for complex code understanding tasks. Domain-specific adaptation is shown to be more critical than raw model scale.
- Intermediate Representations: The use of TAC as an intermediate step is validated as a general strategy for bridging low-level and high-level code representations, with potential applicability to other domains beyond EVM decompilation.
- Entropy Analysis: The paper provides a quantitative analysis of the entropy of Solidity, TAC, and EVM bytecode, highlighting the information loss and redundancy at each stage. This informs both the challenges and opportunities in decompilation and code translation tasks.
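The paper's precise entropy measure is not restated here; a standard way to quantify the information density being compared is Shannon entropy over an empirical token distribution, as in the sketch below (the whitespace tokenization is an assumption).

```python
# Shannon entropy over an empirical token distribution, in bits per token.
# The tokenization used by the paper is an assumption here.
import math
from collections import Counter

def shannon_entropy(tokens: list[str]) -> float:
    """H = -sum(p * log2 p) over the observed token frequencies."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

solidity_src = "function transfer ( address to , uint256 amount )".split()
print(f"{shannon_entropy(solidity_src):.2f} bits/token")
```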
Limitations and Future Directions
While the system achieves strong results, several limitations are acknowledged:
- Complex DeFi Patterns: The model struggles with highly specialized financial logic, particularly where fixed-point arithmetic and complex storage patterns are involved.
- Inline Assembly: Functions containing inline assembly or unusual compiler optimizations are more likely to be decompiled into verbose or less idiomatic code.
- Function Length: Very long functions (>1,000 characters) exhibit increased variance in decompilation quality, primarily in variable naming and control flow structuring.
Future research directions include:
- Extending to Other VM Architectures: The hybrid approach could be adapted to other blockchain VMs or even traditional binary decompilation tasks.
- Improved Type and Structure Recovery: Enhancing the recovery of complex types, inheritance hierarchies, and storage patterns remains an open challenge.
- Integration with Automated Auditing Tools: Combining high-fidelity decompilation with automated vulnerability detection could further streamline security workflows.
Conclusion
This work establishes a new technical standard for smart contract decompilation, demonstrating that the integration of static analysis and LLMs can yield outputs that are both semantically accurate and highly readable. The approach has immediate practical relevance for blockchain security, transparency, and maintainability, and its methodological innovations are likely to influence future research in program analysis, code translation, and AI-assisted software engineering. The public release of both the dataset and the decompilation service further amplifies its impact, providing valuable resources for the research and practitioner communities.