
SmartInv: Multimodal Learning for Smart Contract Invariant Inference (2411.09217v1)

Published 14 Nov 2024 in cs.SE, cs.CR, and cs.PL

Abstract: Smart contracts are software programs that enable diverse business activities on the blockchain. Recent research has identified new classes of "machine un-auditable" bugs that arise from both transactional contexts and source code. Existing detection methods require human understanding of underlying transaction logic and manual reasoning across different sources of context (i.e., modalities), such as code, dynamic transaction executions, and natural language specifying the expected transaction behavior. To automate the detection of "machine un-auditable" bugs, we present SmartInv, an accurate and fast smart contract invariant inference framework. Our key insight is that the expected behavior of smart contracts, as specified by invariants, relies on understanding and reasoning across multimodal information, such as source code and natural language. We propose a new prompting strategy to foundation models, Tier of Thought (ToT), to reason across multiple modalities of smart contracts and ultimately to generate invariants. By checking the violation of these generated invariants, SmartInv can identify potential vulnerabilities. We evaluate SmartInv on real-world contracts and re-discover bugs that resulted in multi-million dollar losses over the past 2.5 years (from January 1, 2021 to May 31, 2023). Our extensive evaluation shows that SmartInv generates 3.5× more bug-critical invariants and detects 4× more critical bugs than state-of-the-art tools in significantly (150×) less time. SmartInv uncovers 119 zero-day vulnerabilities from the 89,621 real-world contracts. Among them, five are critical zero-day bugs confirmed by developers as "high severity."


Summary

  • The paper introduces SmartInv, a multimodal framework that infers invariants to detect both functional and implementation bugs in smart contracts.
  • It employs a novel Tier of Thought prompting strategy that decomposes invariant inference into stages of context identification, candidate generation, and vulnerability ranking.
  • Evaluations on 89,621 contracts demonstrate a 4× bug detection improvement and a 150× speedup over traditional analysis tools.

SmartInv: Multimodal Learning for Smart Contract Invariant Inference

Introduction and Motivation

Smart contracts, as immutable programs deployed on blockchains, are highly susceptible to vulnerabilities that can result in significant financial losses. While traditional static and dynamic analysis tools have made progress in detecting implementation bugs (e.g., integer overflows, reentrancy), they are fundamentally limited in their ability to identify "machine un-auditable" functional bugs. These functional bugs arise from complex, domain-specific transactional contexts and require reasoning across multiple modalities, such as source code and natural language documentation. The SmartInv framework addresses this gap by leveraging foundation models and a novel prompting strategy to infer invariants that capture the intended behavior of smart contracts, enabling the detection of both implementation and functional bugs at scale.

Multimodal Invariant Inference: Problem Formulation

The core challenge addressed by SmartInv is the automated inference of invariants that specify the expected behavior of smart contracts, using information from both code and natural language. Formally, given a pre-trained foundation model M_θ and a tokenized smart contract S, the goal is to generate pairs (c_i, v_i), where c_i is a critical program point and v_i is an associated invariant. These invariants are then used to detect vulnerabilities by checking for their violation during symbolic or bounded model checking.

SmartInv introduces three primary types of invariants:

  • Assertions: Boolean expressions over program variables, possibly involving special constructs such as Old(expr) (pre-state value) and SumMapping(mappingVar) (aggregate over mappings).
  • Modifiers: Function-level invariants, e.g., onlyOwner, specifying access control or other global properties.
  • Global Invariants: Properties that span multiple functions or contracts, enabling reasoning about cross-contract or cross-function behaviors.
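The three invariant types can be made concrete with a small data model. The sketch below is illustrative only: the class names, program points, and candidate expressions are hypothetical, though the `Old(expr)` and `SumMapping(mappingVar)` constructs mirror those described above.

```python
from dataclasses import dataclass
from enum import Enum

class InvariantKind(Enum):
    ASSERTION = "assertion"   # boolean expression checked at a program point
    MODIFIER = "modifier"     # function-level property such as onlyOwner
    GLOBAL = "global"         # spans multiple functions or contracts

@dataclass(frozen=True)
class Invariant:
    kind: InvariantKind
    program_point: str   # c_i: where the invariant must hold
    expression: str      # v_i: the property, as a Solidity-style expression

# Illustrative candidates mirroring the constructs above (all names hypothetical):
candidates = [
    Invariant(InvariantKind.ASSERTION, "withdraw:entry",
              "balances[msg.sender] <= Old(balances[msg.sender])"),
    Invariant(InvariantKind.MODIFIER, "setOwner", "onlyOwner"),
    Invariant(InvariantKind.GLOBAL, "Token",
              "SumMapping(balances) == totalSupply"),
]
```

In this representation, the assertion uses `Old(...)` to relate post-state to pre-state, the modifier captures access control at function granularity, and the global invariant ties a mapping aggregate to a single contract-wide variable.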

Tier of Thought (ToT): Multimodal Prompting and Reasoning

A key innovation in SmartInv is the Tier of Thought (ToT) prompting strategy, which structures the invariant inference process into a sequence of increasingly complex reasoning steps. This approach is inspired by the observation that human auditors reason about smart contracts in stages: identifying transactional context, locating critical program points, generating candidate invariants, ranking them by bug-preventive potential, and finally mapping them to specific vulnerabilities.

The ToT process is implemented as a multi-tiered prompt sequence:

  1. Tier 1: Identify transactional context and critical program points using both code and natural language cues.
  2. Tier 2: Generate candidate invariants at the identified program points, leveraging multimodal information.
  3. Tier 3: Rank the invariants by their likelihood of preventing bugs and predict the types of vulnerabilities present.
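The three tiers above can be sketched as a staged pipeline. This is a minimal sketch, not SmartInv's actual implementation: the model interface (`locate`, `propose`, `rank`) and the stub's behavior are assumptions standing in for prompts to the finetuned foundation model.

```python
class StubModel:
    """Stand-in for the finetuned foundation model; this interface is hypothetical."""
    def locate(self, code, docs):
        # Tier 1: use code and natural-language cues to find critical program points
        return ["transfer:entry"]
    def propose(self, point):
        # Tier 2: generate a candidate invariant at a program point
        return f"assert(balances[msg.sender] >= amount) @ {point}"
    def rank(self, candidates):
        # Tier 3: order by bug-preventive potential (here, trivially by length)
        return sorted(candidates, key=len, reverse=True)

def tier_of_thought(model, code, docs):
    points = model.locate(code, docs)                # Tier 1
    candidates = [model.propose(p) for p in points]  # Tier 2
    return model.rank(candidates)                    # Tier 3

ranked = tier_of_thought(StubModel(),
                         "contract Token { ... }",
                         "Transfers debit the sender's balance.")
```

The point of the staging is that each tier's output narrows the next tier's input, so the model never has to produce ranked, located invariants in a single step.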

This staged reasoning is critical for effective multimodal learning, as it allows the model to decompose complex tasks and utilize both code structure and natural language documentation.

Figure 1: Chain of Thought prompting enables stepwise reasoning from code and comments to invariants and vulnerabilities.

SmartInv Workflow and System Architecture

The SmartInv workflow consists of the following stages:

  1. Finetuning: The foundation model is finetuned on a curated dataset of annotated smart contracts, with ToT prompts and ground truth labels for transactional context, critical program points, invariants, and vulnerabilities.
  2. Inference: For a new contract, the finetuned model is prompted using the ToT strategy to generate ranked invariants.
  3. Verification: The generated invariants are verified using a combination of inductive proof (e.g., via Boogie) and bounded model checking. Invariants that cannot be proven or falsified are flagged for manual inspection.
  4. Reporting: Verified invariants and detected vulnerabilities are reported, enabling both automated and human-in-the-loop auditing.

Figure 2: SmartInv's workflow, integrating multimodal finetuning, staged inference, and verification.
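The verification stage (step 3) can be illustrated with a toy bounded check. This is a simplified stand-in for the actual Boogie/VeriSol backend: the function name and the state representation are assumptions, and real bounded model checking explores symbolic rather than concrete states.

```python
def bounded_check(invariant_holds, reachable_states, bound=100):
    """Toy stand-in for the bounded-model-checking step.

    An invariant is 'falsified' if some state within the bound violates it;
    otherwise it remains a 'candidate' and is handed off to inductive proof
    or, failing that, manual inspection (as in SmartInv's workflow).
    """
    for state in reachable_states[:bound]:
        if not invariant_holds(state):
            return "falsified"
    return "candidate"
```

For example, a non-negative-balance invariant survives a trace of valid states, while a strictly-positive-balance invariant is falsified by any state with a zero balance.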

Implementation and Evaluation

SmartInv is implemented in Python, with the verification backend built atop VeriSol. The system is evaluated on a large-scale dataset of 89,621 real-world Solidity contracts, with additional curated datasets for training and validation. The evaluation covers four key research questions:

  • RQ1 (Bug Detection Effectiveness): SmartInv detects 4× more critical bugs and generates 3.5× more bug-critical invariants than state-of-the-art tools, with a 10.39% false positive rate, substantially lower than other prompting-based or symbolic tools. Notably, SmartInv uncovers 119 zero-day vulnerabilities, five of which are confirmed as high severity.
  • RQ2 (Invariant Generation Quality): SmartInv produces fewer but more semantically meaningful invariants per contract than dynamic invariant detectors, with fewer false positives per contract (0.32 vs. 5.41 for InvCon).
  • RQ3 (Ablation and Model Selection): The LLaMA-7B model, finetuned with ToT and multimodal data, achieves the highest accuracy (0.89) and F1 score (0.82). Removing natural language cues or ToT prompting significantly degrades performance, confirming that both modalities and staged reasoning are necessary.
  • RQ4 (Runtime Performance): SmartInv achieves a 150× speedup over symbolic and static analysis tools, averaging 28.6 seconds per contract, enabling practical large-scale deployment.

Case Studies: Zero-Day Vulnerabilities

SmartInv's practical impact is demonstrated through the discovery of real-world zero-day bugs:

  • Cross-Bridge Default Value Bug: SmartInv infers an invariant assert(_msgHash != 0) to prevent acceptance of default-initialized message hashes, which could otherwise allow unauthorized cross-chain operations.
  • Gas Inefficiency in Deposit Queues: By generating require(depositQueue.size() == 1), SmartInv identifies a denial-of-service vector where excessive queue length can lock user funds due to gas exhaustion.
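The cross-bridge bug can be modeled in a few lines. The sketch below is a toy reconstruction, not the vulnerable contract: the handler name and replay-set logic are hypothetical, but the guard is exactly the inferred invariant `assert(_msgHash != 0)`.

```python
def relay_message(msg_hash, processed):
    """Toy model of a cross-chain message handler (names hypothetical).

    In Solidity, an unset mapping entry reads as the default value 0, so a
    handler that accepts hash 0 can be tricked into authorizing operations
    for messages that were never sent. The inferred invariant rejects that
    default-initialized hash up front.
    """
    assert msg_hash != 0, "default-initialized message hash"
    if msg_hash in processed:
        return False          # replay: this message was already relayed
    processed.add(msg_hash)
    return True
```

Without the guard, a caller supplying `msg_hash == 0` would be treated like any fresh message; with it, the default value is rejected before any state change.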

These cases illustrate SmartInv's ability to reason about subtle, context-dependent vulnerabilities that elude pattern-based or purely code-centric tools.

Limitations and Future Directions

While SmartInv demonstrates strong empirical performance, several limitations remain:

  • Token Length Constraints: Even with finetuning, foundation models are limited in the size of contracts they can process. Summarization of imported modules is used as a workaround, but further architectural advances are needed for very large contracts.
  • Hallucination and Verification: Foundation models are prone to hallucination; thus, the verification phase is essential to filter out spurious invariants. However, some invariants may remain unverifiable due to limitations in current formal verification tools.
  • Transaction History: Incorporating transaction history as an additional modality can further improve precision, especially for deployed contracts, but is not always available pre-deployment.

Future work may explore more scalable model architectures, improved integration of dynamic execution traces, and broader support for cross-chain and cross-language contracts.

Implications and Theoretical Significance

SmartInv demonstrates that multimodal learning, when combined with staged prompting and formal verification, can substantially advance the state of the art in smart contract analysis. The approach generalizes beyond hand-crafted patterns, enabling detection of previously unrecognized classes of functional bugs. The ToT strategy provides a template for structured reasoning in other domains where code and natural language interact, such as API misuse detection or regulatory compliance.

The strong numerical results—orders-of-magnitude improvements in both detection rate and runtime—suggest that foundation models, when properly finetuned and guided, can serve as effective assistants for program analysis tasks that require deep semantic understanding.

Conclusion

SmartInv represents a significant advance in automated smart contract analysis, uniting multimodal foundation models, staged reasoning via Tier of Thought prompting, and formal verification. The system achieves superior performance in both invariant inference and bug detection, particularly for functional bugs that have historically eluded automated tools. The methodology and empirical results have broad implications for the application of LLMs in program analysis, formal methods, and software security.
