RPHunter: Advanced Rug Pull Detection
- RPHunter is an advanced detection system that integrates static code and dynamic transaction analysis to identify rug pull scams with high precision.
- It employs graph-based flow analysis and neural embedding to extract semantic code risk features and capture nuanced token transfer behaviors.
- Evaluations show improved performance with a 94.5% F1 score, low false positive rates, and timely real-world detection on Ethereum token contracts.
RPHunter is an advanced detection system designed to identify rug pull scams in crypto token ecosystems by fusing static code analysis with dynamic transaction behavior modeling. By integrating a graph-based flow analysis of smart contract code with a granular model of token transfer activity, RPHunter achieves early and high-precision rug pull detection and outperforms existing static analysis, transaction-pattern, and hybrid approaches (Wu et al., 23 Jun 2025).
1. Formalization of Rug Pull Code Risk
RPHunter begins by extracting latent code risks from token contracts through a multi-stage pipeline:
- Bytecode Acquisition and Decompilation: For a given token contract address, on-chain bytecode is retrieved (Web3.getCode) and decompiled into an intermediate representation (IR) via Gigahorse. The IR encodes program structure as a set of basic blocks, with explicit control-flow and data-flow edges. This yields a Semantic Code Graph (SCG), $G = (V, E_c, E_d)$, where $V$ represents basic blocks and $E_c$, $E_d$ represent control- and data-flow edges, respectively.
- Declarative Relations and Rules: A series of first-order relations is defined over code statements $s$, functions $f$, variables $v$, and mappings $m$. For example:
- a data-dependence relation: statement $s_1$ is data-flow dependent on statement $s_2$
- a public-function relation: $f$ is a public function with parameter $p$
- a tax-manipulation relation: $v$ is a transaction-tax variable manipulated in $f$
- Risk Typology: Rug pull risks are partitioned into three core classes (Sale Restrict, Variable Manipulation, and Balance Tamper) with eight specialized subtypes. Each sub-risk $R_i$ is formalized as a conjunction $P_i \land Q_i$, pairing a broad code pattern ($P_i$) with plugin predicates ($Q_i$) for subtype refinement.
- Flow Analysis: The declarative rules are evaluated by a taint-propagation algorithm over data and control flow. For each sub-risk $R_i$, blocks matching $P_i$ are "tainted," and the taint propagates along control- and data-flow edges; where $Q_i$ is satisfied, the critical blocks and flows are recorded, defining the risk evidence set $\mathcal{E}_i$.
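The pattern-then-plugin flow analysis above can be illustrated with a small sketch. It is not the authors' Datalog-style implementation: the block labels, the `pattern` and `plugin` callables, and the toy graph are all hypothetical, and taint is propagated by a plain breadth-first search over a single merged edge list.

```python
from collections import deque

def propagate_taint(sources, edges):
    """Return every block reachable from `sources` along `edges` (BFS)."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, []).append(dst)
    tainted = set(sources)
    queue = deque(sources)
    while queue:
        block = queue.popleft()
        for nxt in succ.get(block, []):
            if nxt not in tainted:
                tainted.add(nxt)
                queue.append(nxt)
    return tainted

def find_risk_evidence(blocks, edges, pattern, plugin):
    """Tainted blocks (seeded by `pattern`) that also satisfy `plugin`."""
    sources = [b for b, ops in blocks.items() if pattern(ops)]
    tainted = propagate_taint(sources, edges)
    return sorted(b for b in tainted if plugin(blocks[b]))

# Hypothetical toy contract: block id -> opcode-like labels (made up).
blocks = {
    "b0": ["CALLDATALOAD"],   # reads a public-function parameter
    "b1": ["SLOAD", "ADD"],
    "b2": ["SSTORE_TAX"],     # writes an illustrative tax variable
    "b3": ["RETURN"],
}
# Control- and data-flow edges merged into one list for brevity.
edges = [("b0", "b1"), ("b1", "b2"), ("b1", "b3")]

evidence = find_risk_evidence(
    blocks, edges,
    pattern=lambda ops: "CALLDATALOAD" in ops,  # broad code pattern
    plugin=lambda ops: "SSTORE_TAX" in ops,     # subtype refinement
)
print(evidence)  # ['b2']
```

Here the evidence set contains only the block where user-controlled input reaches the tax write, which mirrors the idea of recording critical blocks once both the pattern and the plugin hold.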
2. Token Flow Behavior Modeling
- Transaction Event Extraction: For each token, the first $N$ transfer events are collected, each capturing sender $u_s$, receiver $u_r$, normalized value $w$, and timestamp $t$.
- Token Flow Behavior Graph (TFBG): The TFBG is defined as $G_T = (V_T, E_T)$, where $V_T$ is the set of observed accounts and $E_T$ the sequence of transfer edges (multiple edges are allowed per sender–receiver–time triple). Node and edge features reflect:
- Node (14D): Network centrality, in/out degrees, clustering, token creator flags, fund flow ratios, short-term activity.
- Edge (15D): Time series (intervals, approvals), transaction (gas, value, harmonic value), and investment features (cumulative volumes, short-term maxima).
The TFBG structure is designed to expose both network structural anomalies and market manipulation signatures typical of rug pulls.
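As a rough illustration of how such a graph might be assembled from transfer events, the sketch below builds a multigraph and derives a handful of simple features. The function names, the event tuples, and the chosen features (degrees, flow totals, inter-transfer intervals) are assumptions for illustration and cover only a small subset of the 14 node and 15 edge dimensions described above.

```python
from collections import defaultdict

def build_tfbg(transfers):
    """Build a multigraph: nodes are accounts, edges are time-ordered transfers."""
    edges = sorted(transfers, key=lambda e: e[3])  # order by timestamp
    nodes = defaultdict(lambda: {"in_degree": 0, "out_degree": 0,
                                 "received": 0.0, "sent": 0.0})
    for sender, receiver, value, _ts in edges:
        nodes[sender]["out_degree"] += 1
        nodes[sender]["sent"] += value
        nodes[receiver]["in_degree"] += 1
        nodes[receiver]["received"] += value
    return dict(nodes), edges

def time_intervals(edges):
    """Seconds between consecutive transfers (one simple edge-level feature)."""
    return [b[3] - a[3] for a, b in zip(edges, edges[1:])]

# Hypothetical events: (sender, receiver, normalized value, unix timestamp).
transfers = [
    ("creator", "pool", 0.90, 1000),
    ("pool", "alice", 0.05, 1060),
    ("pool", "bob", 0.03, 1120),
    ("alice", "pool", 0.05, 1500),
]
nodes, edges = build_tfbg(transfers)
print(nodes["pool"])
print(time_intervals(edges))
```

Keeping the edges as an ordered list rather than collapsing them into a weighted adjacency matrix preserves the repeated, time-stamped transfers that the TFBG treats as separate edges.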
3. Joint Graph Representation and Neural Embedding
RPHunter constructs two heterogeneous graphs for each token:
- Semantic Risk Code Graph (SRCG): Nodes corresponding to code blocks are labeled ("critical," "invocation," "normal") based on rule activation and call sites, with edges labeled by criticality and dependency.
- TFBG: Encodes dynamic transfer behaviors as described above.
Graph Embedding Architectures
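- Relational GCN for SRCG: Each relation type $r$ (critical, dependent, normal) defines an adjacency matrix $A_r$, with propagation:
$$H^{(l+1)} = \sigma\Big(\sum_{r} \hat{A}_r H^{(l)} W_r^{(l)}\Big)$$
where $\hat{A}_r$ is the normalized adjacency for relation $r$, $W_r^{(l)}$ is the relation-specific weight matrix at layer $l$, and the initial node features $H^{(0)}$ are BERT-based opcode embeddings.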
- Unified Aggregation GNN (UAGNN) for TFBG: Alternates node and edge updates:
- Node: GCN-style message passing.
- Edge: Each edge updates via local node states and mean aggregation over temporally preceding, incident edges.
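- After 2 rounds, mean pooling yields the global embeddings $h_{\mathrm{SRCG}}$ and $h_{\mathrm{TFBG}}$.
- Attention Fusion: Both embeddings are projected to a shared space $\mathbb{R}^d$, with bidirectional attention weights $\alpha$, yielding a fused representation $h_{\mathrm{fused}}$ for the final MLP-based binary prediction.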
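The relation-wise propagation used for the SRCG can be sketched in plain Python. This is a minimal sketch rather than the authors' implementation: row normalization of each adjacency matrix, the ReLU nonlinearity, the omission of a self-loop term, and all names and toy values (`rgcn_layer`, the adjacency and weight matrices) are assumptions made for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def row_normalize(adj):
    """Divide each row by its sum; all-zero rows stay zero."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

def rgcn_layer(h, adjacency_by_relation, weight_by_relation):
    """One relational GCN step: ReLU(sum_r norm(A_r) @ H @ W_r)."""
    n = len(h)
    d_out = len(next(iter(weight_by_relation.values()))[0])
    acc = [[0.0] * d_out for _ in range(n)]
    for rel, adj in adjacency_by_relation.items():
        msg = matmul(matmul(row_normalize(adj), h), weight_by_relation[rel])
        acc = [[a + m for a, m in zip(ar, mr)] for ar, mr in zip(acc, msg)]
    return [[max(0.0, v) for v in row] for row in acc]  # ReLU (assumed)

# Toy graph with 3 blocks and 2-dimensional features (all values made up).
# adj[i][j] = 1 means block i aggregates from block j under that relation.
h0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = {
    "critical": [[0, 1, 0], [0, 0, 0], [0, 0, 0]],
    "normal":   [[0, 0, 0], [1, 0, 1], [1, 0, 0]],
}
weights = {
    "critical": [[1.0, 0.0], [0.0, 1.0]],
    "normal":   [[0.5, 0.0], [0.0, -1.0]],
}
h1 = rgcn_layer(h0, adjacency, weights)
print(h1)  # [[0.0, 1.0], [0.5, 0.0], [0.5, 0.0]]
```

Because each relation has its own weight matrix, messages arriving over "critical" edges are transformed differently from those arriving over "normal" edges, which is the property that lets the model weight rule-activated blocks separately.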
4. Dataset Construction and Evaluation
- Dataset Source: 1,048 publicly reported rug pull incidents were collected from 8 security platforms; after removing cases with missing code or incomplete data, 645 remain as positives. Of 1,806 manually reviewed benign tokens from TokenScout, 1,675 are retained as negatives.
- Split and Validation: Dataset split 60:20:20 among train/val/test; within training, five-fold cross-validation is used for robustness.
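A 60:20:20 split followed by five-fold cross-validation on the training portion might look like the sketch below. The source does not say whether the split is stratified or how folds are formed, so the per-class stratification, the fixed seed, and the helper names `stratified_split` and `k_folds` are assumptions; only the class counts (645 positives, 1,675 negatives) come from the text.

```python
import random

def stratified_split(labels, ratios=(0.6, 0.2, 0.2), seed=0):
    """Split indices into train/val/test, preserving class proportions."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(ratios[0] * len(idx))
        n_val = round(ratios[1] * len(idx))
        splits["train"] += idx[:n_train]
        splits["val"] += idx[n_train:n_train + n_val]
        splits["test"] += idx[n_train + n_val:]
    return splits

def k_folds(indices, k=5):
    """Yield (fit, held_out) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        fit = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield fit, held_out

# 1 = rug pull, 0 = benign; counts taken from the dataset description.
labels = [1] * 645 + [0] * 1675
splits = stratified_split(labels)
print({name: len(idx) for name, idx in splits.items()})

for fit, held_out in k_folds(splits["train"]):
    print(len(fit), len(held_out))
```

With these counts the split yields 1,392 training, 464 validation, and 464 test samples, and each cross-validation fold holds out roughly one fifth of the training indices.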
Metrics and Results
- Binary classification metrics:
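$$\text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}, \quad F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \quad \text{FPR} = \frac{FP}{FP + TN}, \quad \text{FNR} = \frac{FN}{FN + TP}$$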
RPHunter achieves, on held-out test folds:
- Precision: 95.3%
- Recall: 93.8%
- F1 Score: 94.5%
- FPR: 1.8%
- FNR: 6.2%
Performance exceeds two rule-based baselines (Pied-Piper, CRPWarner), two transaction-only models, and the commercial scanner GoPlus.
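The reported figures can be sanity-checked with the standard definitions of these metrics. In the sketch below, the confusion counts passed to `classification_metrics` are hypothetical and are not taken from the paper; only the precision and recall fed to `f1_from` are the reported values.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
    }

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical confusion counts, for illustration only.
m = classification_metrics(tp=90, fp=5, tn=195, fn=10)
print({k: round(v, 3) for k, v in m.items()})

# Consistency check on the reported precision (95.3%) and recall (93.8%).
reported_f1 = f1_from(0.953, 0.938)
print(round(reported_f1, 3))  # 0.945
```

The harmonic mean of the reported precision and recall reproduces the 94.5% F1 score, and the reported recall and FNR sum to 100%, as the definitions require.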
5. Deployment and Case Studies
- Ethereum Mainnet Scan: RPHunter was deployed across all ERC-20-like contracts between blocks 19,771,560–20,207,949 (May–July 2024), flagging 4,801 tokens as rug pulls.
- Real-world Precision: Manual sampling of 247 detections found 23 false positives, giving a live precision of approximately 91%.
- Timeliness: Fast rug pulls were caught: 108 of the flagged tokens had existed for less than 24 hours, and in one notable case (MNHA), RPHunter identified the risk before the withdrawal took place, in time to preempt the loss.
6. Limitations and Component Analysis
- Flow Analysis Quality: Tied to Gigahorse decompilation fidelity; integrating more precise analyzers (e.g., Vandal or Mythril) could reduce false negatives.
- Dataset Bias: Current positives are high-confidence, but adaptive scam tactics may evade static rule sets; continual rule/plugin refreshes are required.
- Off-Chain/Multimodal Blind Spots: Off-chain activities and social/contractual "trust pulls" (not reflected on-chain) escape current detection; fusing social and multi-modal signals is suggested as future work.
- Ablation Insights:
- Removing code-risk features: F1 drops ≈6.7%.
- Excluding node/edge features in TFBG: F1 drops by up to 11.2%.
- Omission of fusion model: F1 decreases by up to 18.5%.
7. Impact, Novelty, and Future Directions
RPHunter's central contribution is the graph- and attention-based fusion of (i) formally defined code-risk features from declarative flow analysis and (ii) transaction-behavioral features. This fusion supports sensitive and specific early-stage rug pull detection in heterogeneous, adversarial token environments, as reflected in the 94.5% F1 score on the benchmark and the roughly 91% precision observed in the mainnet deployment. Future directions include integration of higher-fidelity code analyzers, expansion of rule/plugin libraries to adapt to new attack patterns, and incorporation of off-chain data sources to address non-code-based scams (Wu et al., 23 Jun 2025).