BERT4ETH: A Pre-trained Transformer for Ethereum Fraud Detection (2303.18138v2)

Published 29 Mar 2023 in cs.CR and cs.LG

Abstract: As various forms of fraud proliferate on Ethereum, it is imperative to safeguard against these malicious activities to protect susceptible users from being victimized. While current studies solely rely on graph-based fraud detection approaches, it is argued that they may not be well-suited for dealing with highly repetitive, skew-distributed and heterogeneous Ethereum transactions. To address these challenges, we propose BERT4ETH, a universal pre-trained Transformer encoder that serves as an account representation extractor for detecting various fraud behaviors on Ethereum. BERT4ETH features the superior modeling capability of Transformer to capture the dynamic sequential patterns inherent in Ethereum transactions, and addresses the challenges of pre-training a BERT model for Ethereum with three practical and effective strategies, namely repetitiveness reduction, skew alleviation and heterogeneity modeling. Our empirical evaluation demonstrates that BERT4ETH outperforms state-of-the-art methods with significant enhancements in terms of the phishing account detection and de-anonymization tasks. The code for BERT4ETH is available at: https://github.com/git-disl/BERT4ETH.

Summary

The paper introduces BERT4ETH, a pre-trained Transformer that reduces repetitiveness, mitigates skew, and models heterogeneity for superior fraud detection.
It employs innovative pre-training strategies, including high masking, frequency-aware negative sampling, and in/out sequence separation to capture transaction patterns.
Experiments demonstrate over 20 percentage point improvements in phishing detection and enhanced de-anonymization, setting a new benchmark for blockchain fraud analysis.

BERT4ETH: Enhancing Ethereum Fraud Detection with Pre-trained Transformers

The research paper "BERT4ETH: A Pre-trained Transformer for Ethereum Fraud Detection" presents BERT4ETH, a pre-trained Transformer model tailored to detecting fraud on the Ethereum blockchain. Ethereum, while revolutionizing decentralized applications, is also a breeding ground for fraud, including phishing scams, Ponzi schemes, and pump-and-dump schemes. The paper identifies limitations in current graph-based approaches to Ethereum fraud detection and proposes BERT4ETH as an alternative better suited to this data.

Core Contributions

Model Architecture and Features: BERT4ETH uses a Transformer encoder to model Ethereum transactions as sequences, capturing the dynamic sequential patterns that matter for fraud detection. It adapts BERT-style pre-training to Ethereum through repetitiveness reduction, skew alleviation, and heterogeneity modeling, which the paper argues lets it handle highly repetitive, skew-distributed, and heterogeneous transaction data better than graph-based methods.
Innovative Pre-training Strategies: The paper introduces innovative strategies to address Ethereum-specific challenges:

- Repetitiveness Reduction: This involves transaction de-duplication and strategies such as high masking or dropout ratios to mitigate label leakage problems.
- Skew Alleviation: Techniques like frequency-aware negative sampling and intra-batch sharing enhance the representational distinctiveness by reducing the negative impact of high-frequency addresses.
- Heterogeneity Modeling: Advanced techniques such as in/out sequence separation and ERC-20 transfer log encoding are employed to preserve transaction heterogeneity.
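
The strategies above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `(sender, receiver, amount, timestamp)` tuple layout, the function names, the `[MASK]` token, the 0.8 mask ratio, and the 0.75 smoothing exponent are all assumptions made for illustration. Drawing negatives in proportion to smoothed address frequency is likewise only one plausible reading of "frequency-aware".

```python
import random
from collections import Counter

MASK = "[MASK]"  # hypothetical mask token


def split_and_deduplicate(account, txs):
    """Split an account's transactions by direction and merge repeats.

    Each tx is a hypothetical (sender, receiver, amount, timestamp) tuple.
    Repeated transfers with the same counterparty collapse into one entry
    holding the summed amount, the repeat count, and the latest timestamp.
    """
    in_seq, out_seq = {}, {}
    for sender, receiver, amount, ts in txs:
        if sender == account:
            peer, bucket = receiver, out_seq
        elif receiver == account:
            peer, bucket = sender, in_seq
        else:
            continue  # transaction does not involve this account
        total, count, last = bucket.get(peer, (0.0, 0, ts))
        bucket[peer] = (total + amount, count + 1, max(last, ts))

    def ordered(bucket):
        rows = [(peer, *stats) for peer, stats in bucket.items()]
        return sorted(rows, key=lambda row: row[3])  # oldest first

    return ordered(in_seq), ordered(out_seq)


def mask_addresses(addresses, mask_ratio=0.8, rng=random):
    """Replace a large random share of addresses with the mask token."""
    if not addresses:
        return [], {}
    n_mask = min(len(addresses), max(1, round(len(addresses) * mask_ratio)))
    positions = rng.sample(range(len(addresses)), n_mask)
    labels = {i: addresses[i] for i in positions}  # position -> true address
    masked = [MASK if i in labels else a for i, a in enumerate(addresses)]
    return masked, labels


def sample_negatives(freq, positives, k, power=0.75, rng=random):
    """Draw k negatives (with replacement), weighted by smoothed frequency."""
    candidates = [a for a in freq if a not in positives]
    weights = [freq[a] ** power for a in candidates]
    return rng.choices(candidates, weights=weights, k=k)


# Toy data: account "A" sends to "B" twice, so the two transfers are merged.
txs = [
    ("A", "B", 1.0, 10),
    ("A", "B", 2.0, 20),
    ("C", "A", 5.0, 15),
    ("A", "D", 1.0, 5),
    ("X", "Y", 1.0, 1),
]
in_seq, out_seq = split_and_deduplicate("A", txs)
masked, labels = mask_addresses([row[0] for row in out_seq])
freq = Counter(addr for s, r, _, _ in txs for addr in (s, r))
negatives = sample_negatives(freq, positives=set(labels.values()), k=3)
```

Keeping the incoming and outgoing sequences separate preserves the direction of each transfer, and merging repeated counterparties before masking means a masked address cannot simply be copied from an identical unmasked entry elsewhere in the same sequence.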

Performance Validation: Extensive experiments validate BERT4ETH's performance on two crucial tasks in Ethereum fraud detection: phishing account detection and de-anonymization. The model shows substantial improvements over the state-of-the-art methods:

- Phishing Detection: Achieves significant $F_1$ score improvements, outperforming existing methods by over 20 absolute percentage points.
- De-anonymization: Demonstrates exceptional performance, particularly in the Ethereum Name Service (ENS) and Tornado datasets, with notable increases in Hit Ratio@1.
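
For readers unfamiliar with the two metrics, the sketch below shows how $F_1$ and Hit Ratio@K are commonly computed. It is illustrative only: the function names, the toy embeddings, and the choice of cosine similarity for ranking candidate accounts are assumptions, not details taken from the paper.

```python
import math


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = phishing)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hit_ratio_at_k(embeddings, pairs, k=1):
    """Share of (query, target) pairs whose target ranks in the query's top k."""
    hits = 0
    for query, target in pairs:
        ranked = sorted(
            (a for a in embeddings if a != query),
            key=lambda a: cosine(embeddings[query], embeddings[a]),
            reverse=True,
        )
        hits += target in ranked[:k]
    return hits / len(pairs)


# Toy account embeddings (hypothetical values).
embeddings = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
pairs = [("a", "b"), ("c", "a")]  # accounts assumed to share an owner

f1 = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
hr1 = hit_ratio_at_k(embeddings, pairs, k=1)
hr2 = hit_ratio_at_k(embeddings, pairs, k=2)
```

Hit Ratio@1 is the strictest setting: a query counts as a hit only when its true counterpart is the single nearest account in embedding space.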

Experimental Insights and Case Studies: The ablation studies and case analysis in the paper indicate that BERT4ETH can capture multi-hop neighborhood information, which matters for tasks that need context beyond an account's immediate transactions. The pre-training strategies help filter and emphasize the important signals in complex transaction data, which is especially useful for identifying subtle fraud patterns.

Implications and Future Work

The research illustrates the potential of adapting machine learning advances, such as BERT-like architectures, to niche domains like blockchain fraud detection. BERT4ETH sets a new benchmark in Ethereum fraud detection and opens avenues for application to other blockchain ecosystems facing similar challenges. Future work could tune the model to address emerging fraud techniques and explore its applicability to other decentralized platforms. Adapting BERT4ETH to real-time data and scaling it for higher-dimensional blockchain networks are challenging but compelling directions for subsequent research.

Ultimately, BERT4ETH exemplifies the potential synergy between natural language processing techniques and blockchain technology, significantly advancing the capabilities in safeguarding Ethereum against fraud.

GitHub

GitHub - git-disl/BERT4ETH: BERT4ETH: A Pre-trained Transformer for Ethereum Fraud Detection (WWW23) (93 stars)
