Detecting Financial Bots on the Ethereum Blockchain (2403.19530v1)
Abstract: The integration of bots in Distributed Ledger Technologies (DLTs) fosters efficiency and automation. However, their use is also associated with predatory trading and market manipulation, and can pose threats to system integrity. It is therefore essential to understand the extent of bot deployment in DLTs; yet current detection systems are predominantly rule-based and lack flexibility. In this study, we present a novel approach that uses machine learning to detect financial bots on the Ethereum platform. First, we systematize the existing scientific literature and collect anecdotal evidence to establish a taxonomy of financial bots comprising 7 categories and 24 subcategories. Second, we create a ground-truth dataset of 133 human and 137 bot addresses. Third, we employ both unsupervised and supervised machine learning algorithms to detect bots deployed on Ethereum. The highest-performing clustering algorithm is a Gaussian Mixture Model with an average cluster purity of 82.6%, while the highest-performing binary classifier is a Random Forest with an accuracy of 83%. Our machine learning-based detection mechanism contributes to understanding Ethereum ecosystem dynamics by providing additional insight into the current bot landscape.
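Cluster purity, the metric used above to rank the clustering algorithms, scores each cluster by how many of its members share the cluster's most common ground-truth label. The abstract does not say exactly how the 82.6% figure is averaged, so the minimal sketch below shows two common variants under that assumption: overall purity (majority hits divided by all points) and the unweighted mean of per-cluster purities. The function names and the toy "bot"/"human" labels are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter


def _group_labels(cluster_ids, true_labels):
    """Group ground-truth labels by assigned cluster id (hypothetical helper)."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if not true_labels:
        raise ValueError("need at least one labelled point")
    members = {}
    for cluster, label in zip(cluster_ids, true_labels):
        members.setdefault(cluster, []).append(label)
    return members


def cluster_purity(cluster_ids, true_labels):
    """Overall purity: (1/N) * sum over clusters of the majority-label count."""
    members = _group_labels(cluster_ids, true_labels)
    # For each cluster, count the points carrying its most frequent label.
    majority_hits = sum(Counter(labels).most_common(1)[0][1] for labels in members.values())
    return majority_hits / len(true_labels)


def mean_per_cluster_purity(cluster_ids, true_labels):
    """Unweighted mean of each cluster's own purity (majority count / cluster size)."""
    members = _group_labels(cluster_ids, true_labels)
    scores = [Counter(labels).most_common(1)[0][1] / len(labels) for labels in members.values()]
    return sum(scores) / len(scores)


# Toy example: two clusters over hypothetical address labels.
clusters = [0, 0, 0, 1, 1, 1, 1]
labels = ["bot", "bot", "human", "human", "human", "human", "bot"]
print(cluster_purity(clusters, labels))           # 5/7, about 0.714
print(mean_per_cluster_purity(clusters, labels))  # (2/3 + 3/4) / 2, about 0.708
```

The two variants diverge when clusters differ in size: overall purity weights large clusters more heavily, whereas the per-cluster mean treats a small, clean cluster and a large, mixed one equally.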