
Detecting Financial Bots on the Ethereum Blockchain (2403.19530v1)

Published 28 Mar 2024 in cs.CR and cs.LG

Abstract: The integration of bots in Distributed Ledger Technologies (DLTs) fosters efficiency and automation. However, their use is also associated with predatory trading and market manipulation, and can pose threats to system integrity. It is therefore essential to understand the extent of bot deployment in DLTs; despite this, current detection systems are predominantly rule-based and lack flexibility. In this study, we present a novel approach that utilizes machine learning for the detection of financial bots on the Ethereum platform. First, we systematize existing scientific literature and collect anecdotal evidence to establish a taxonomy for financial bots, comprising 7 categories and 24 subcategories. Next, we create a ground-truth dataset consisting of 133 human and 137 bot addresses. Third, we employ both unsupervised and supervised machine learning algorithms to detect bots deployed on Ethereum. The highest-performing clustering algorithm is a Gaussian Mixture Model with an average cluster purity of 82.6%, while the highest-performing model for binary classification is a Random Forest with an accuracy of 83%. Our machine learning-based detection mechanism contributes to understanding the Ethereum ecosystem dynamics by providing additional insights into the current bot landscape.
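The "average cluster purity" figure quoted above can be illustrated with a small sketch. It assumes purity is computed per cluster as the share of the majority ground-truth label and then averaged over clusters; the abstract does not spell out the exact formula, so this is one plausible reading, not the authors' implementation. The function names and the toy assignments are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purities(cluster_ids, true_labels):
    """Map each cluster id to the share of its majority ground-truth label."""
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, true_labels):
        members[cid].append(label)
    return {
        cid: Counter(labels).most_common(1)[0][1] / len(labels)
        for cid, labels in members.items()
    }


def average_purity(cluster_ids, true_labels):
    """Unweighted mean of per-cluster purities (assumed definition)."""
    purities = cluster_purities(cluster_ids, true_labels)
    return sum(purities.values()) / len(purities)


# Toy, made-up cluster assignments and labels; not data from the paper.
clusters = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
labels = ["bot", "bot", "bot", "human",
          "human", "human", "human", "human", "bot", "bot"]

print(cluster_purities(clusters, labels))
print(round(average_purity(clusters, labels), 4))
```

In practice the cluster ids would come from a fitted clustering model (the abstract names a Gaussian Mixture Model) and the labels from the ground-truth set of human and bot addresses.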
