
SourceP: Detecting Ponzi Schemes on Ethereum with Source Code (2306.01665v8)

Published 2 Jun 2023 in cs.SE and cs.AI

Abstract: As blockchain technology has grown in popularity, a classic financial scam, the Ponzi scheme, has emerged on the Ethereum blockchain platform. Ponzi schemes deployed through smart contracts, known as smart Ponzi schemes, have caused substantial economic losses and other negative impacts. Existing methods for detecting smart Ponzi schemes on Ethereum rely mainly on bytecode features, opcode features, account features, and transaction behavior features of smart contracts. These features do not truly characterize the behavior of Ponzi schemes, so such methods generally perform poorly in detection accuracy and false alarm rate. In this paper, we propose SourceP, a method for detecting smart Ponzi schemes on Ethereum using pre-trained models and data flow, which requires only the source code of smart contracts as features. SourceP reduces the difficulty of data acquisition and feature extraction compared with existing detection methods. Specifically, we first convert the source code of a smart contract into a data flow graph and then introduce a pre-trained model based on learning code representations to build a classification model that identifies Ponzi schemes in smart contracts. Experimental results show that SourceP achieves 87.2% recall and 90.7% F-score in detecting smart Ponzi schemes on an Ethereum smart contract dataset, outperforming state-of-the-art methods in both performance and sustainability. Additional experiments show that the pre-trained model and data flow both contribute substantially to SourceP and that SourceP generalizes well.
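
The abstract does not spell out how the data flow graph is built, so the following is only a toy sketch of the general idea: each edge records which variable a value comes from. It assumes simple `lhs = expr;` statements and uses a regular expression instead of a real Solidity parser; the function name `data_flow_edges` and the sample snippet are hypothetical and are not taken from the paper.

```python
import re

# Matches identifiers; numeric literals such as "10" are not matched.
IDENT = re.compile(r"[A-Za-z_]\w*")


def data_flow_edges(source: str) -> list[tuple[str, str]]:
    """Return (source_variable, target_variable) edges for simple assignments.

    Toy assumption: every statement of interest looks like `[type] lhs = expr;`.
    Comparison operators, function calls, and control flow are not handled.
    """
    edges = []
    for stmt in source.split(";"):
        if "=" not in stmt:
            continue
        lhs, rhs = stmt.split("=", 1)
        lhs_names = IDENT.findall(lhs)
        if not lhs_names:
            continue
        # The last identifier on the left is the variable; an earlier one is a type.
        target = lhs_names[-1]
        for name in IDENT.findall(rhs):
            edges.append((name, target))  # the value of `target` comes from `name`
    return edges


# Hypothetical Solidity-like fragment, used only for illustration.
snippet = """
uint fee = amount / 10;
uint payout = amount - fee;
balance = balance + payout;
"""

for src, dst in data_flow_edges(snippet):
    print(f"{src} -> {dst}")
```

In a pipeline like the one the abstract describes, edges of this kind would be supplied alongside the code tokens to a pre-trained code model, whose output feeds a binary Ponzi/non-Ponzi classifier; that part is omitted here because the abstract gives no implementation details.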
