Survey of Malware Analysis through Control Flow Graph using Machine Learning (2305.08993v2)

Published 15 May 2023 in cs.CR and cs.LG

Abstract: Malware is a significant threat to the security of computer systems and networks, and detecting it requires sophisticated techniques for analyzing its behavior and functionality. Traditional signature-based detection methods have become ineffective against new and unknown malware because malware evolves rapidly. One of the most promising techniques for overcoming the limitations of signature-based detection is the use of control flow graphs (CFGs). A CFG captures the structural information of a program by representing its possible execution paths as a graph, in which nodes represent instructions (or basic blocks of instructions) and edges represent control flow dependencies. Machine learning (ML) algorithms are used to extract features from CFGs and to classify the corresponding programs as malicious or benign. In this survey, we review state-of-the-art methods for CFG-based malware detection using ML, focusing on the different ways CFG features are extracted, represented, and classified. Specifically, we present a comprehensive overview of the types of CFG features that have been used and the ML algorithms that have been applied to CFG-based malware detection. We provide an in-depth analysis of the challenges and limitations of these approaches, suggest potential solutions to some open problems, and outline promising future directions for research in this field.
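To make the CFG representation above concrete, here is a minimal sketch that builds a CFG from a hypothetical toy instruction list and derives a few simple structural features of the kind an ML classifier might consume. The instruction format `(address, mnemonic, jump_target)`, the mnemonics (`jmp`, `jz`, `jnz`, `ret`), and the feature set are illustrative assumptions, not the survey's method. Nodes here are basic blocks, and the code assumes instructions arrive sorted by ascending address, as they would from a disassembler.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy instruction format: (address, mnemonic, jump_target).
# Only the mnemonics below affect control flow; everything else falls through.
Instr = Tuple[int, str, Optional[int]]
CONDITIONAL = {"jz", "jnz"}      # two successors: jump target and fall-through
UNCONDITIONAL = {"jmp"}          # one successor: jump target
TERMINATORS = {"ret"}            # no successors
JUMPS = CONDITIONAL | UNCONDITIONAL
FEATURE_ORDER = ("nodes", "edges", "cyclomatic", "branch_nodes", "back_edges")


def find_leaders(code: List[Instr]) -> List[int]:
    """Return the sorted addresses at which basic blocks begin."""
    addrs = {addr for addr, _, _ in code}
    leaders = {code[0][0]}  # the entry instruction starts a block
    for i, (addr, op, target) in enumerate(code):
        if op in JUMPS:
            if target not in addrs:
                raise ValueError(f"jump at {addr} targets unknown address {target}")
            leaders.add(target)  # every jump target starts a block
        if (op in JUMPS or op in TERMINATORS) and i + 1 < len(code):
            leaders.add(code[i + 1][0])  # the instruction after a branch starts a block
    return sorted(leaders)


def build_cfg(code: List[Instr]) -> Tuple[Dict[int, List[Instr]], Set[Tuple[int, int]]]:
    """Split code (sorted by address) into basic blocks and connect them."""
    leaders = find_leaders(code)
    blocks: Dict[int, List[Instr]] = {}
    for j, start in enumerate(leaders):
        end = leaders[j + 1] if j + 1 < len(leaders) else None
        blocks[start] = [ins for ins in code
                         if ins[0] >= start and (end is None or ins[0] < end)]
    edges: Set[Tuple[int, int]] = set()
    for j, start in enumerate(leaders):
        _, op, target = blocks[start][-1]  # the last instruction decides successors
        nxt = leaders[j + 1] if j + 1 < len(leaders) else None
        if op in JUMPS:
            edges.add((start, target))
        if op not in UNCONDITIONAL and op not in TERMINATORS and nxt is not None:
            edges.add((start, nxt))  # fall-through edge
    return blocks, edges


def cfg_features(blocks: Dict[int, List[Instr]],
                 edges: Set[Tuple[int, int]]) -> Dict[str, int]:
    """Compute simple structural features of a single-function CFG."""
    out_deg = {b: 0 for b in blocks}
    for src, _ in edges:
        out_deg[src] += 1
    n, e = len(blocks), len(edges)
    return {
        "nodes": n,
        "edges": e,
        # McCabe cyclomatic complexity E - N + 2P, assuming P = 1 component
        "cyclomatic": e - n + 2,
        "branch_nodes": sum(1 for d in out_deg.values() if d > 1),
        # heuristic: an edge to an equal or lower address usually closes a loop
        "back_edges": sum(1 for src, dst in edges if dst <= src),
    }


def to_vector(features: Dict[str, int]) -> List[int]:
    """Flatten features in a fixed order, e.g. as input to an ML classifier."""
    return [features[k] for k in FEATURE_ORDER]


# A toy loop: block 1 tests a condition, block 3 jumps back to block 1.
TOY_LOOP: List[Instr] = [
    (0, "mov", None),
    (1, "cmp", None),
    (2, "jz", 5),
    (3, "add", None),
    (4, "jmp", 1),
    (5, "ret", None),
]

if __name__ == "__main__":
    blocks, edges = build_cfg(TOY_LOOP)
    print(sorted(edges))                         # [(0, 1), (1, 3), (1, 5), (3, 1)]
    print(to_vector(cfg_features(blocks, edges)))  # [4, 4, 2, 1, 1]
```

The `back_edges` count uses address order as a cheap loop heuristic; a more rigorous analysis would identify back edges using dominators. Real detectors typically use far richer node attributes (for example, instruction or API-call statistics per block) alongside such graph-level counts.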
