Feature graph construction with static features for malware detection
Abstract: Malware can greatly compromise the integrity and trustworthiness of information, and it is constantly evolving. Existing feature fusion-based detection methods generally overlook the correlations between features, and merely concatenating features reduces the model's characterization ability and leads to low detection accuracy. Moreover, these methods are susceptible to concept drift, which significantly degrades the model over time. To address these challenges, we introduce MFGraph, a feature graph-based malware detection method that characterizes applications by learning feature-to-feature relationships, improving detection accuracy while mitigating the impact of concept drift. In MFGraph, we construct a feature graph from static features extracted from binary PE files, then apply a deep graph convolutional network to learn a representation of the feature graph. Finally, we use the representation vectors output by a three-layer perceptron to differentiate between benign and malicious software. We evaluated our method on the EMBER dataset, and the experimental results show that it achieves an AUC score of 0.98756 on the malware detection task, outperforming the baseline models. Furthermore, the AUC score of MFGraph decreases by only 5.884% over one year, indicating that it is the least affected by concept drift among the models compared.
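The pipeline outlined in the abstract (feature graph, graph convolution, three-layer perceptron) can be illustrated with a minimal pure-Python sketch. This is not the MFGraph implementation. The abstract does not say how edges are built or how the graph is pooled, so the cosine-similarity edge rule, the threshold of 0.5, the mean-pooling readout, the layer sizes, and the random untrained weights are all assumptions made for illustration, as are the function names.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def build_feature_graph(node_feats, threshold=0.5):
    """ASSUMED edge rule: link two feature nodes when their cosine
    similarity exceeds `threshold`. Self-loops are always added."""
    n = len(node_feats)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0
        for j in range(i + 1, n):
            if cosine(node_feats[i], node_feats[j]) > threshold:
                adj[i][j] = adj[j][i] = 1.0
    return adj

def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2} used by standard GCN layers."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    return [[max(0.0, x) for x in row] for row in m]

def gcn_layer(a_hat, h, w):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    return relu(matmul(matmul(a_hat, h), w))

def rand_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def mlp(x, weights):
    """Perceptron with one weight matrix per layer; sigmoid on the final logit."""
    h = [x]
    for k, w in enumerate(weights):
        h = matmul(h, w)
        if k < len(weights) - 1:
            h = relu(h)
    return 1.0 / (1.0 + math.exp(-h[0][0]))

rng = random.Random(0)
# Five hypothetical feature nodes, each a 4-dimensional static-feature vector.
node_feats = [[rng.random() for _ in range(4)] for _ in range(5)]
adj = build_feature_graph(node_feats)
a_hat = normalize(adj)

# Two stacked graph convolutions (the depth is an assumption).
h = gcn_layer(a_hat, node_feats, rand_matrix(4, 8, rng))
h = gcn_layer(a_hat, h, rand_matrix(8, 8, rng))

# ASSUMED readout: mean over nodes gives one vector per graph.
graph_vec = [sum(col) / len(h) for col in zip(*h)]

# Three-layer perceptron producing a maliciousness score in (0, 1).
score = mlp(graph_vec, [rand_matrix(8, 8, rng),
                        rand_matrix(8, 4, rng),
                        rand_matrix(4, 1, rng)])
print(f"score = {score:.4f}")
```

Because the weights are random, the printed score carries no meaning; the sketch only shows how relationships between feature nodes enter the representation through the normalized adjacency matrix rather than through plain concatenation.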