Mitigating Label Flipping Attacks in Malicious URL Detectors Using Ensemble Trees (2403.02995v1)
Abstract: Malicious URLs provide adversarial opportunities across various industries, including transportation, healthcare, energy, and banking, and can be detrimental to business operations. Detecting these URLs is therefore of crucial importance; however, current Machine Learning (ML) models are susceptible to backdoor attacks. These attacks manipulate a small percentage of training data labels, for example through Label Flipping (LF), which changes benign labels to malicious ones and vice versa, causing the model to misclassify. Integrating defense mechanisms into the architecture of ML models is therefore imperative. This study focuses on backdoor attacks in the context of URL detection using ensemble trees. By illuminating the motivations behind such attacks, highlighting the roles of attackers, and emphasizing the critical importance of effective defense strategies, this paper contributes to ongoing efforts to fortify ML models against adversarial threats in network security. We propose an innovative alarm system that detects the presence of poisoned labels and a defense mechanism designed to uncover the original class labels, with the aim of mitigating backdoor attacks on ensemble tree classifiers. We conducted a case study using the Alexa and Phishing Site URL datasets and showed that LF attacks can be addressed using our proposed defense mechanism. Our experimental results show that the LF attack achieved an Attack Success Rate (ASR) of 50-65% at poisoning rates of 2-5%, and that the proposed defense method detected poisoned labels with an accuracy of up to 100%.
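To make the pipeline described in the abstract concrete, the sketch below simulates a label-flipping attack on an ensemble-tree classifier, measures an ASR, and applies a simple label-sanitization step. It is a minimal illustration, not the authors' method: it uses synthetic features in place of the Alexa/Phishing Site URL datasets, a random forest as the ensemble-tree model, one common ASR definition (the paper may define ASR differently), and a k-NN neighborhood-vote heuristic as a stand-in for the paper's alarm and label-recovery mechanism.

```python
# Sketch: label-flipping (LF) attack, ASR measurement, and a stand-in
# k-NN label-sanitization defense on an ensemble-tree classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for URL features (benign = 0, malicious = 1).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# (1) LF attack: flip a small percentage of training labels
# (2-5% in the paper's experiments; 5% here).
flip_rate = 0.05
flip_idx = rng.choice(len(y_tr), size=int(flip_rate * len(y_tr)), replace=False)
y_poisoned = y_tr.copy()
y_poisoned[flip_idx] = 1 - y_poisoned[flip_idx]  # benign <-> malicious

# (2) ASR, taken here as the fraction of test samples the clean model
# classifies correctly but the poisoned model misclassifies.
clean = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
poisoned = RandomForestClassifier(random_state=0).fit(X_tr, y_poisoned)
clean_ok = clean.predict(X_te) == y_te
pois_bad = poisoned.predict(X_te) != y_te
asr = (clean_ok & pois_bad).sum() / max(clean_ok.sum(), 1)
print(f"ASR at {flip_rate:.0%} poisoning: {asr:.2%}")

# (3) Stand-in defense: flag training points whose label disagrees with the
# majority vote of their k nearest neighbors (the "alarm"), then relabel
# them with that vote (the "recovery" step).
knn = KNeighborsClassifier(n_neighbors=10).fit(X_tr, y_poisoned)
neighbor_vote = knn.predict(X_tr)
suspect = neighbor_vote != y_poisoned
y_repaired = np.where(suspect, neighbor_vote, y_poisoned)
recovered = (y_repaired[flip_idx] == y_tr[flip_idx]).mean()
print(f"Flipped labels recovered by sanitization: {recovered:.2%}")
```

Retraining the forest on `y_repaired` and comparing its test accuracy against the poisoned model's shows the effect of sanitization; the neighborhood-vote idea mirrors k-NN-based label-sanitization defenses from the literature rather than the paper's specific ensemble-tree mechanism.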
Authors: Ehsan Nowroozi, Nada Jadalla, Samaneh Ghelichkhani, Alireza Jolfaei