
Raze to the Ground: Query-Efficient Adversarial HTML Attacks on Machine-Learning Phishing Webpage Detectors (2310.03166v2)

Published 4 Oct 2023 in cs.CR and cs.LG

Abstract: Machine-learning phishing webpage detectors (ML-PWD) have been shown to be vulnerable to adversarial manipulations of the HTML code of the input webpage. Nevertheless, recently proposed attacks have demonstrated limited effectiveness because they do not optimize how the adopted manipulations are used and because they focus solely on specific elements of the HTML code. In this work, we overcome these limitations by first designing a novel set of fine-grained manipulations that modify the HTML code of the input phishing webpage without compromising its maliciousness or visual appearance, i.e., the manipulations are functionality- and rendering-preserving by design. We then use a query-efficient black-box optimization algorithm to select which manipulations to apply to bypass the target detector. Our experiments show that our attacks raze to the ground the performance of current state-of-the-art ML-PWD using just 30 queries, thus outperforming the weaker attacks developed in previous work and enabling a much fairer robustness evaluation of ML-PWD.
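The general shape of a query-budgeted black-box search over manipulations can be illustrated with a minimal sketch. This is not the paper's algorithm or manipulation set: it uses plain random hill-climbing as a stand-in optimizer, and every name here (`add_comment`, `add_hidden_internal_link`, `toy_score`, `black_box_search`, the 0.5 threshold) is a hypothetical chosen for illustration. Only the budget of 30 queries comes from the abstract.

```python
import random
from typing import Callable, List, Tuple

# A manipulation maps an HTML string to a modified HTML string.
Manipulation = Callable[[str], str]


def add_comment(html: str) -> str:
    """Hypothetical manipulation: insert an HTML comment (never rendered)."""
    return html.replace("</body>", "<!-- note --></body>", 1)


def add_hidden_internal_link(html: str) -> str:
    """Hypothetical manipulation: insert an empty, hidden internal link."""
    return html.replace("</body>", '<a href="#" hidden></a></body>', 1)


def toy_score(html: str) -> float:
    """Stand-in detector (assumption): fraction of links that are external."""
    total = html.count("<a ")
    external = html.count('href="http')
    return external / total if total else 0.0


def black_box_search(
    html: str,
    manipulations: List[Manipulation],
    score: Callable[[str], float],
    budget: int = 30,
    threshold: float = 0.5,
    seed: int = 0,
) -> Tuple[str, float, int]:
    """Random hill-climbing: keep a manipulation only if the score drops.

    Each call to `score` counts as one query against the detector.
    """
    rng = random.Random(seed)
    best_html, best_score = html, score(html)
    queries = 1
    while queries < budget and best_score >= threshold:
        candidate = rng.choice(manipulations)(best_html)
        candidate_score = score(candidate)
        queries += 1
        if candidate_score < best_score:
            best_html, best_score = candidate, candidate_score
    return best_html, best_score, queries


# Usage on a toy page; the detector and page are illustrative only.
page = '<html><body><a href="http://x.example/login">Sign in</a></body></html>'
adv, final_score, used = black_box_search(
    page, [add_comment, add_hidden_internal_link], toy_score
)
```

The search only ever sees the detector's output score, which is what makes it black-box, and the loop stops as soon as the score falls below the threshold or the query budget is exhausted. A real evaluation would replace `toy_score` with the detector under test and would need to verify that each manipulation actually preserves rendering and functionality.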

Authors (6)
  1. Biagio Montaruli (3 papers)
  2. Luca Demetrio (28 papers)
  3. Maura Pintor (24 papers)
  4. Luca Compagna (4 papers)
  5. Davide Balzarotti (6 papers)
  6. Battista Biggio (81 papers)