Offensive AI: Enhancing Directory Brute-forcing Attack with the Use of Language Models (2404.14138v1)
Abstract: Web Vulnerability Assessment and Penetration Testing (Web VAPT) is a comprehensive cybersecurity process that uncovers a range of vulnerabilities which, if exploited, could compromise the integrity of web applications. A VAPT commonly includes a directory brute-forcing attack, which aims to identify the accessible directories of a target website. Current commercial solutions are inefficient because they rely on brute-forcing strategies driven by wordlists, which require an enormous number of trials for few successes. Offensive AI is a recent paradigm that integrates AI-based technologies into cyber attacks. In this work, we explore whether AI can enhance the directory enumeration process and propose a novel language model (LM)-based framework. Our experiments, conducted in a testbed consisting of 1 million URLs from different web application domains (universities, hospitals, government, companies), demonstrate the superiority of the LM-based attack, with an average performance increase of 969%.
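To make the headline figure concrete, the sketch below shows how a percentage increase maps to a multiplicative factor. It assumes, purely for illustration, that performance is measured as the number of valid directories found under a fixed request budget; the abstract does not state the metric. The function names and the hit counts are hypothetical and are not taken from the paper.

```python
def relative_increase_percent(baseline_hits: int, improved_hits: int) -> float:
    """Percentage increase of improved_hits over baseline_hits."""
    if baseline_hits <= 0:
        raise ValueError("baseline_hits must be positive")
    return (improved_hits - baseline_hits) * 100.0 / baseline_hits


def multiplier_from_increase(percent_increase: float) -> float:
    """Convert a percentage increase into a multiplicative factor."""
    return 1.0 + percent_increase / 100.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only so the increase works out to 969%.
    baseline_hits = 100   # assumed wordlist-based result
    improved_hits = 1069  # assumed LM-based result at the same request budget

    increase = relative_increase_percent(baseline_hits, improved_hits)
    factor = multiplier_from_increase(increase)
    print(f"increase: {increase:.0f}%  factor: {factor:.2f}x")
```

Under this reading, a 969% increase corresponds to roughly 10.69 times the baseline result, not 9.69 times, because the baseline itself counts as 100%.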