
Robust Synthetic Data-Driven Detection of Living-Off-the-Land Reverse Shells (2402.18329v2)

Published 28 Feb 2024 in cs.CR and cs.LG

Abstract: Living-off-the-land (LOTL) techniques pose a significant challenge to security operations, exploiting legitimate tools to execute malicious commands that evade traditional detection methods. To address this, we present a robust augmentation framework for cyber defense systems such as Security Information and Event Management (SIEM) solutions, enabling the detection of LOTL attacks such as reverse shells through machine learning. Leveraging real-world threat intelligence and adversarial training, our framework synthesizes diverse malicious datasets while preserving the variability of legitimate activity, ensuring high accuracy and low false-positive rates. We validate our approach through extensive experiments on enterprise-scale datasets, achieving a 90% improvement in detection rates over non-augmented baselines at an industry-grade False Positive Rate (FPR) of $10^{-5}$. We define black-box data-driven attacks that successfully evade unprotected models, and develop defenses to mitigate them, producing adversarially robust variants of ML models. Ethical considerations are central to this work; we discuss safeguards for synthetic data generation and the responsible release of pre-trained models across four best-performing architectures, including both adversarially and regularly trained variants: https://huggingface.co/dtrizna/quasarnix. Furthermore, we provide a malicious LOTL dataset containing over 1 million augmented attack variants to enable reproducible research and community collaboration: https://huggingface.co/datasets/dtrizna/QuasarNix. This work offers a reproducible, scalable, and production-ready defense against evolving LOTL threats.
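
The abstract reports detection rates at a fixed False Positive Rate of $10^{-5}$. As a rough illustration of what such a metric means, the sketch below picks the score threshold that flags at most the allowed fraction of benign samples and then measures how many malicious samples exceed it. This is not the authors' evaluation code: the function name `tpr_at_fpr`, its parameters, and the toy scores are all hypothetical, and it assumes that a higher score means "more likely malicious".

```python
import numpy as np


def tpr_at_fpr(scores_benign, scores_malicious, target_fpr):
    """Return (tpr, fpr, threshold) for the threshold that keeps FPR <= target_fpr.

    Hypothetical helper for illustration; assumes higher score = more malicious.
    A sample is flagged when its score is strictly greater than the threshold.
    """
    # Sort benign scores from highest to lowest.
    benign = np.sort(np.asarray(scores_benign, dtype=float))[::-1]
    malicious = np.asarray(scores_malicious, dtype=float)

    # Number of benign samples we may flag without exceeding the target FPR.
    allowed = int(np.floor(target_fpr * benign.size))

    # Use the (allowed + 1)-th highest benign score as the threshold, so at
    # most `allowed` benign samples score strictly above it.
    threshold = benign[allowed] if allowed < benign.size else -np.inf

    tpr = float(np.mean(malicious > threshold))
    fpr = float(np.mean(benign > threshold))
    return tpr, fpr, threshold


# Toy scores, chosen only to make the arithmetic easy to follow.
benign_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
malicious_scores = [0.55, 0.65, 0.9, 0.95]
tpr, fpr, threshold = tpr_at_fpr(benign_scores, malicious_scores, target_fpr=0.25)
print(f"threshold={threshold}, TPR={tpr}, FPR={fpr}")
```

Note that with a target FPR of $10^{-5}$, fewer than 100,000 benign samples leave no room for even one false positive, so the threshold falls back to the highest benign score. Measuring a rate this low therefore needs a large benign evaluation set.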


