Anomaly Detection of Command Shell Sessions based on DistilBERT: Unsupervised and Supervised Approaches (2310.13247v1)
Abstract: Anomaly detection in command shell sessions is a critical aspect of computer security. Recent advances in deep learning and natural language processing, particularly transformer-based models, have shown great promise for addressing complex security challenges. In this paper, we implement a comprehensive approach to detect anomalies in Unix shell sessions using a pretrained DistilBERT model, leveraging both unsupervised and supervised learning techniques to identify anomalous activity while minimizing data labeling. The unsupervised method captures the underlying structure and syntax of Unix shell commands, enabling the detection of session deviations from normal behavior. Experiments on a large-scale enterprise dataset collected from production systems demonstrate the effectiveness of our approach in detecting anomalous behavior in Unix shell sessions. This work highlights the potential of leveraging recent advances in transformers to address important computer security challenges.
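The unsupervised idea in the abstract, flagging sessions that deviate from normal behavior, can be illustrated with a minimal sketch. It assumes each shell session has already been turned into a fixed-length embedding vector (for example by a DistilBERT encoder). The toy vectors, the centroid-distance score, and the 1.5x threshold rule are all hypothetical stand-ins chosen for illustration; they are not the detectors or settings used in the paper.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def anomaly_score(vector, center):
    """Euclidean distance from the centroid of normal sessions."""
    return math.dist(vector, center)

# Hypothetical 3-dimensional session embeddings; real DistilBERT
# embeddings would have 768 dimensions.
normal_sessions = [
    [0.9, 0.1, 0.0],
    [1.0, 0.0, 0.1],
    [1.1, 0.1, 0.1],
    [1.0, 0.2, 0.0],
]

center = centroid(normal_sessions)

# Assumed threshold rule: 1.5x the largest distance seen among normal sessions.
threshold = 1.5 * max(anomaly_score(v, center) for v in normal_sessions)

new_sessions = {
    "routine": [1.0, 0.1, 0.1],
    "unusual": [-0.5, 2.0, 1.5],
}

# A session is flagged when it lies farther from the centroid than the threshold.
flags = {
    name: anomaly_score(vec, center) > threshold
    for name, vec in new_sessions.items()
}
print(flags)
```

In practice the distance score would likely be replaced by a dedicated outlier detector, and the threshold would be tuned on held-out data rather than fixed by a rule of thumb.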