TapTree: Process-Tree Based Host Behavior Modeling and Threat Detection Framework via Sequential Pattern Mining (2312.07575v1)

Published 10 Dec 2023 in cs.CR

Abstract: Audit logs containing system-level events are frequently used for behavior modeling, as they can provide detailed insight into cyber-threat occurrences. However, mapping low-level system events in audit logs to high-level behaviors has been a major challenge in identifying host contextual behavior for the purpose of detecting potential cyber threats. Relying on domain-expert knowledge for this mapping may limit its practical implementation. This paper presents TapTree, an automated process-tree based technique that extracts host behavior by compiling the semantic information of system events. After extracting behaviors as system-generated process trees, TapTree integrates event semantics as a representation of behaviors. To further reduce the pattern-matching workload for the analyst, TapTree aggregates semantically equivalent patterns and optimizes representative behaviors. In our evaluation against a recent benchmark audit log dataset (DARPA OpTC), TapTree employs tree pattern queries and sequential pattern mining techniques to deduce the semantics of connected system events, achieving high accuracy for behavior abstraction and, subsequently, for Advanced Persistent Threat (APT) attack detection. Moreover, we illustrate how to update the baseline model gradually online, allowing it to adapt to new log patterns over time.
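To make the general idea concrete, the following is a minimal sketch, not TapTree's actual implementation. It groups hypothetical audit events into a process tree and then counts contiguous event n-grams shared across processes, which is a simple stand-in for a full sequential pattern mining algorithm. The event tuples, action names, and function names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified audit events: (pid, parent_pid, action).
# Real audit logs such as DARPA OpTC carry many more fields.
events = [
    (1, 0, "PROCESS_CREATE"),
    (2, 1, "PROCESS_CREATE"),
    (2, 1, "FILE_READ"),
    (2, 1, "FLOW_START"),
    (3, 1, "PROCESS_CREATE"),
    (3, 1, "FILE_READ"),
    (3, 1, "FLOW_START"),
]

def build_process_tree(events):
    """Map each parent pid to the sorted list of its child pids."""
    children = defaultdict(set)
    for pid, ppid, _ in events:
        children[ppid].add(pid)
    return {parent: sorted(kids) for parent, kids in children.items()}

def event_sequences(events):
    """Group actions by process, preserving the order they appear in the log."""
    seqs = defaultdict(list)
    for pid, _, action in events:
        seqs[pid].append(action)
    return dict(seqs)

def frequent_ngrams(seqs, n=2, min_support=2):
    """Return contiguous n-grams that occur in at least min_support sequences."""
    support = Counter()
    for seq in seqs.values():
        # Use a set so each n-gram counts at most once per sequence.
        grams = {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}
        support.update(grams)
    return {gram: count for gram, count in support.items() if count >= min_support}

tree = build_process_tree(events)
patterns = frequent_ngrams(event_sequences(events), n=2, min_support=2)
```

Here `tree` records which processes spawned which, and `patterns` holds the event pairs that recur across processes. A baseline of such recurring patterns is the kind of artifact that unseen sequences could later be compared against, although the paper's own pipeline, based on tree pattern queries and semantic aggregation, is more involved than this sketch.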

