Marlin: Knowledge-Driven Analysis of Provenance Graphs for Efficient and Robust Detection of Cyber Attacks (2403.12541v2)

Published 19 Mar 2024 in cs.CR

Abstract: Recent research in both academia and industry has validated the effectiveness of provenance graph-based detection for advanced cyber attack detection and investigation. However, analyzing large-scale provenance graphs often results in substantial overhead. To improve performance, existing detection systems implement various optimization strategies. Yet, as several recent studies suggest, these strategies could lose necessary context information and be vulnerable to evasions. Designing a detection system that is efficient and robust against adversarial attacks is an open problem. We introduce Marlin, which approaches cyber attack detection through real-time provenance graph alignment. By leveraging query graphs embedded with attack knowledge, Marlin can efficiently identify entities and events within provenance graphs, embedding targeted analysis and significantly narrowing the search space. Moreover, we incorporate our graph alignment algorithm into a tag propagation-based schema to eliminate the need for storing and reprocessing raw logs. This design significantly reduces in-memory storage requirements and minimizes data processing overhead. As a result, it enables real-time graph alignment while preserving essential context information, thereby enhancing the robustness of cyber attack detection. Moreover, Marlin allows analysts to customize attack query graphs flexibly to detect extended attacks and provide interpretable detection results. We conduct experimental evaluations on two large-scale public datasets containing 257.42 GB of logs and 12 query graphs of varying sizes, covering multiple attack techniques and scenarios. The results show that Marlin can process 137K events per second while accurately identifying 120 subgraphs with 31 confirmed attacks, along with only 1 false positive, demonstrating its efficiency and accuracy in handling massive data.
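
The idea of aligning a query graph against a stream of events through tag propagation can be illustrated with a toy sketch. This is not Marlin's algorithm. It assumes the query graph is a simple ordered chain of edges, that each event is a `(source, action, destination)` triple in information-flow direction, and that the only state kept per entity is the set of query steps it has reached. The names `QueryEdge`, `TagAligner`, and the sample entities are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# Assumed event shape: (source entity, action, destination entity).
Event = Tuple[str, str, str]


@dataclass
class QueryEdge:
    """One step of a hypothetical chain-shaped attack query graph."""
    src_ok: Callable[[str], bool]   # predicate on the source entity
    action: str                     # required event type
    dst_ok: Callable[[str], bool]   # predicate on the destination entity


@dataclass
class TagAligner:
    """Keeps only per-entity tags (matched step indices), never raw events."""
    query: List[QueryEdge]
    tags: Dict[str, Set[int]] = field(default_factory=dict)

    def process(self, event: Event) -> bool:
        """Consume one event; return True if it completes the query chain."""
        src, action, dst = event
        src_tags = self.tags.get(src, set())
        new_tags: Set[int] = set()
        for i, edge in enumerate(self.query):
            if action != edge.action:
                continue
            if not edge.src_ok(src) or not edge.dst_ok(dst):
                continue
            # Step 0 can start anywhere; later steps need the previous
            # step's tag to have already reached the source entity.
            if i == 0 or (i - 1) in src_tags:
                new_tags.add(i)
        if new_tags:
            self.tags.setdefault(dst, set()).update(new_tags)
        return (len(self.query) - 1) in new_tags


def any_entity(_: str) -> bool:
    return True


# Illustrative query: browser drops a file, the file is executed,
# and the resulting process opens a network connection.
query = [
    QueryEdge(lambda s: "firefox" in s, "write", lambda d: d.startswith("/tmp/")),
    QueryEdge(any_entity, "exec", any_entity),
    QueryEdge(any_entity, "connect", any_entity),
]

events: List[Event] = [
    ("firefox", "write", "/tmp/a.bin"),
    ("bash", "exec", "proc:ls"),               # benign: source carries no tag
    ("/tmp/a.bin", "exec", "proc:a.bin"),
    ("proc:a.bin", "connect", "203.0.113.7:443"),
]

aligner = TagAligner(query)
alerts = [aligner.process(e) for e in events]
```

In this sketch each event is examined once and then discarded, so memory grows with the number of tagged entities rather than with log volume. A real system would need to handle general (non-chain) query graphs, approximate matches, and tag expiry, none of which are modeled here.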
