
Carbon Filter: Real-time Alert Triage Using Large Scale Clustering and Fast Search (2405.04691v1)

Published 7 May 2024 in cs.CR and cs.LG

Abstract: "Alert fatigue" is one of the biggest challenges faced by the Security Operations Center (SOC) today, with analysts spending more than half of their time reviewing false alerts. Endpoint detection products raise alerts by pattern matching on event telemetry against behavioral rules that describe potentially malicious behavior, but they can suffer from high false positive rates that distract from actual attacks. While alert triage techniques based on data provenance may show promise, these techniques can take over a minute to inspect a single alert, whereas EDR customers may face tens of millions of alerts per day; the current reality is that these approaches are not nearly scalable enough for production environments. We present Carbon Filter, a statistical-learning-based system that dramatically reduces the number of alerts analysts need to manually review. Our approach is based on the observation that false alert triggers can be efficiently identified and separated from suspicious behaviors by examining the process initiation context (e.g., the command line) that launched the responsible process. Through the use of fast-search algorithms for training and inference, our approach scales to millions of alerts per day. By batching queries to the model, we observe a theoretical maximum throughput of 20 million alerts per hour. Based on the analysis of tens of millions of alerts from customer deployments, our solution resulted in a 6-fold improvement in the signal-to-noise ratio without compromising alert triage performance.
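The core idea in the abstract (flag an alert as a likely false positive when the command line that launched the responsible process closely resembles a known benign trigger) can be illustrated with a small similarity lookup. This is a hypothetical sketch, not Carbon Filter's implementation: the abstract does not specify the similarity measure or search structure, so the code substitutes character-trigram Jaccard similarity and a brute-force linear scan, and the names `BenignTriggerIndex`, `triage`, and the 0.8 threshold are illustrative assumptions.

```python
"""Hypothetical sketch: triage alerts by comparing the launching process's
command line against command lines previously labeled as benign triggers.
Similarity measure (trigram Jaccard) and threshold are assumptions."""

from __future__ import annotations

from dataclasses import dataclass, field


def trigrams(text: str) -> set[str]:
    """Return the set of character 3-grams of a lower-cased command line."""
    s = text.lower()
    if len(s) < 3:
        return {s} if s else set()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets (defined as 1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass
class BenignTriggerIndex:
    """Brute-force index of command lines known to trigger false alerts."""

    threshold: float = 0.8  # assumed cut-off, not taken from the paper
    _entries: list[tuple[str, set[str]]] = field(
        default_factory=list, init=False, repr=False
    )

    def add(self, command_line: str) -> None:
        """Store a command line labeled as a benign (false-positive) trigger."""
        self._entries.append((command_line, trigrams(command_line)))

    def best_match(self, command_line: str) -> tuple[str | None, float]:
        """Linear scan for the most similar stored command line and its score."""
        query = trigrams(command_line)
        best, best_score = None, 0.0
        for stored, grams in self._entries:
            score = jaccard(query, grams)
            if score > best_score:
                best, best_score = stored, score
        return best, best_score

    def triage(self, command_line: str) -> str:
        """Return 'suppress' for a close match to a benign trigger, else 'review'."""
        _, score = self.best_match(command_line)
        return "suppress" if score >= self.threshold else "review"


if __name__ == "__main__":
    index = BenignTriggerIndex()
    index.add(r"C:\Program Files\Backup\agent.exe --nightly --target D:\backups")
    # Near-duplicate of the benign trigger (only the drive letter differs).
    print(index.triage(r"C:\Program Files\Backup\agent.exe --nightly --target E:\backups"))
    # Unrelated, encoded PowerShell invocation stays in the analyst queue.
    print(index.triage("powershell -enc SQBFAFgA"))
```

The linear scan costs one comparison per stored trigger for every alert, which is exactly what the abstract's "fast-search algorithms" are meant to avoid at millions of alerts per day; a production system would replace `best_match` with an indexed or approximate nearest-neighbor search.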

Authors (6)
  1. Jonathan Oliver (4 papers)
  2. Raghav Batta (3 papers)
  3. Adam Bates (7 papers)
  4. Muhammad Adil Inam (1 paper)
  5. Shelly Mehta (1 paper)
  6. Shugao Xia (1 paper)
