LOCALINTEL: Generating Organizational Threat Intelligence from Global and Local Cyber Knowledge

Published 18 Jan 2024 in cs.CR, cs.AI, cs.IR, and cs.LO | (2401.10036v2)

Abstract: Security Operations Center (SoC) analysts gather threat reports from openly accessible global threat repositories and tailor the information to their organization's needs, such as developing threat intelligence and security policies. They also depend on organizational internal repositories, which act as a private local knowledge database. These local knowledge databases store credible cyber intelligence and critical operational and infrastructure details. SoCs undertake a manual, labor-intensive task of utilizing these global threat repositories and local knowledge databases to create both organization-specific threat intelligence and mitigation policies. Recently, LLMs have shown the capability to process diverse knowledge sources efficiently. We leverage this ability to automate this organization-specific threat intelligence generation. We present LocalIntel, a novel automated threat intelligence contextualization framework that retrieves zero-day vulnerability reports from the global threat repositories and uses its local knowledge database to determine implications and mitigation strategies to alert and assist the SoC analyst. LocalIntel comprises two key phases: knowledge retrieval and contextualization. Quantitative and qualitative assessment has shown effectiveness in generating up to 93% accurate organizational threat intelligence with 64% inter-rater agreement.

Summary

  • The paper presents LocalIntel, a novel framework that integrates global CTI and local knowledge to generate tailored threat intelligence, reducing manual SoC effort.
  • It employs a retrieval-augmented generation approach using LLMs and vector databases, achieving a RAGAS score of 0.9535.
  • By automating intelligence synthesis, LocalIntel aims to make organizational threat assessments faster and less error-prone than manual curation.

Generating Organizational Threat Intelligence from Global and Local Cyber Knowledge

The paper "LOCALINTEL: Generating Organizational Threat Intelligence from Global and Local Cyber Knowledge" introduces a framework, LocalIntel, for automating the generation of contextualized threat intelligence by synthesizing information from both global and local cyber knowledge sources (2401.10036). The approach leverages retrieval-augmented generation to deliver organization-specific insights efficiently, reducing the manual effort typically required by Security Operations Center (SoC) analysts.

Introduction to LocalIntel

LocalIntel is designed to address a critical challenge faced by SoC analysts: the efficient integration of global cyber threat intelligence (CTI) with organization-specific contextual information to generate actionable threat responses. Traditional approaches require analysts to manually curate and contextualize information from expansive global databases and their internal resources, leading to high labor costs and potential errors. LocalIntel seeks to streamline this process using LLMs and a structured retrieval-augmented generation framework.

Figure 1: Overview of LocalIntel. The SoC analyst submits an input Prompt P to the system, which retrieves information from global (G_i) and local (L_i) knowledge sources and then contextualizes the results, producing a final output Response Completion C.

System Architecture

The architecture of LocalIntel is divided into three primary phases (the abstract groups the first two under a single knowledge retrieval phase):

  1. Global Knowledge Retrieval: This phase involves retrieving data from publicly accessible CTI repositories such as CVE, NVD, and CWE. The system, leveraging the ReAct framework, constructs queries to extract relevant global intelligence that aligns with the input prompts provided by the users.
  2. Local Knowledge Retrieval: Once global intelligence is gathered, the system retrieves pertinent organizational information from local knowledge databases. This is achieved through a vector database that indexes local documents, allowing for fast semantic search and retrieval.
  3. Contextualized Completion Generation: The retrieved global and local data are synthesized to produce a contextualized completion that is relevant to the specific organizational query. This final output gives security analysts immediately actionable insights.

Figure 2: Architecture of our LocalIntel framework. It has three phases: global knowledge retrieval, local knowledge retrieval, and contextualized completion C generation.
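
The three phases can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the in-memory `GLOBAL_CTI` dictionary, the `LOCAL_DOCS` list, the placeholder identifier `CVE-0000-0001`, and every function name are hypothetical, word overlap stands in for semantic search, and the LLM is passed in as a plain callable rather than driven by a ReAct agent.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a global CTI repository lookup (placeholder entry).
GLOBAL_CTI: Dict[str, str] = {
    "CVE-0000-0001": "Remote code execution in ExampleServer 2.x.",
}

# Hypothetical stand-in for the organization's local knowledge database.
LOCAL_DOCS: List[str] = [
    "Asset inventory: ExampleServer 2.3 runs on hosts web-01 and web-02.",
    "Policy: remote code execution flaws must be patched within 72 hours.",
    "Cafeteria menu for the week.",
]


def retrieve_global(prompt: str) -> List[str]:
    """Phase 1: return global reports whose identifier appears in the prompt."""
    return [text for cve_id, text in GLOBAL_CTI.items() if cve_id in prompt]


def retrieve_local(global_hits: List[str], k: int = 2) -> List[str]:
    """Phase 2: rank local documents by word overlap with the global reports."""
    query_words = set(" ".join(global_hits).lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in LOCAL_DOCS]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep at most k documents and drop those with no overlap at all.
    return [doc for score, doc in scored[:k] if score > 0]


def build_context_prompt(prompt: str, global_hits: List[str], local_hits: List[str]) -> str:
    """Assemble the text handed to the LLM in phase 3."""
    sections = [
        "Question: " + prompt,
        "Global knowledge:\n" + "\n".join(global_hits),
        "Local knowledge:\n" + "\n".join(local_hits),
        "Answer for this organization using only the knowledge above.",
    ]
    return "\n\n".join(sections)


def local_intel(prompt: str, llm: Callable[[str], str]) -> str:
    """Run global retrieval, local retrieval, then contextualized completion."""
    global_hits = retrieve_global(prompt)
    local_hits = retrieve_local(global_hits)
    return llm(build_context_prompt(prompt, global_hits, local_hits))


def echo_llm(text: str) -> str:
    """Placeholder model that returns its input, so the assembled context is visible."""
    return text


completion = local_intel("How does CVE-0000-0001 affect us?", echo_llm)
print(completion)
```

Passing the model as a callable keeps the retrieval logic testable without network access; a real deployment would swap `echo_llm` for an actual model client.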

Implementation Insights

The implementation of LocalIntel involves modular interactions between components: the LLM, an Agent that controls query flow, and a vector database that embeds and retrieves local knowledge. The underlying technologies, GPT-3.5 for language understanding and Chroma DB for vector storage, are chosen to provide the scalability and robustness needed in operational environments.
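
To make the role of the vector database concrete, here is a toy in-memory store. It is only a sketch of the idea and does not use Chroma's API: `ToyVectorStore`, `embed`, and `cosine` are hypothetical names, and normalized bag-of-words counts stand in for the learned embeddings a real system would use.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: L2-normalized bag-of-words counts (not a learned model)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(weight * b.get(term, 0.0) for term, weight in a.items())


class ToyVectorStore:
    """Stores documents with their vectors and returns the nearest ones."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Dict[str, float]]] = []

    def add(self, doc: str) -> None:
        self._items.append((doc, embed(doc)))

    def query(self, text: str, k: int = 1) -> List[str]:
        q = embed(text)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


store = ToyVectorStore()
store.add("firewall rules for the dmz network")
store.add("payroll schedule for march")
top = store.query("dmz firewall", k=1)
print(top)
```

A production vector database adds persistence and approximate nearest-neighbor indexing, but the contract is the same: embed documents once, embed the query, return the closest documents.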

The paper's pseudocode outlines the data flow within the system, showing how queries are formulated and executed to retrieve the knowledge that the LLM then uses to generate the final threat intelligence.

Quantitative and Qualitative Evaluation

LocalIntel's efficacy is evaluated with RAGAS, an evaluation framework for retrieval-augmented generation pipelines, on which the system achieves a score of 0.9535. This score indicates that the generated completions closely match the human-generated baseline references.

The qualitative assessment demonstrates the system's capability to stitch together fragmented global and local data into coherent and contextually relevant outputs. This is evidenced by illustrative examples where LocalIntel satisfactorily contextualizes generic threat information with local data nuances.

Figure 3: RAGAS evaluation score for Completion C with respect to the human evaluator ground truth.
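
For intuition about scoring a completion against a human-written reference, the snippet below computes a token-level F1. This is explicitly not RAGAS, whose metrics rely on an LLM judge and embeddings; it is a much cruder lexical proxy, and `token_f1` and the example strings are hypothetical.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Token-level F1 between a generated completion and a reference answer."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it
    # appears in both texts.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


score = token_f1("patch web-01", "patch web-01 and web-02")
print(round(score, 4))
```

A lexical score like this penalizes correct paraphrases, which is one reason model-based evaluation frameworks are preferred for generated text.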

Conclusion

LocalIntel automates the synthesis of threat intelligence by combining LLMs with structured retrieval mechanisms. Its ability to generate precise, contextually pertinent threat intelligence promises to make SoC operations more effective and agile, allowing security professionals to prioritize policy implementation over manual data synthesis.

In future work, expanding the system's capacity with more adaptive global and local knowledge integration techniques could further enhance its applicability across diverse cybersecurity domains, offering even greater resilience against evolving cyber threats.
