LOCALINTEL: Generating Organizational Threat Intelligence from Global and Local Cyber Knowledge

Published 18 Jan 2024 in cs.CR, cs.AI, cs.IR, and cs.LO | (2401.10036v2)

Abstract: Security Operations Center (SoC) analysts gather threat reports from openly accessible global threat repositories and tailor the information to their organization's needs, such as developing threat intelligence and security policies. They also depend on organizational internal repositories, which act as private local knowledge database. These local knowledge databases store credible cyber intelligence, critical operational and infrastructure details. SoCs undertake a manual labor-intensive task of utilizing these global threat repositories and local knowledge databases to create both organization-specific threat intelligence and mitigation policies. Recently, LLMs have shown the capability to process diverse knowledge sources efficiently. We leverage this ability to automate this organization-specific threat intelligence generation. We present LocalIntel, a novel automated threat intelligence contextualization framework that retrieves zero-day vulnerability reports from the global threat repositories and uses its local knowledge database to determine implications and mitigation strategies to alert and assist the SoC analyst. LocalIntel comprises two key phases: knowledge retrieval and contextualization. Quantitative and qualitative assessment has shown effectiveness in generating up to 93% accurate organizational threat intelligence with 64% inter-rater agreement.

Summary

The paper presents LocalIntel, a framework that combines global CTI with local organizational knowledge to generate tailored threat intelligence, reducing manual effort for SoC analysts.
It uses retrieval-augmented generation with LLMs and a vector database, and reports a RAGAS score of 0.9535.
By automating this synthesis, LocalIntel aims to make organizational threat assessment faster and more accurate.

Generating Organizational Threat Intelligence from Global and Local Cyber Knowledge

The paper "LOCALINTEL: Generating Organizational Threat Intelligence from Global and Local Cyber Knowledge" introduces a framework, LocalIntel, for automating the generation of contextualized threat intelligence by synthesizing information from both global and local cyber knowledge sources (2401.10036). The approach leverages retrieval-augmented generation to deliver organization-specific insights efficiently, reducing the manual effort typically required by Security Operations Center (SoC) analysts.

Introduction to LocalIntel

LocalIntel is designed to address a critical challenge faced by SoC analysts: the efficient integration of global cyber threat intelligence (CTI) with organization-specific contextual information to generate actionable threat responses. Traditional approaches require analysts to manually curate and contextualize information from expansive global databases and their internal resources, leading to high labor costs and potential errors. LocalIntel seeks to streamline this process using LLMs and a structured retrieval-augmented generation framework.

Figure 1: Overview of LocalIntel. The SoC analyst submits an input Prompt $P$ to the system, which retrieves information from global $G_i$ and local $L_i$ knowledge sources and then contextualizes the results, producing a final output Response Completion $C$.

System Architecture

The architecture of LocalIntel is divided into three primary phases:

Global Knowledge Retrieval: This phase involves retrieving data from publicly accessible CTI repositories such as CVE, NVD, and CWE. The system, leveraging the ReAct framework, constructs queries to extract relevant global intelligence that aligns with the input prompts provided by the users.
Local Knowledge Retrieval: Once global intelligence is gathered, the system retrieves pertinent organizational information from local knowledge databases. This is achieved through a vector database that indexes local documents, allowing for fast semantic search and retrieval.
Contextualized Completion Generation: The retrieved global and local data are synthesized to produce a contextualized completion that is relevant to the specific organizational query. This final output facilitates immediate actionable insights for security analysts.

Figure 2: Architecture of the LocalIntel framework. It has three phases: global knowledge retrieval, local knowledge retrieval, and contextualized completion $C$ generation.
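
The three phases can be sketched as a small pipeline. This is a minimal illustration, not the authors' implementation: the names `Completion`, `build_context`, `run_pipeline`, and the `stub_*` functions are hypothetical, the stubs replace real CTI lookups and LLM calls, and passing the global findings into the local retrieval step is an assumption based on the ordering described above.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Completion:
    """Holds the prompt P, the retrieved knowledge, and the completion C."""
    prompt: str
    global_knowledge: List[str]
    local_knowledge: List[str]
    text: str


def build_context(prompt: str, global_docs: List[str], local_docs: List[str]) -> str:
    """Lay out global and local findings ahead of the analyst's question."""
    lines = ["Global knowledge:"]
    lines += [f"- {doc}" for doc in global_docs]
    lines.append("Local knowledge:")
    lines += [f"- {doc}" for doc in local_docs]
    lines.append(f"Question: {prompt}")
    return "\n".join(lines)


def run_pipeline(
    prompt: str,
    retrieve_global: Callable[[str], List[str]],
    retrieve_local: Callable[[str, List[str]], List[str]],
    generate: Callable[[str], str],
) -> Completion:
    # Phase 1: fetch global CTI relevant to the analyst's prompt P.
    global_docs = retrieve_global(prompt)
    # Phase 2: fetch local knowledge (assumed here to see the global findings).
    local_docs = retrieve_local(prompt, global_docs)
    # Phase 3: assemble the context and ask the generator for completion C.
    context = build_context(prompt, global_docs, local_docs)
    return Completion(prompt, global_docs, local_docs, generate(context))


# Hypothetical stand-ins so the sketch runs without network access or an LLM.
def stub_global(prompt: str) -> List[str]:
    return ["EXAMPLE-VULN-1: hypothetical flaw in ExampleServer 2.x"]


def stub_local(prompt: str, global_docs: List[str]) -> List[str]:
    return ["Web tier hosts run ExampleServer 2.3"]


def stub_generate(context: str) -> str:
    return f"[stub completion built from {len(context.splitlines())} context lines]"
```

Calling `run_pipeline("Are we exposed to EXAMPLE-VULN-1?", stub_global, stub_local, stub_generate)` returns a `Completion` whose fields keep the retrieved evidence next to the generated text, which makes it easier for an analyst to check what the answer was based on.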

Implementation Insights

The implementation of LocalIntel is modular: an LLM, an Agent that controls the query flow, and a vector database that embeds and retrieves local knowledge. The described setup uses GPT-3.5 as the LLM and Chroma DB for vector storage.
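
To illustrate what semantic retrieval over local documents involves, the sketch below ranks documents by cosine similarity to a query. It is a pure-Python stand-in and does not use Chroma's API: `ToyVectorStore`, `embed`, and `cosine` are hypothetical names, the bag-of-words "embedding" is a deliberate simplification of a learned embedding model, and the sample documents are invented.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory index of (document, vector) pairs with top-k similarity search."""

    def __init__(self) -> None:
        self._docs: List[Tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self._docs.append((text, embed(text)))

    def query(self, text: str, k: int = 2) -> List[str]:
        q = embed(text)
        ranked = sorted(self._docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


# Invented local documents for illustration only.
store = ToyVectorStore()
store.add("Web tier hosts run ExampleServer 2.3 behind the load balancer")
store.add("Backup policy: nightly snapshots retained for thirty days")
store.add("Employee laptops use full disk encryption")

top = store.query("ExampleServer 2.x vulnerability affected hosts", k=1)
```

Here `top` holds the web-tier document, because it is the only one sharing terms with the query. A production vector database replaces the word counts with dense embeddings and the linear scan with an approximate nearest-neighbor index, but the ranking idea is the same.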

The paper's pseudocode traces the data flow through the system: how queries are formulated and executed against each knowledge source, and how the retrieved knowledge is passed to the LLM to generate the final threat intelligence.

Quantitative and Qualitative Evaluation

LocalIntel's efficacy is evaluated with RAGAS, an evaluation framework designed for retrieval-augmented generation, on which the system achieves a score of 0.9535. This score indicates that the generated completions closely match the human-written reference answers.
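
The summary does not say which component metrics or aggregation produced the 0.9535 figure. As a hedged illustration only: early releases of the ragas library reported an aggregate score as the harmonic mean of the component metrics, and the sketch below assumes that convention. The function name `aggregate_score` is hypothetical, and the metric values are made up for the example, not taken from the paper.

```python
from statistics import harmonic_mean
from typing import Dict


def aggregate_score(metrics: Dict[str, float]) -> float:
    """Combine per-metric scores in [0, 1] with a harmonic mean (assumed convention)."""
    for name, value in metrics.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return harmonic_mean(metrics.values())


# Made-up values for illustration; not results from the paper.
example = {
    "faithfulness": 0.9,
    "answer_relevancy": 0.8,
}
score = aggregate_score(example)
```

A harmonic mean penalizes a single weak component more than an arithmetic mean would, so a high aggregate under this convention requires every component to be high.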

The qualitative assessment demonstrates the system's capability to stitch together fragmented global and local data into coherent and contextually relevant outputs. This is evidenced by illustrative examples where LocalIntel satisfactorily contextualizes generic threat information with local data nuances.

Figure 3: RAGAS evaluation score for Completion $C$ with respect to the human evaluator ground truth.

Conclusion

LocalIntel automates the synthesis of threat intelligence by combining LLMs with structured retrieval over global and local knowledge. By generating contextually relevant, organization-specific intelligence, it can make SoC operations more efficient and agile, letting security professionals prioritize policy implementation over manual data synthesis.

In future work, expanding the system's capacity with more adaptive global and local knowledge integration techniques could further enhance its applicability across diverse cybersecurity domains, offering even greater resilience against evolving cyber threats.
