AttacKG+: Boosting Attack Knowledge Graph Construction with Large Language Models (2405.04753v1)

Published 8 May 2024 in cs.CR and cs.AI

Abstract: Attack knowledge graph construction seeks to convert textual cyber threat intelligence (CTI) reports into structured representations that portray the evolutionary traces of cyber attacks. Although previous research has proposed various methods to construct attack knowledge graphs, they generally suffer from limited generalization to diverse knowledge types and require expertise in model design and tuning. To address these limitations, we seek to utilize LLMs, which have achieved enormous success in a broad range of tasks thanks to their exceptional capabilities in both language understanding and zero-shot task fulfillment. We therefore propose AttacKG+, a fully automatic LLM-based framework for constructing attack knowledge graphs. Our framework consists of four consecutive modules: rewriter, parser, identifier, and summarizer, each of which is implemented by LLM-powered instruction prompting and in-context learning. Furthermore, we upgrade the existing attack knowledge schema and propose a more comprehensive version. We represent a cyber attack as a temporally unfolding event, each temporal step of which encapsulates three layers of representation: a behavior graph, MITRE TTP labels, and a state summary. Extensive evaluation demonstrates that: 1) our formulation seamlessly satisfies the information needs of threat event analysis, 2) our construction framework faithfully and accurately extracts the information defined by AttacKG+, and 3) our attack graph directly benefits downstream security practices such as attack reconstruction. All the code and datasets will be released upon acceptance.
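
The three-layer schema described above maps naturally onto a small data structure. The following is a minimal Python sketch, not the authors' implementation: the class and field names (Edge, TemporalStep, AttackEvent, tactic, ttp_labels, and so on) are hypothetical, and encoding the behavior graph as a list of subject-action-object edges is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    """One behavior: a subject entity acting on an object entity (hypothetical encoding)."""
    subject: str  # e.g. a process, user, or file
    action: str   # e.g. "sends", "spawns"
    obj: str


@dataclass
class TemporalStep:
    """One temporal step of an attack, carrying the three layers of representation."""
    tactic: str                                               # hypothetical stage label, e.g. "Execution"
    behavior_graph: list[Edge] = field(default_factory=list)  # layer 1: behavior graph
    ttp_labels: list[str] = field(default_factory=list)       # layer 2: MITRE ATT&CK technique IDs
    state_summary: str = ""                                   # layer 3: state at the end of this step


@dataclass
class AttackEvent:
    """A cyber attack as a temporally ordered sequence of steps."""
    steps: list[TemporalStep] = field(default_factory=list)

    def techniques(self) -> list[str]:
        """All technique labels in temporal order, without duplicates."""
        seen: list[str] = []
        for step in self.steps:
            for label in step.ttp_labels:
                if label not in seen:
                    seen.append(label)
        return seen

    def entities(self) -> set[str]:
        """Every entity appearing as a subject or object in any behavior graph."""
        return {n for step in self.steps for e in step.behavior_graph for n in (e.subject, e.obj)}


# Hypothetical two-step attack used only to exercise the structure.
event = AttackEvent(steps=[
    TemporalStep(
        tactic="Initial Access",
        behavior_graph=[Edge("attacker", "sends", "phishing email")],
        ttp_labels=["T1566"],
        state_summary="A malicious attachment has reached the victim's mailbox.",
    ),
    TemporalStep(
        tactic="Execution",
        behavior_graph=[Edge("winword.exe", "spawns", "powershell.exe")],
        ttp_labels=["T1059.001"],
        state_summary="PowerShell is running on the victim host.",
    ),
])
print(event.techniques())  # ['T1566', 'T1059.001']
```

Keeping each layer as a separate field means a consumer can walk the steps in order and read only the layer it needs, for example the technique sequence for attack reconstruction.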

Enhancing Cyberattack Knowledge Graphs with LLMs

Introduction to AttacKG+

In the dynamic world of cybersecurity, understanding and structuring cyber threats is crucial. The paper introduces AttacKG+, a framework that uses LLMs to convert cyber threat intelligence (CTI) reports into structured attack knowledge graphs. AttacKG+ aims to improve on previous methods by addressing their key limitations and by making knowledge graph construction more accurate and fully automated.

Key Challenges in Existing Methods

Existing methods for creating attack knowledge graphs face substantial hurdles that AttacKG+ seeks to overcome:

Generalization Issues: Traditional models struggle to adapt to varied and emerging attack scenarios and to diverse knowledge types, often because of limited training data and small model sizes.
Dependency on Expertise: Many current approaches rely heavily on expert knowledge and manual tuning, which can be resource-intensive and restrict wide-scale use among cybersecurity practitioners.

How AttacKG+ Works

AttacKG+ introduces a fully automated, four-module construction framework powered by LLMs. Each module (rewriter, parser, identifier, and summarizer) is designed to handle a specific aspect of attack knowledge graph construction. Here's a closer look:

Rewriter: This module rewrites each CTI report into sections organized by attack tactic, removing irrelevant information and setting the stage for detailed analysis.
Parser: Following rewriting, this module extracts the core behaviors and relationships from the structured text, building out the behavior graph part of the knowledge schema.
Identifier: This module labels parts of the behavior graph with specific MITRE ATT&CK techniques, enriching the graph with technical context.
Summarizer: The final module provides a summary of the state at the end of each tactical stage, capturing changes in system states, tool usage, and other dynamic elements.
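
The four modules chain together: the rewriter's tactic sections feed the parser, and each section's extracted behaviors are then labeled and summarized. The Python sketch below shows only that data flow, assuming the LLM is any function that maps a prompt string to a completion string. The prompts, the line-based output formats, and the names rewrite, parse, identify, summarize, build_graph, and fake_llm are all hypothetical; the paper's actual prompts and in-context examples are not reproduced here. fake_llm is a canned stub so the sketch runs without a model.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: any prompt -> completion function
Triple = tuple[str, str, str]


def rewrite(report: str, llm: LLM) -> list[tuple[str, str]]:
    """Rewriter: regroup the report into (tactic, text) sections, dropping irrelevant text."""
    out = llm("Rewrite this CTI report as lines of 'Tactic: text'.\n" + report)
    sections = []
    for line in out.splitlines():
        if ":" in line:
            tactic, text = line.split(":", 1)
            sections.append((tactic.strip(), text.strip()))
    return sections


def parse(section: str, llm: LLM) -> list[Triple]:
    """Parser: extract 'subject | action | object' behaviors from one section."""
    out = llm("Extract behaviors as 'subject | action | object' lines.\n" + section)
    triples = []
    for line in out.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triples.append((parts[0], parts[1], parts[2]))
    return triples


def identify(section: str, triples: list[Triple], llm: LLM) -> list[str]:
    """Identifier: map the section's behaviors to MITRE ATT&CK technique IDs."""
    out = llm(f"List ATT&CK technique IDs, comma-separated.\n{section}\n{triples}")
    return [t.strip() for t in out.split(",") if t.strip()]


def summarize(section: str, triples: list[Triple], llm: LLM) -> str:
    """Summarizer: describe the attack state at the end of this stage."""
    return llm(f"Summarize the system state after this stage.\n{section}\n{triples}").strip()


def build_graph(report: str, llm: LLM) -> list[dict]:
    """Run the four modules in order, producing one record per tactical stage."""
    steps = []
    for tactic, section in rewrite(report, llm):
        triples = parse(section, llm)
        steps.append({
            "tactic": tactic,
            "behavior_graph": triples,
            "ttp_labels": identify(section, triples, llm),
            "state_summary": summarize(section, triples, llm),
        })
    return steps


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real model, keyed on the prompt's first word."""
    phishing = "phishing" in prompt
    if prompt.startswith("Rewrite"):
        return ("Initial Access: The attacker sent a phishing email with a malicious attachment.\n"
                "Execution: The attachment launched a PowerShell script.")
    if prompt.startswith("Extract"):
        return "attacker | sends | phishing email" if phishing else "attachment | launches | PowerShell script"
    if prompt.startswith("List"):
        return "T1566" if phishing else "T1059.001"
    return "Foothold via email attachment." if phishing else "PowerShell running on victim host."


for step in build_graph("<raw CTI report text>", fake_llm):
    print(step["tactic"], step["ttp_labels"], step["behavior_graph"])
```

Passing the LLM in as a plain function keeps each module testable on its own and lets a real model client be swapped in without touching the pipeline logic.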

Empirical Validation and Results

The validation of AttacKG+ involved comparing its outputs with existing solutions such as EXTRACTOR and the original AttacKG. The reported results favor AttacKG+:

Improved Extraction: AttacKG+ demonstrated higher precision and recall in extracting entities, relations, and techniques compared to its predecessors.
Comprehensive Testing: Tested on a diverse set of 500 CTI reports, AttacKG+ was adept at identifying a wide array of tactics, techniques, and entities, showcasing its robustness.
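
Precision and recall for extraction compare a predicted set of items against a gold-standard annotation: precision is the fraction of predicted items that are correct, and recall is the fraction of gold items that were found. The summary does not say how items are matched, so the Python sketch below assumes exact string matching; the function name precision_recall_f1 and the example sets are hypothetical illustrations, not data from the paper.

```python
def precision_recall_f1(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Exact-match precision, recall, and F1 of predicted items against gold items."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: technique IDs extracted from one report vs. an analyst's labels.
predicted = {"T1566", "T1059.001", "T1105"}
gold = {"T1566", "T1059.001", "T1071"}
p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")  # precision=0.67 recall=0.67 f1=0.67
```

Set-based matching counts each item once; for entity extraction, where the same entity can be phrased several ways, an evaluation would likely need normalization or fuzzy matching before comparing sets.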

Practical Implications and Future Directions

The advancements presented by AttacKG+ are not just academic; they bear significant practical implications:

Accessibility for Practitioners: Reducing reliance on deep technical expertise democratizes advanced CTI analysis, allowing more organizations to protect themselves effectively.
Enhanced Response to Cyber Threats: By automating CTI analysis and producing structured, technique-labeled attack graphs, AttacKG+ can support faster and more accurate threat responses.

Going forward, the integration of multimodal data and refinement of LLMs' understanding of user-specific requirements could further enhance the performance and utility of attack knowledge graph construction frameworks like AttacKG+.

Conclusion

AttacKG+, through its innovative use of LLMs and a well-structured framework, sets a new standard in the automated construction of attack knowledge graphs. By addressing the twin challenges of model generalization and the need for expert knowledge, it offers a promising path toward more sophisticated and accessible cyber threat intelligence analysis.