AttacKG+: Cyber Threat Knowledge Graph
- AttacKG+ is a framework that constructs structured, MITRE-aligned attack knowledge graphs from raw cyber threat intelligence using LLMs.
- It employs a four-stage pipeline—rewriter, parser, identifier, and summarizer—to generate multi-layered, temporally ordered representations of cyber attacks.
- The system enhances CTI analysis with superior extraction performance, enabling precise attack reconstruction and integrated threat event analysis.
AttacKG+ refers to a family of frameworks and associated methodologies for constructing structured attack knowledge graphs from unstructured cyber threat intelligence (CTI) reports. It uses LLMs to produce richly annotated, temporally coherent, and multi-layered representations of cyber attack sequences. AttacKG+ builds upon prior knowledge-graph extraction systems (such as AttacKG) and is distinguished by its comprehensive pipeline architecture, its multi-level schema integrating MITRE TTP labels, and its exclusive reliance on zero/few-shot LLM capabilities for information extraction, tactic and technique labeling, and threat event summarization (Zhang et al., 8 May 2024). The approach targets threat event analysis, attack reconstruction, and security automation, and the authors report improved extraction performance and analytic usability over previous state-of-the-art systems.
1. Framework Structure and Workflow
AttacKG+ operationalizes the transformation of CTI prose into structured knowledge through a four-stage pipeline, each implemented with LLMs via prompt engineering and in-context learning:
- Rewriter: Cleans and partitions the CTI report, removing redundancies and segmenting content according to predefined tactical categories (e.g., MITRE’s 14 tactics), thereby producing a chronologically sorted, tactic-aligned narrative.
- Parser: Extracts “atomic events” as (subject, action, object) triplets, resolves intra- and inter-sentence dependencies, and constructs a directed behavioral event graph, with nodes and edges reflecting actor-action-object semantics and temporal precedence.
- Identifier: Aligns parsed events with canonical technique templates (e.g., MITRE ATT&CK), assigning the appropriate MITRE TTP tactic and technique codes to entities and subgraphs; leverages LLM-based similarity matching and multi-example alignment for robust technique annotation.
- Summarizer: Generates a contextual summary for each tactical segment, aggregating environmental states, permission levels, file collections, toolsets, and actor activities at each time slice.
The result is a three-layered attack knowledge graph, in which the central layer is the behavior event graph, the upper layer consists of tactic/technique tags, and the lower layer provides a state summary per step. This layered schema enables both fine-grained behavioral analysis and high-level campaign profiling.
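The three-layer structure described above can be pictured with a small data model. The following is a minimal sketch, not the authors' implementation: the class and field names (`AtomicEvent`, `TacticSegment`, `AttackGraph`, `state_summary`, and so on) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AtomicEvent:
    """Middle layer: one (subject, action, object) behavior triplet."""
    event_id: str
    subject: str
    action: str
    obj: str


@dataclass
class TacticSegment:
    """One tactic-aligned slice of a report, carrying all three layers."""
    tactic: str                                              # upper layer: tactic label
    techniques: list[str] = field(default_factory=list)      # upper layer: technique IDs
    events: list[AtomicEvent] = field(default_factory=list)  # middle layer: behavior events
    state_summary: str = ""                                  # lower layer: state at this step


@dataclass
class AttackGraph:
    """Segments are kept in chronological order; edges are (earlier, later) event IDs."""
    segments: list[TacticSegment] = field(default_factory=list)
    edges: list[tuple[str, str]] = field(default_factory=list)

    def all_events(self) -> list[AtomicEvent]:
        return [e for seg in self.segments for e in seg.events]

    def technique_ids(self) -> list[str]:
        # Preserve first-seen order while dropping duplicates.
        seen: list[str] = []
        for seg in self.segments:
            for tid in seg.techniques:
                if tid not in seen:
                    seen.append(tid)
        return seen


# Hypothetical two-step attack used only to exercise the model.
graph = AttackGraph(
    segments=[
        TacticSegment(
            tactic="Initial Access",
            techniques=["T1566"],
            events=[AtomicEvent("e1", "attacker", "sends", "phishing email")],
            state_summary="No foothold yet; a lure has been delivered to the victim.",
        ),
        TacticSegment(
            tactic="Execution",
            techniques=["T1059"],
            events=[AtomicEvent("e2", "macro", "runs", "PowerShell script")],
            state_summary="Code runs with the victim user's privileges.",
        ),
    ],
    edges=[("e1", "e2")],
)
print(graph.technique_ids())
```

Keeping the tactic labels, events, and state summary on the same segment object makes it easy to move between fine-grained behavior and the campaign-level view.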
2. LLM-Centric Information Extraction
All core modules in AttacKG+ utilize LLMs as their principal processing engine. Rather than depending on bespoke, dataset-specific model training or handcrafted rules, AttacKG+ orchestrates LLM behaviors by:
- Feeding tactical definitions, schema rules, and canonical examples via prompt templates for rewriting and tactical segmentation.
- Prompting the model to extract structured triplets and relations, including complex coreferences and indirect dependencies, by presenting task-specific instructions and in-context exemplars.
- Matching parsed structures to known MITRE techniques and tactics by supplying the model with labeled technique descriptions and expected behavioral patterns (e.g., T1195 for supply chain compromise, T1059 for script interpreter execution).
- Instructing concise, context-sensitive state summarization for each attack phase, ensuring the system produces narrative summaries aligned with key analytic requirements.
This LLM-driven paradigm offers better generalization, requires little user-side model design or fine-tuning, and does not demand deep NLP/ML expertise from security practitioners.
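As an illustration of the prompt-driven extraction described above, the sketch below assembles a few-shot parser prompt and parses a model reply into triplets. It assumes a hypothetical `subject | action | object` line format for the reply. The actual prompt templates are in the paper's appendix and are not reproduced here, and no real LLM client is called.

```python
def build_parser_prompt(segment, exemplars):
    """Assemble instructions, in-context exemplars, and the target text.

    exemplars: list of (text, [(subject, action, object), ...]) pairs.
    """
    lines = [
        "Extract every atomic event from the text.",
        "Write one event per line as: subject | action | object",
        "",
    ]
    for text, triplets in exemplars:
        lines.append(f"Text: {text}")
        lines.extend(" | ".join(t) for t in triplets)
        lines.append("")
    lines.append(f"Text: {segment}")
    return "\n".join(lines)


def parse_triplets(response):
    """Keep only well-formed 'a | b | c' lines; ignore anything else."""
    triplets = []
    for line in response.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triplets.append(tuple(parts))
    return triplets


exemplars = [
    ("The dropper wrote a DLL to disk.", [("dropper", "writes", "DLL")]),
]
prompt = build_parser_prompt("The attacker sent a phishing email.", exemplars)

# A canned reply stands in for the model output in this sketch.
reply = "attacker | sends | phishing email\nSure, here are the events:"
print(parse_triplets(reply))
```

Filtering malformed lines in `parse_triplets` is one simple way to tolerate chatty or partially formatted model replies.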
3. Schema Innovations and Temporal Representation
AttacKG+ advances the attack knowledge graph schema along several dimensions relative to previous efforts:
- Temporally Unfolding Events: Treats the attack as a progressive series of events, explicitly capturing the temporal and causal structure via directed edges among atomic event nodes.
- Three-Layered Representation: Integrates (i) atomic behavioral events, (ii) MITRE TTP/tactic labels aligned to events/subgraphs, and (iii) contextual state summaries for analytic traceability and environmental awareness.
- Multi-Level Threat Knowledge: Encodes both fine-grained (IoC, tool usage, privilege transition) and campaign-level (tactic, technique, system state) information, supporting detailed forensic reconstruction and broad campaign analytics.
Atomic events are formally represented as (subject, action, object) triplets, with additional relationships and temporal ordering captured through edge annotations. Cross-report entity and action alignment is also supported, allowing aggregation across multiple CTI sources.
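To show how directed precedence edges yield a chronological reading of the event graph, the sketch below orders events with a topological sort from Python's standard `graphlib` module. The event IDs, triplets, and the `chronological_order` helper are hypothetical, and the paper does not state that it uses this particular algorithm.

```python
from graphlib import TopologicalSorter


def chronological_order(event_ids, edges):
    """Return event IDs so that every (earlier, later) edge is respected.

    Raises graphlib.CycleError (a ValueError) if the edges contain a cycle.
    """
    predecessors = {eid: set() for eid in event_ids}
    for earlier, later in edges:
        predecessors.setdefault(earlier, set())
        predecessors.setdefault(later, set()).add(earlier)
    return list(TopologicalSorter(predecessors).static_order())


# Hypothetical atomic events keyed by ID, each a (subject, action, object) triplet.
events = {
    "e3": ("macro", "downloads", "payload"),
    "e1": ("attacker", "sends", "phishing email"),
    "e2": ("victim", "opens", "attachment"),
}
edges = [("e1", "e2"), ("e2", "e3")]

for eid in chronological_order(events, edges):
    print(eid, events[eid])
```

A cycle in the precedence edges makes the sort fail, which can serve as a cheap consistency check on extracted temporal relations.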
4. Empirical Performance and Evaluation
Extensive evaluation of AttacKG+ demonstrates robust extraction and annotation performance:
- On manually labeled CTI datasets (N=15), AttacKG+ achieves superior F1 scores in both entity/relation extraction and technique identification, nearly doubling the technique-level F1 relative to prior methods (e.g., EXTRACTOR).
- AttacKG+ reports lower false-negative rates, more accurate event segmentation, and more precise technique alignment in extracting threat graphs from unstructured CTI text.
- When scaling to hundreds of reports, AttacKG+ reliably extracts distributions of MITRE tactics/techniques, showing clear utility in downstream analytic tasks such as attack reconstruction and event profiling.
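The corpus-level profiling mentioned in the last item can be sketched as a simple count of tactics across per-report technique lists. The three-entry mapping and the sample reports below are illustrative assumptions, not data from the paper. Counting each tactic at most once per report is also a choice made for this sketch.

```python
from collections import Counter

# Illustrative subset of a technique-to-tactic mapping.
TECHNIQUE_TO_TACTIC = {
    "T1566": "Initial Access",
    "T1059": "Execution",
    "T1105": "Command and Control",
}


def tactic_distribution(reports):
    """Count, for each tactic, the number of reports in which it appears.

    reports: iterable of per-report technique ID lists.
    Unknown technique IDs are skipped rather than guessed.
    """
    counts = Counter()
    for techniques in reports:
        tactics = {TECHNIQUE_TO_TACTIC[t] for t in techniques if t in TECHNIQUE_TO_TACTIC}
        counts.update(tactics)
    return counts


# Hypothetical extraction results for three reports.
reports = [
    ["T1566", "T1059"],
    ["T1059", "T1105", "T1059"],
    ["T1059", "T9999"],
]
distribution = tactic_distribution(reports)
for tactic, n in distribution.most_common():
    print(f"{tactic}: {n}")
```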
Associated tables report precision, recall, and F1 statistics for entity, relation, and technique identification tasks; prompt templates are provided to ensure reproducibility of the extraction pipeline.
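For reference, precision, recall, and F1 over a set of extracted items can be computed as below. This sketch assumes exact-match comparison between predicted and gold items, so the paper's matching criteria may differ. The technique IDs in the example are made up for illustration.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 with exact matching."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical technique labels for one report.
predicted = {"T1566", "T1059", "T1105"}
gold = {"T1566", "T1059", "T1027", "T1041"}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Here two of the three predictions are correct and two of the four gold labels are found, so precision is 2/3, recall is 1/2, and F1 is 4/7.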
5. Practical Utility and Security Applications
AttacKG+ offers direct benefits in several cybersecurity practice areas:
- Threat Event Analysis: The layered knowledge graph facilitates rapid tracing of multi-stage attacks, supports detection of technique variants across campaigns, and enables situational awareness throughout incident response.
- Attack Reconstruction: The temporally indexed event graph allows forensic analysts to reconstruct complete kill-chain scenarios, including actor identities, exploited techniques, affected artifacts, and system/environment transitions.
- Operational Integration: The pipeline’s reliance on LLM-based instructions allows domain experts to utilize the system without further NLP model engineering, broadening adoption in SOC and threat intelligence teams.
In a case study (e.g., C5 APT SKHack incident), AttacKG+ allowed precise recovery of employed MITRE techniques and provided inferences regarding system environment and adversary capability based on event structure.
6. Technical Specifications and Underlying Ontology
AttacKG+ adheres to standard cybersecurity ontologies, leveraging the MITRE ATT&CK technique taxonomy, STIX object specifications, and DAO representations for entity and relation types. Output formalism includes:
- Explicit atomic event syntax: (subject, action, object) triplets
- Table-form precision, recall, and F1 metrics for all extraction layers
- Prompt architectures for rewriting, triplet extraction, technique identification, and state summarization (see the appendix of Zhang et al., 8 May 2024)
Extraction modules handle challenges such as noise, variable sentence structure, non-standard terminology, and entity coreference using model-driven context handling and explicit prompt design.
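To give a sense of what a STIX-aligned technique label might look like, the sketch below builds a plain dictionary shaped like a STIX 2.1 `attack-pattern` object with a MITRE ATT&CK external reference. It uses only the standard library. The `technique_to_stix` helper is hypothetical, and the source does not specify the exact serialization AttacKG+ emits.

```python
import json
import uuid
from datetime import datetime, timezone


def technique_to_stix(technique_id, name):
    """Build a dict shaped like a STIX 2.1 attack-pattern object (assumed layout)."""
    # STIX timestamps are RFC 3339 in UTC; millisecond precision is used here.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return {
        "type": "attack-pattern",
        "spec_version": "2.1",
        "id": f"attack-pattern--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "external_references": [
            {"source_name": "mitre-attack", "external_id": technique_id},
        ],
    }


stix_obj = technique_to_stix("T1059", "Command and Scripting Interpreter")
print(json.dumps(stix_obj, indent=2))
```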
7. Limitations and Research Outlook
While AttacKG+ advances CTI structuring automation, the approach has recognized limitations:
- It depends on the underlying capabilities and prompt-responsiveness of LLMs, which may produce errors with ambiguous or poorly formatted CTI text.
- Fine distinctions among sub-techniques or context-dependent tactics may remain challenging where CTI authors omit or conflate steps.
- Template and threshold tuning for technique matching may need further empirical adjustment as the corpus and adversary TTPs evolve.
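The sensitivity to matching thresholds noted in the last item can be made concrete with a toy matcher. The sketch below uses token-level Jaccard overlap purely as a stand-in for the LLM-based similarity matching described earlier. The template texts, the function names, and the threshold values are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity between two lowercase, whitespace-split strings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def match_technique(event_text, templates, threshold):
    """Return the best-scoring technique ID, or None if it falls below the threshold."""
    best_id, best_score = None, 0.0
    for technique_id, description in templates.items():
        score = jaccard(event_text, description)
        if score > best_score:
            best_id, best_score = technique_id, score
    return best_id if best_score >= threshold else None


# Hypothetical one-line technique templates.
templates = {
    "T1059": "adversary executes commands or scripts through an interpreter",
    "T1566": "adversary sends phishing email with malicious attachment or link",
}
event = "attacker sends phishing email"

print(match_technique(event, templates, threshold=0.25))
print(match_technique(event, templates, threshold=0.5))
```

The same event is labeled under the looser threshold and left unlabeled under the stricter one, which is the kind of trade-off that would need re-tuning as templates and adversary behavior change.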
Ongoing research targets enhancement of fuzzy graph alignment, improved context aggregation, and the integration of image or multi-modal CTI data (as further developed in subsequent multimodal frameworks (Zhang et al., 20 Jun 2025)). Further, tighter coupling with real-time SOC platforms and dynamic threat intelligence aggregation is a central avenue for future development.
AttacKG+ represents a principled, LLM-powered evolution in cyber threat intelligence processing, providing a rigorously layered, temporally ordered, and TTP-annotated attack knowledge graph directly from raw CTI reports. Its architecture, empirical superiority, and practical design are poised to augment both security analysis workflows and the broader research effort in cyber threat knowledge engineering (Zhang et al., 8 May 2024).