Industrial Security Knowledge Graph

Updated 21 December 2025

An Industrial Security Knowledge Graph is a structured model that captures IT/OT assets, vulnerabilities, threats, and countermeasures for industrial environments.
Data is integrated from heterogeneous sources using LLM-based pipelines to ensure accurate entity alignment, relation extraction, and semantic integrity.
Quantitative risk scoring and graph analytics compute exploit probabilities and risk weights, enabling advanced multi-hop threat analysis and incident investigation.

An Industrial Security Knowledge Graph (KG) is a structured, machine-interpretable representation of cyber-physical security concepts—including assets, vulnerabilities, threats, countermeasures, and their relationships—tailored for the industrial domain. These knowledge graphs serve as an integrated semantic backbone for advanced threat analysis, compliance assurance, hazard reasoning, and multi-hop risk assessment in industrial contexts such as smart manufacturing, critical infrastructure, and process safety.

1. Core Ontology and Schema Design

An industrial security KG requires a domain-specific ontology that captures both operational technology (OT) and information technology (IT) entities, cyber-physical threat vectors, and multi-layered contextual relationships. For example, in BRIDG-ICS, the ontology includes nodes such as I5_Asset (e.g., PLCs, HMIs, IIoT gateways), Vulnerability (CVE-enriched, with CVSS/EPSS risk attributes), Weakness (CWE entries), AttackPattern (CAPEC definitions), and context-specific nodes (Protocols, Zones, Accounts, ProcessVariables) (Nandiya et al., 13 Dec 2025).

Relationship types include HAS_CVE (asset–vulnerability), HAS_POSSIBLE_CWE (vulnerability–weakness), EXPLOITED_BY_CAPEC (weakness–attack pattern), MAPPED_TO_TECHNIQUE (attack pattern–technique), and COMMUNICATES_WITH (asset–asset, denoting network or fieldbus ties), along with operational links such as covers, mitigates, exploited_by, and has_impact. Properties on nodes (criticality, CVSS vector, description) and on edges (riskWeight, pExploit, protocol) support both quantitative and qualitative security reasoning.
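The node, edge, and property layout described above can be sketched as a small in-memory property graph. This is only an illustration under stated assumptions, not the BRIDG-ICS implementation: the classes Node, Edge, and PropertyGraph, the simplified "Asset" label, and all property values are hypothetical; only the relation names and property keys come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    label: str                                   # e.g. "Asset", "Vulnerability"
    props: dict = field(default_factory=dict)    # e.g. criticality, CVSS score


@dataclass
class Edge:
    src: str
    rel: str                                     # e.g. "HAS_CVE"
    dst: str
    props: dict = field(default_factory=dict)    # e.g. pExploit, protocol


class PropertyGraph:
    """Minimal directed property graph keyed by node id."""

    def __init__(self):
        self.nodes = {}
        self.out = defaultdict(list)

    def add_node(self, node):
        self.nodes[node.id] = node

    def add_edge(self, edge):
        if edge.src not in self.nodes or edge.dst not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.out[edge.src].append(edge)

    def neighbors(self, node_id, rel=None):
        """Targets of outgoing edges, optionally filtered by relation type."""
        return [e.dst for e in self.out[node_id] if rel is None or e.rel == rel]


# Hypothetical example data.
g = PropertyGraph()
g.add_node(Node("PLC_A", "Asset", {"criticality": 8}))
g.add_node(Node("HMI_1", "Asset", {"criticality": 5}))
g.add_node(Node("CVE-2022-1234", "Vulnerability", {"cvss": 7.5}))
g.add_edge(Edge("PLC_A", "HAS_CVE", "CVE-2022-1234"))
g.add_edge(Edge("HMI_1", "COMMUNICATES_WITH", "PLC_A",
                {"protocol": "Modbus/TCP", "pExploit": 0.12}))
```

Keeping properties in plain dictionaries mirrors how a graph database stores them and leaves room for sector-specific attributes without changing the classes.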

This semantic framework must be extensible to accommodate sector-specific control models (e.g., NIST SP 800-82, IEC 62443 controls), conditional logic (e.g., “if control X is absent, risk increases”), and compliance exceptions (Park et al., 9 Dec 2025).

2. Data Integration and Triple Construction

Data for an industrial security KG is aggregated from heterogeneous sources: public vulnerability repositories (CVE, CWE, CAPEC), ICS product catalogs, testbed topology metadata (Purdue model), ICS-CERT/NIST advisories, and natural-language security standards or incident reports. Entity alignment procedures match vendor-specific asset descriptions to canonical categories and relational schemas. Structured tables and semi-structured text (including standards documents and HAZOP reports) are treated as knowledge units, each with a defined scope and context, and further decomposed into atomic propositions (Park et al., 9 Dec 2025, Wang et al., 2021).

LLM-based pipelines are increasingly adopted for entity and relation extraction. For example, SecureBERT and REBEL models tag entities and extract binary triples from threat descriptions, while prompt-based approaches (with few-shot security-domain examples) convert conditionals, numerical thresholds, and multi-clause policies into ontology-aligned subject–predicate–object triples (Nandiya et al., 13 Dec 2025, Park et al., 9 Dec 2025). Extraction steps include normalization (conversion to SI units), synonym/variant consolidation (via embedding clustering), schema validation (type checks, relation pruning), and error correction.
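The schema-validation step (type checks and relation pruning) can be sketched as follows. This is a simplified stand-in for what a real pipeline would do: the SCHEMA table, the fallback to an "Asset" type for unrecognized identifiers, and the function names infer_type and validate are assumptions made for illustration. Only the CVE/CWE/CAPEC identifier formats and the relation names are taken as given.

```python
import re

# Identifier patterns used to infer an entity type. Anything that matches no
# pattern is treated as an asset (a simplifying assumption for this sketch).
TYPE_PATTERNS = [
    (re.compile(r"^CVE-\d{4}-\d{4,}$"), "Vulnerability"),
    (re.compile(r"^CWE-\d+$"), "Weakness"),
    (re.compile(r"^CAPEC-\d+$"), "AttackPattern"),
]

# Relation -> (expected subject type, expected object type).
SCHEMA = {
    "HAS_CVE": ("Asset", "Vulnerability"),
    "HAS_POSSIBLE_CWE": ("Vulnerability", "Weakness"),
    "EXPLOITED_BY_CAPEC": ("Weakness", "AttackPattern"),
    "COMMUNICATES_WITH": ("Asset", "Asset"),
}


def infer_type(entity):
    for pattern, label in TYPE_PATTERNS:
        if pattern.match(entity):
            return label
    return "Asset"


def validate(triples):
    """Split triples into (kept, pruned) according to SCHEMA."""
    kept, pruned = [], []
    for s, r, o in triples:
        s, r, o = s.strip(), r.strip().upper(), o.strip()
        ok = SCHEMA.get(r) == (infer_type(s), infer_type(o))
        (kept if ok else pruned).append((s, r, o))
    return kept, pruned


raw = [
    ("PLC_A", "has_cve", "CVE-2022-1234"),
    ("CVE-2022-1234", "HAS_POSSIBLE_CWE", "CWE-79"),
    ("CWE-79", "HAS_CVE", "PLC_A"),   # wrong direction, fails the type check
]
kept, pruned = validate(raw)
```

Returning the pruned triples rather than discarding them keeps them available for a later error-correction pass.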

Example triple types in industrial security KGs include:

Entity 1	Relation	Entity 2
PLC_A	HAS_CVE	CVE-2022-1234
CVE-2022-1234	HAS_POSSIBLE_CWE	CWE-79
Asset_X	COMMUNICATES_WITH	Asset_Y
Control_Failure	leads_to	Increased_Exposure

3. Probabilistic Risk Scoring and Graph Analytics

Industrial security KGs frequently embed quantitative risk metrics for advanced analytics. Edge properties are computed by fusing domain-specific risk and control models. In BRIDG-ICS, controlStrength is the product of four parameters: accessibility (a), configuration hygiene (c), exploitability resistance (e), and residual hardening (h):

$\text{controlStrength}(u,v) = a \cdot c \cdot e \cdot h$

Exploit probability for an edge is:

$p_\text{Exploit}(u,v) = \text{EPSS}(u,v) \cdot [1 - \text{controlStrength}(u,v)]$

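Exploit probabilities over multiple edges k, each with probability $p_k$, aggregate as the probability that at least one edge is exploited, treating the edges as independent: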

$p_\text{Exploit,total} = 1 - \prod_{k} (1 - p_k)$

The residual risk weight is:

$\text{riskWeight}(u,v) = p_\text{Exploit}(u,v) \times \frac{\text{criticality}(v)}{10}$
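The four formulas above translate directly into code. The sketch below is illustrative only: the function names and all input values are hypothetical, and it assumes each control factor lies in [0, 1] and criticality is on a 0–10 scale, which the division by 10 suggests but the text does not state.

```python
from math import prod


def control_strength(a, c, e, h):
    """controlStrength = a * c * e * h, each factor assumed to lie in [0, 1]."""
    for x in (a, c, e, h):
        if not 0.0 <= x <= 1.0:
            raise ValueError("control factors must be in [0, 1]")
    return a * c * e * h


def p_exploit(epss, strength):
    """pExploit = EPSS * (1 - controlStrength)."""
    return epss * (1.0 - strength)


def p_exploit_total(ps):
    """Probability that at least one edge is exploited: 1 - prod(1 - p_k)."""
    return 1.0 - prod(1.0 - p for p in ps)


def risk_weight(p, criticality):
    """riskWeight = pExploit * criticality(v) / 10."""
    return p * criticality / 10.0


# Hypothetical inputs for two edges protected by the same controls.
strength = control_strength(0.9, 0.8, 0.5, 1.0)   # about 0.36
p1 = p_exploit(0.5, strength)                      # about 0.32
p2 = p_exploit(0.25, strength)                     # about 0.16
total = p_exploit_total([p1, p2])                  # about 0.4288
w1 = risk_weight(p1, 8)                            # about 0.256
```

Note that a controlStrength of 1 drives pExploit to zero regardless of EPSS, while a controlStrength of 0 leaves the EPSS score unchanged.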

For attack-path analysis, graph algorithms—k-shortest paths (Yen), PageRank, Louvain community detection—are applied with risk-weighted edge costs to identify high-impact assets, propagate attack likelihoods, and inform segmentation or mitigation priorities (Nandiya et al., 13 Dec 2025). Multi-hop reasoning chains trace feasible attack paths or infer cross-control risk propagation.
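As a rough illustration of path search with risk-derived edge costs, the sketch below finds the single most likely attack path. It is a simplification, not the method of the cited work: it uses plain Dijkstra rather than Yen's k-shortest paths, and the edge cost -log(pExploit) is an assumption chosen so that the cheapest path maximizes the product of edge probabilities (treating edges as independent). Asset names and probabilities are hypothetical.

```python
import heapq
from math import exp, log


def most_likely_path(edges, source, target):
    """edges: {(u, v): pExploit in (0, 1]}. Returns (path, probability),
    or (None, 0.0) when the target is unreachable."""
    adj = {}
    for (u, v), p in edges.items():
        if p > 0.0:                               # zero-probability edges are unusable
            adj.setdefault(u, []).append((v, -log(p)))

    best = {source: 0.0}                          # lowest known cost per node
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == target:
            break
        if cost > best.get(u, float("inf")):
            continue                              # stale heap entry
        for v, w in adj.get(u, []):
            new_cost = cost + w
            if new_cost < best.get(v, float("inf")):
                best[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))

    if target not in best:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], exp(-best[target])


# Hypothetical edge probabilities.
edges = {
    ("Eng_WS", "HMI_1"): 0.5,
    ("HMI_1", "PLC_A"): 0.4,
    ("Eng_WS", "PLC_A"): 0.1,
    ("Eng_WS", "Historian"): 0.3,
}
path, prob = most_likely_path(edges, "Eng_WS", "PLC_A")
```

In this example the two-hop route through HMI_1 (0.5 × 0.4 = 0.2) is more likely than the direct edge (0.1), which is the kind of non-obvious path that motivates segmentation between the workstation and the HMI.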

4. Semantic Enrichment via LLMs

State-of-the-art KGs leverage LLMs (e.g., SecureBERT, GPT-5-mini) to enrich and complete graph semantics. LLMs provide:

High-accuracy entity recognition and relation extraction from technical domains (NER F1 ≈ 0.78 on held-out CTI; relation classification 66%–98% depending on relation type).
Automated NL2KG mapping: transforms narrative threat or policy statements (“If MFA is not used then risk=high”) into ontology triples with has_condition or has_consequence relations (Nandiya et al., 13 Dec 2025, Park et al., 9 Dec 2025).
Synonym and paraphrase consolidation to address heterogeneity in naming and phrasing.
Inference of latent or unreported relations via link prediction (FastRP/k-NN) or classifier-based scoring (e.g., HAS_POSSIBLE_TECHNIQUE).

Post-processing includes SI normalization, synonym dictionary application, schema validation, and duplicate pruning (prefer higher-confidence triples). This workflow yields an enriched, logically coherent KG suitable for downstream analytics and QA tasks.
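Synonym dictionary application and duplicate pruning might look like the following sketch. The SYNONYMS dictionary, the confidence scores, and the function names are hypothetical; a production pipeline would likely derive the dictionary from embedding clusters rather than hard-code it.

```python
# Hypothetical synonym dictionary mapping lowercase surface forms to canonical names.
SYNONYMS = {
    "programmable logic controller": "PLC",
    "plc": "PLC",
    "human machine interface": "HMI",
    "hmi": "HMI",
}


def canonical(name):
    """Map a surface form to its canonical name, or return it stripped."""
    cleaned = name.strip()
    return SYNONYMS.get(cleaned.lower(), cleaned)


def prune_duplicates(scored_triples):
    """Keep one (s, r, o) per canonical key, preferring the highest confidence."""
    best = {}
    for s, r, o, conf in scored_triples:
        key = (canonical(s), r, canonical(o))
        if key not in best or conf > best[key]:
            best[key] = conf
    return [(s, r, o, conf) for (s, r, o), conf in best.items()]


extracted = [
    ("Programmable Logic Controller", "COMMUNICATES_WITH", "HMI", 0.71),
    ("PLC", "COMMUNICATES_WITH", "Human Machine Interface", 0.93),
    ("PLC", "HAS_CVE", "CVE-2022-1234", 0.88),
]
result = prune_duplicates(extracted)
```

The first two extracted triples collapse to the same canonical key, so only the 0.93-confidence version survives.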

5. Retrieval-Augmented Reasoning and QA on KG

Industrial security KGs enable retrieval-augmented generation (KG-RAG) for question answering, compliance auditing, and incident investigation. Ontology-aware retrieval methods select context blocks or traverse k-hop neighborhoods to aggregate relevant triples. For example, KG-level retrieval embeddings are combined with semantic filtering and prompt-augmented LLM reasoning to generate natural-language responses or reasoning traces. Optimal multi-hop depths vary (e.g., 3-hop for MindMap, 2-hop for KG-Retriever) depending on the structural density and user query (Park et al., 9 Dec 2025).
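The k-hop neighborhood traversal mentioned above can be sketched with a breadth-first search over triples. This is a generic illustration, not the retrieval procedure of MindMap or KG-Retriever: the function name, the choice to ignore edge direction, and the plain-text context format are all assumptions.

```python
from collections import deque


def k_hop_triples(triples, seeds, k):
    """Return triples reachable within k hops of the seed entities,
    ignoring edge direction, in the order they are discovered."""
    adj = {}
    for t in triples:
        s, _, o = t
        adj.setdefault(s, []).append((o, t))
        adj.setdefault(o, []).append((s, t))

    depth = {s: 0 for s in seeds}
    queue = deque(seeds)
    seen, found = set(), []
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue                      # do not expand beyond k hops
        for nxt, t in adj.get(node, []):
            if t not in seen:
                seen.add(t)
                found.append(t)
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return found


triples = [
    ("PLC_A", "HAS_CVE", "CVE-2022-1234"),
    ("CVE-2022-1234", "HAS_POSSIBLE_CWE", "CWE-79"),
    ("Asset_X", "COMMUNICATES_WITH", "Asset_Y"),
]
one_hop = k_hop_triples(triples, ["PLC_A"], 1)
two_hop = k_hop_triples(triples, ["PLC_A"], 2)

# Serialize the retrieved triples as context lines for an LLM prompt.
context = "\n".join(f"{s} {r} {o}" for s, r, o in two_hop)
```

Raising k widens the retrieved context at the cost of more irrelevant triples, which is one reason the best depth depends on graph density and the query.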

Benchmarked on standard QA datasets (IndusSpec-QA: rule, table, multi-hop; toxic-clause detection), ontology-aware KG retrievers demonstrate significant F1 improvements (e.g., 0.454 vs. a 0.277 baseline, a relative gain of about 64%) and high recall in critical tasks (e.g., F1 = 0.910, recall = 0.926 for toxicity detection) (Park et al., 9 Dec 2025).
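The relative gain follows directly from the two reported F1 values:

$\frac{0.454 - 0.277}{0.277} = \frac{0.177}{0.277} \approx 0.639$

Assuming the toxicity-detection F1 is the usual harmonic mean of precision P and recall R, and that both reported figures come from the same evaluation run, the implied precision (a derived value, not one reported in the source) is:

$P = \frac{F_1 \cdot R}{2R - F_1} = \frac{0.910 \times 0.926}{2 \times 0.926 - 0.910} \approx 0.895$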

6. Industrial Safety KG Approaches: Process Hazard Contexts

In process safety and hazard analysis, KGs are constructed from HAZOP reports using a standardized five-slot template: Cause (IC), Deviation (D), Middle-event (ME), Consequence (C), and Suggestion (S). The ISK Standardization Framework (ISKSF) provides a layered ontology, deep learning-based extraction pipeline (IBERT+BiLSTM+CRF+industrial loss), and standardized triple construction for hazard event chains (Wang et al., 2021). A uniform RISK propagation relation encodes causality:

$\text{RISK}(\zeta, \eta) = IC(\zeta, \eta) \to D(\zeta, \eta) \to ME(\zeta, \eta) \to C(\zeta, \eta) \to S(\zeta, \eta)$

Case studies show that industrial safety knowledge graphs (ISKGs) support interactive hazard chain exploration, fault analysis, propagation reasoning, and retrieval of recommendations given partial incident information. A plausible implication is that the process-agnostic ISKSF model can be rapidly adapted to other safety-critical industrial domains, including FMEA or nuclear process reporting.
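The five-slot chain and the retrieval of recommendations from partial information can be illustrated with a small sketch. It is not the ISKSF pipeline: the HazopRecord class, the relation names (IC_to_D and so on), the suggest helper, and the example records are all hypothetical, and exact string matching stands in for the learned extraction and graph queries a real system would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HazopRecord:
    cause: str          # IC
    deviation: str      # D
    middle_event: str   # ME
    consequence: str    # C
    suggestion: str     # S

    def triples(self):
        """Expand the record into the chain IC -> D -> ME -> C -> S."""
        chain = [self.cause, self.deviation, self.middle_event,
                 self.consequence, self.suggestion]
        rels = ["IC_to_D", "D_to_ME", "ME_to_C", "C_to_S"]
        return [(a, r, b) for a, r, b in zip(chain, rels, chain[1:])]


def suggest(records, **known):
    """Return suggestions from records whose slots match all known values."""
    return [r.suggestion for r in records
            if all(getattr(r, slot) == value for slot, value in known.items())]


# Hypothetical HAZOP records.
records = [
    HazopRecord("cooling water pump failure", "high temperature",
                "runaway reaction", "reactor overpressure",
                "install redundant pump and high-temperature interlock"),
    HazopRecord("feed valve stuck open", "high level",
                "tank overflow", "flammable release",
                "add high-level alarm and trip"),
]

# Partial incident information: only the deviation is known.
hits = suggest(records, deviation="high temperature")
chain = records[0].triples()
```

Because every record shares the same slot order, the same lookup works whichever slot is known, for example querying by consequence instead of deviation.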

7. Applications, Scalability, and Adaptation to New Security Domains

Industrial security KGs support multi-modal applications: cyber attack simulation, hazard propagation, resilience dashboards, explainable reasoning, and toxic control detection. Graphs of 10K–50K nodes with millisecond query latency are achievable using a graph database (Neo4j) and its Graph Data Science (GDS) analytics library (Nandiya et al., 13 Dec 2025). LLM throughput and incremental ingestion enable near-real-time updates for rapidly evolving security knowledge bases.

Adapting methodologies for security standards entails:

Extending hierarchical schemas to model security-specific document structures (Annexes, Controls, Domains).
Enriching ontologies with Threat, Vulnerability, Control, RiskScenario, AttackPattern, and relevant relations.
Adopting atomic proposition extraction for numeric (CVSS), Boolean, and temporal constraints.
Specializing extraction prompts for domain-specific asset–threat–control mapping.
Constructing QA and compliance testing benchmarks to evaluate the operational efficacy of the resulting security KGs (Park et al., 9 Dec 2025).

Representative case studies demonstrate exposure reduction (30–80%) and increased average attack path length (25–50%) following application of best-practice controls (Nandiya et al., 13 Dec 2025). The integration of semantic and quantitative reasoning underpins both compliance and adaptive defense in increasingly heterogeneous industrial environments.