Procedural Knowledge Graph Extraction
- Procedural knowledge graph extraction converts process texts into structured graphs that encode steps, actions, and dependencies.
- Extraction methods combine linguistic parsing, sequence labeling, entity linking, and LLM-guided graph assembly to capture control-flow and conditional logic.
- Applications span automated QA, process mining, compliance reasoning, and dynamic querying across industrial, scientific, and creative fields.
Procedural knowledge graph extraction is the process of converting process-oriented or instructional text (recipes, maintenance manuals, scientific procedures, technical support documents) into explicit, structured graph representations that encode steps, actions, entities, control-flow, and dependencies. These procedural knowledge graphs (KGs) support process automation, search, compliance reasoning, and dynamic querying in high-stakes industrial, scientific, and creative domains. Extraction pipelines rely on a combination of linguistic parsing, sequence labeling, entity linking, graph assembly algorithms, and, increasingly, LLMs and neuro-symbolic verifiers.
1. Formal Definitions and Modeling Schemes
Procedural knowledge graph extraction formalizes document conversion as a function f : D → G, where D is a process-oriented text and G = (V, E) is a directed labeled graph (Du et al., 2024). Graph schemas vary by application, but core abstractions universally include:
- Nodes: steps, actions, entities/objects, conditions, actors/agents, temporal markers, tools (Carriero et al., 26 Mar 2025, Ai et al., 7 Oct 2025).
- Edges: relations for control-flow (e.g., FOLLOWS, followedBy, SequenceFlow), causality (HAS_CAUSE/Effect), conditional branching, part-whole (PART_OF), and semantic roles (involves, usedBy).
- Types/Ontologies: domain-specific class hierarchies (e.g., pko:Step, pko:Action, ConditionBlock, FailureMode, Event, Tool, StepExecution) (Carriero et al., 26 Mar 2025, Ai et al., 7 Oct 2025, Kumar et al., 14 Apr 2025).
The extraction task involves identifying node textual spans, assigning node types, extracting relations, and forming the correct graph topology (including sequencing, conditional branching, and exception arcs) (Du et al., 2024, Mysore et al., 2017, Kumar et al., 14 Apr 2025). Procedural KGs may capture both specification and execution (steps performed, agents involved, issues, and duration) (Carriero et al., 26 Mar 2025).
| Graph Element | Node Types | Edge Types |
|---|---|---|
| Industrial PK | Procedure, Step, Action, Tool, Agent | hasStep, nextStep, requiresTool |
| Aviation KG | Component, Event, FailureMode, Action | FOLLOWED_BY, HAS_CAUSE, LOCATION |
| BPMN Graph | Actor, Action, Gateway, Constraint | SequenceFlow, ConditionFlow, ConstraintFlow |
| Support MicroKG | Procedure, Step, ConditionBlock, Effect | hasStep, followedBy, hasCondition |
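As a minimal illustration of the schemas tabulated above, a procedural KG can be held as typed nodes plus labeled edges. The sketch below reuses the support-MicroKG vocabulary (`hasStep`, `followedBy`); the `Node`/`ProceduralKG` classes and the example procedure are illustrative assumptions, not any paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    id: str
    type: str   # e.g. "Procedure", "Step", "ConditionBlock"
    text: str   # surface span from the source document

@dataclass
class ProceduralKG:
    nodes: dict = field(default_factory=dict)   # id -> Node
    edges: list = field(default_factory=list)   # (src, relation, dst) triples

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        self.edges.append((src, relation, dst))

# Tiny support-procedure fragment using the MicroKG relation names
kg = ProceduralKG()
kg.add_node(Node("p1", "Procedure", "Reset the router"))
kg.add_node(Node("s1", "Step", "Unplug the power cable"))
kg.add_node(Node("s2", "Step", "Wait 30 seconds"))
kg.add_edge("p1", "hasStep", "s1")
kg.add_edge("s1", "followedBy", "s2")
```

Keeping edges as explicit (source, relation, target) triples mirrors how these graphs are later serialized into triple stores or property graphs.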
2. Extraction Pipelines and Algorithms
Extraction methodologies span rule-based, supervised neural, unsupervised generative, and LLM-centric approaches:
- Preprocessing: sentence segmentation; tokenization; Named Entity Recognition (NER) to identify candidate mentions (Ai et al., 7 Oct 2025, Mysore et al., 2017).
- Entity & Step Extraction: supervised sequence-taggers such as Bi-LSTM-CRF, DCNN, and fastText (Mysore et al., 2017, Yang et al., 2019); spaCy, ChemDataExtractor, custom parsers (Ai et al., 7 Oct 2025).
- Relation Extraction: iterative LLM prompting for triplets; rule-based templates for flow patterns; classifier architectures for relation types (Carriero et al., 26 Mar 2025, Du et al., 2024, Ai et al., 7 Oct 2025).
- Event Segmentation: parsing dependency trees and splitting on conjunct roots (Mysore et al., 2017).
- Edge Induction: sequential heuristics (link each intermediate step to the prior operation); generative models (learn reference distributions over prior steps); System-2 self-refine verifiers for logical branching and constraint assignment (Mysore et al., 2017, Du et al., 2024).
- Graph Construction: triple extraction and incremental assembly; m-hop expansion and spanning tree filtering for context selection in QA (Ai et al., 7 Oct 2025); JSON→property graph conversion for micrographs (Kumar et al., 14 Apr 2025).
Procedural instantiations often encode multi-step sequences as ordered chains s₁ → s₂ → … → sₙ and model conditionality through ConditionBlock and Effect nodes (Kumar et al., 14 Apr 2025, Ai et al., 7 Oct 2025).
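The pipeline stages above can be sketched end-to-end in miniature. This is a deliberately naive stand-in, assuming numbered-list input: real systems use sentence segmentation plus sequence taggers for step extraction, whereas here a regex suffices; the sequential edge-induction heuristic (link each step to its predecessor) does follow the baseline described in the text.

```python
import re

def extract_steps(text: str) -> list[str]:
    # Naive step segmentation: split on numbered-list markers.
    # (Real pipelines use sentence segmentation + Bi-LSTM-CRF-style taggers.)
    parts = re.split(r"\s*\d+\.\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]

def induce_sequential_edges(steps: list[str]) -> list[tuple[int, str, int]]:
    # Sequential heuristic: link each step to its immediate predecessor.
    return [(i - 1, "followedBy", i) for i in range(1, len(steps))]

doc = "1. Preheat the furnace. 2. Mix the precursors. 3. Anneal for two hours."
steps = extract_steps(doc)
edges = induce_sequential_edges(steps)
```

The heuristic is strong exactly where the survey says it is: strictly sequential domains such as synthesis procedures, where the true graph is a chain.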
3. Ontologies, Schemas, and Design Patterns
Domain ontologies provide the semantic backbone for procedural KGs:
- PKO (Procedural Knowledge Ontology): core classes (Procedure, Step, Action, Tool, Agent, ProcedureExecution, StepExecution, IssueOccurrence, UserQuestionOccurrence, MultiStep) (Carriero et al., 26 Mar 2025); extends PROV-O, P-Plan, DCAT/Resource.
- BPMN-style schemas: node categories for Actor, Action, Gateway (XOR, OR, AND), Constraints, Start/End (Du et al., 2024).
- Aviation KG: entity classes for Components, FailureModes, Events, Actions, Location, TimePeriod; relation set includes OWNED_BY, HAS_CAUSE, FOLLOWS, LOCATION, PART_OF (Ai et al., 7 Oct 2025).
- Micrograph schemas: granular node and edge types for capturing section structure, steps, conditional blocks, effects, and page metadata, supporting full document context (Kumar et al., 14 Apr 2025).
Properties establish sequencing (pko:nextStep), versioning, control-flow, exception handling (IssueOccurrence, addressesIssueWith), verification, and resource linkage (Carriero et al., 26 Mar 2025). Extraction pipelines map verbs to actions/steps, nouns to entities/tools/resources, and clause boundaries to conditionals, and enumerate steps via list-based heuristics (Carriero et al., 26 Mar 2025, Kumar et al., 14 Apr 2025).
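The verb/clause mapping heuristics can be illustrated with a crude lexical sketch: the leading imperative verb becomes the Action, and a leading "if"-clause becomes the condition. The `map_step` function and its output keys are illustrative assumptions, not PKO's actual mapping rules.

```python
def map_step(step: str) -> dict:
    # Crude heuristic mapping in the spirit of the ontology-population
    # rules above: "If <cond>, <imperative> ..." -> condition + action.
    condition = None
    body = step
    if step.lower().startswith("if "):
        cond_part, _, body = step.partition(", ")
        condition = cond_part[3:]           # text after the leading "if "
    tokens = body.split()
    return {
        "action": tokens[0].lower() if tokens else None,  # imperative verb
        "condition": condition,
        "text": body,
    }

print(map_step("If the light blinks red, restart the device"))
```

In production pipelines this lexical guess is replaced by dependency parsing and NER, but the mapping targets (action node, condition block, step text) are the same.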
4. Benchmarks, Evaluation Metrics, and Experimental Results
Large-scale procedural graph extraction is evaluated on benchmarks such as PAGED (3,394 business process documents), OMIn (aviation maintenance), annotated web tutorial corpora, scientific software, and materials science synthesis procedures (Du et al., 2024, Ai et al., 7 Oct 2025, Yang et al., 2019, Haris et al., 2023, Mysore et al., 2017).
- Metrics: BLEU-based soft F1 for text-span match (Actor, Action, Constraint) (Du et al., 2024); standard F1 for gateways and flows; ROUGE-L for procedure selection (Ai et al., 7 Oct 2025); micro-average accuracy for state prediction (Das et al., 2018).
- Findings:
- Sequence-tagging models achieve ∼77.6% F1 (entity extraction, synthesis domain) (Mysore et al., 2017).
- Rule-based and pipeline baselines perform poorly on flow assembly (F1 < 0.2 for structure) (Du et al., 2024).
- LLMs outperform baselines on text-based element detection (up to 0.78 F1 for data constraints), but all models remain below 0.6 F1 on non-sequential logic—gateway, parallel, conditional flow (Du et al., 2024).
- Sequential heuristics dominate reference-linking for strictly sequential domains (e.g., inorganic synthesis procedures, action graphs) (Mysore et al., 2017).
- KG-augmented RAG pipelines support global sensemaking, but text-chunk RAG slightly outperforms for fine-grained procedural QA (Ai et al., 7 Oct 2025).
| Model | Action F1 | Flow F1 | Gateway F1 |
|---|---|---|---|
| Rule-based | 0.308 | 0.056 | 0.485 |
| Sequence-tagging | 0.744 | 0.478 | 0.554 |
| LLM (FT) | 0.744 | 0.478 | 0.554 |
| Self-Refine LLM | — | — | +0.14 (OR), +0.02 (XOR) |
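The soft-F1 scoring used for text spans can be sketched as follows. PAGED's metric matches spans via BLEU; the token-overlap score below is a simplified stand-in (an assumption for illustration), but the counting of soft true positives into precision/recall/F1 follows the same shape.

```python
def soft_f1(predicted: list[str], gold: list[str], threshold: float = 0.5) -> float:
    # A predicted span counts as correct if its token-overlap F1 with
    # some gold span clears the threshold. (PAGED uses a BLEU-based
    # match; this overlap score is a simplified stand-in.)
    def overlap(a: str, b: str) -> float:
        ta, tb = set(a.lower().split()), set(b.lower().split())
        if not ta or not tb:
            return 0.0
        inter = len(ta & tb)
        p, r = inter / len(ta), inter / len(tb)
        return 2 * p * r / (p + r) if p + r else 0.0

    tp = sum(1 for p in predicted if any(overlap(p, g) >= threshold for g in gold))
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

Soft matching matters because extracted spans ("check the valve") rarely match gold spans ("check valve pressure") verbatim, yet should still count toward element-level scores.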
5. Reasoning, Querying, and Practical Applications
Procedural KGs power a spectrum of applications:
- Automated QA: seed-node retrieval, m-hop expansion, and graph-to-text reconstruction enable high-precision context feeding for LLM answer generation (Ai et al., 7 Oct 2025); example: query “What caused the engine to quit?” yields explicit traversal through HAS_CAUSE and TIME_PERIOD edges.
- Support/Helpdesk Automation: micrograph schema enables granular chatbot questioning, conditional step execution, and disambiguation by constraint or OS section (Kumar et al., 14 Apr 2025).
- Process Mining and Compliance: PKO-based graphs supply explicit modifiable process models, execution/event logs, tool reference mapping, and exception handling (Carriero et al., 26 Mar 2025).
- Dynamic Process Tracking: stateful procedural KGs (KG-MRC) maintain evolving entity-location relations through soft co-reference and neural graph updates, supporting procedural comprehension, commonsense inference, and error detection (Das et al., 2018).
- Scientific Workflow Extraction: AST-based schema mining from code and article text produces KGs encoding logical data flows, software-method provenance, and results for scholarly meta-analysis (Haris et al., 2023).
SPARQL, Gremlin, and custom property-graph traversals enable extraction of steps, execution agents, conditional branches, resource usage, and compliance chains (Carriero et al., 26 Mar 2025, Kumar et al., 14 Apr 2025).
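The m-hop expansion used for QA context selection is, at its core, a bounded breadth-first traversal from retrieved seed nodes. A minimal sketch, assuming undirected expansion for context gathering and an illustrative aviation-style edge list (the node names are hypothetical):

```python
from collections import deque

def m_hop_subgraph(edges, seeds, m):
    # Collect all nodes within m hops of the seed nodes, treating
    # edges as undirected so causes and effects are both gathered.
    adj = {}
    for src, _, dst in edges:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == m:
            continue  # hop budget exhausted along this branch
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen

# Illustrative HAS_CAUSE/LOCATION fragment
edges = [("engine", "HAS_CAUSE", "fuel_starvation"),
         ("fuel_starvation", "HAS_CAUSE", "clogged_filter"),
         ("clogged_filter", "LOCATION", "fuel_line")]
```

The induced subgraph is then linearized back to text and fed to the LLM; the hop budget m trades recall of relevant context against prompt length.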
6. Limitations, Challenges, and Future Directions
Despite significant advances, procedural knowledge graph extraction faces the following limitations and frontiers:
- Non-sequential logic extraction: LLMs and neural taggers struggle on gateways, parallel flows, and complex conditional structures (Du et al., 2024).
- Event/entity segmentation bottleneck: even top models extract only ∼56% of explicit nodes correctly (chemistry synthesis), indicating a need for joint models (Mysore et al., 2017).
- Small-scale, rule-based generalization issues: hand-crafted rule sets do not scale to heterogeneous procedural domains or idiosyncratic documentation styles (Du et al., 2024, Kumar et al., 14 Apr 2025).
- Integration of procedural knowledge into pretraining: proposed as a direction to give LLMs innate BPMN-like logic reasoning capabilities (Du et al., 2024).
- Improved exception handling, execution-vs-specification linking, and reusable workflow discovery: open ontology engineering and graph-mining challenges (Carriero et al., 26 Mar 2025, Yang et al., 2019).
- Deployment and evaluation in real-world environments: usability, time-savings, error-tolerance, and informativity must be empirically measured beyond F1/ROUGE (Du et al., 2024, Ai et al., 7 Oct 2025).
A plausible implication is that procedural KG extraction will benefit from neuro-symbolic hybrids, large-scale annotated benchmarks (e.g., PAGED), iterative self-refine pipelines, and ongoing ontology-led schema curation. The field is converging on frameworks that balance expressive relational modeling, robust event/entity detection, and scalable integration with modern information-extraction and question-answering systems.