
Procedural Knowledge Graph Extraction

Updated 8 January 2026
  • Procedural knowledge graph extraction converts process texts into structured graphs that encode steps, actions, and dependencies.
  • Extraction methods combine linguistic parsing, sequence labeling, entity linking, and LLM-guided graph assembly to capture control-flow and conditional logic.
  • Applications span automated QA, process mining, compliance reasoning, and dynamic querying across industrial, scientific, and creative fields.

Procedural knowledge graph extraction is the process of converting process-oriented or instructional text (recipes, maintenance manuals, scientific procedures, technical support documents) into explicit, structured graph representations that encode steps, actions, entities, control-flow, and dependencies. These procedural knowledge graphs (KGs) support process automation, search, compliance reasoning, and dynamic querying in high-stakes industrial, scientific, and creative domains. Extraction pipelines rely on a combination of linguistic parsing, sequence labeling, entity linking, graph assembly algorithms, and, increasingly, LLMs and neuro-symbolic verifiers.

1. Formal Definitions and Modeling Schemes

Procedural knowledge graph extraction formalizes document conversion as a function f : D → G, where D is a process-oriented text and G is a directed labeled graph (Du et al., 2024). Graph schemas vary by application, but all share core abstractions: nodes for procedures, steps, actions, and entities, and edges for sequencing, dependencies, and control-flow. Representative schemas are summarized in the table below.

The extraction task involves identifying node textual spans, assigning node types, extracting relations, and forming the correct graph topology (including sequencing, conditional branching, and exception arcs) (Du et al., 2024, Mysore et al., 2017, Kumar et al., 14 Apr 2025). Procedural KGs may capture both specification and execution (steps performed, agents involved, issues, and duration) (Carriero et al., 26 Mar 2025).

| Graph schema | Node types | Edge types |
|---|---|---|
| Industrial PK | Procedure, Step, Action, Tool, Agent | hasStep, nextStep, requiresTool |
| Aviation KG | Component, Event, FailureMode, Action | FOLLOWED_BY, HAS_CAUSE, LOCATION |
| BPMN Graph | Actor, Action, Gateway, Constraint | SequenceFlow, ConditionFlow, ConstraintFlow |
| Support MicroKG | Procedure, Step, ConditionBlock, Effect | hasStep, followedBy, hasCondition |
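The f : D → G formalization can be sketched as a minimal property-graph data model. This is an illustrative sketch, not any cited system's implementation; the node and edge type names follow the Support MicroKG schema row, and the example instances are invented.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    node_id: str
    node_type: str   # e.g. "Procedure", "Step", "ConditionBlock", "Effect"
    text: str        # source text span for this node

@dataclass
class ProceduralKG:
    """Directed labeled graph G: typed nodes plus labeled edges."""
    nodes: dict = field(default_factory=dict)   # node_id -> Node
    edges: list = field(default_factory=list)   # (src_id, edge_type, dst_id)

    def add_node(self, node_id, node_type, text):
        self.nodes[node_id] = Node(node_id, node_type, text)

    def add_edge(self, src, edge_type, dst):
        assert src in self.nodes and dst in self.nodes
        self.edges.append((src, edge_type, dst))

    def successors(self, node_id, edge_type):
        return [d for s, t, d in self.edges if s == node_id and t == edge_type]

# Toy instantiation (invented procedure for illustration)
kg = ProceduralKG()
kg.add_node("p1", "Procedure", "Reset the router")
kg.add_node("s1", "Step", "Unplug the power cable")
kg.add_node("s2", "Step", "Wait 30 seconds")
kg.add_edge("p1", "hasStep", "s1")
kg.add_edge("p1", "hasStep", "s2")
kg.add_edge("s1", "followedBy", "s2")
```

Typed nodes over raw triples make it straightforward to enforce schema constraints (e.g. only Step nodes may carry followedBy edges) during graph assembly.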

2. Extraction Pipelines and Algorithms

Extraction methodologies span rule-based, supervised neural, unsupervised generative, and LLM-centric approaches.

Procedural instantiations often encode multi-step sequences as triples ⟨Step_i, FOLLOWED_BY, Step_{i+1}⟩ and model conditionality through ConditionBlock and Effect nodes (Kumar et al., 14 Apr 2025, Ai et al., 7 Oct 2025).
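The triple encoding above can be sketched in a few lines; the helper names and the ConditionBlock/Effect tuple encoding are illustrative assumptions, not the cited papers' code.

```python
def steps_to_triples(steps):
    """Encode an ordered step list as <Step_i, FOLLOWED_BY, Step_{i+1}> triples."""
    return [(steps[i], "FOLLOWED_BY", steps[i + 1]) for i in range(len(steps) - 1)]

def conditional_triples(step, condition, effect):
    """Model conditionality via ConditionBlock / Effect nodes (illustrative labels)."""
    cond_node = ("ConditionBlock", condition)
    return [
        (step, "hasCondition", cond_node),
        (cond_node, "hasEffect", ("Effect", effect)),
    ]

# Three ordered steps yield two FOLLOWED_BY arcs.
triples = steps_to_triples(["Open the panel", "Remove the fuse", "Insert the new fuse"])
```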

3. Ontologies, Schemas, and Design Patterns

Domain ontologies provide the semantic backbone for procedural KGs:

  • PKO (Procedural Knowledge Ontology): core classes (Procedure, Step, Action, Tool, Agent, ProcedureExecution, StepExecution, IssueOccurrence, UserQuestionOccurrence, MultiStep) (Carriero et al., 26 Mar 2025); extends PROV-O, P-Plan, DCAT/Resource.
  • BPMN-style schemas: node categories for Actor, Action, Gateway (XOR, OR, AND), Constraints, Start/End (Du et al., 2024).
  • Aviation KG: entity classes for Components, FailureModes, Events, Actions, Location, TimePeriod; the relation set R includes OWNED_BY, HAS_CAUSE, FOLLOWS, LOCATION, PART_OF (Ai et al., 7 Oct 2025).
  • Micrograph schemas: granular node and edge types for capturing section structure, steps, conditional blocks, effects, and page metadata, supporting full document context (Kumar et al., 14 Apr 2025).
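As a concrete (hypothetical) instantiation of the PKO pattern, a procedure with two ordered steps might be serialized in Turtle as follows; the pko: namespace IRI and the ex: instance IRIs are assumptions for illustration, while the class and property names follow the schema descriptions above.

```turtle
@prefix pko: <https://w3id.org/pko#> .   # namespace IRI assumed for illustration
@prefix ex:  <http://example.org/> .

ex:routerReset a pko:Procedure ;
    pko:hasStep ex:step1, ex:step2 .

ex:step1 a pko:Step ;
    pko:nextStep ex:step2 .

ex:step2 a pko:Step .
```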

Properties establish sequencing (pko:nextStep), versioning, control-flow, exception handling (IssueOccurrence, addressesIssueWith), verification, and resource linkage (Carriero et al., 26 Mar 2025). Extraction pipelines map verbs to actions/steps, nouns to entities/tools/resources, and clause boundaries to conditionals, and segment steps via list-based enumeration heuristics (Carriero et al., 26 Mar 2025, Kumar et al., 14 Apr 2025).
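The clause-boundary and list-based heuristics can be approximated with a rule-based sketch; the two regex patterns below (enumerated step lines, and steps of the form "If <condition>, <effect>") are simplistic assumptions, not the cited pipelines.

```python
import re

STEP_RE = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+(.*)$")        # enumerated/bulleted step lines
COND_RE = re.compile(r"^If\s+(.+?),\s*(.+)$", re.IGNORECASE)   # "If <condition>, <effect>"

def extract_steps(text):
    """List-based heuristic: each enumerated line becomes a Step node,
    split into ConditionBlock/Effect when the step starts with 'If ...'."""
    steps = []
    for line in text.splitlines():
        m = STEP_RE.match(line)
        if not m:
            continue
        body = m.group(1).strip()
        c = COND_RE.match(body)
        if c:
            steps.append({"type": "ConditionBlock",
                          "condition": c.group(1), "effect": c.group(2)})
        else:
            steps.append({"type": "Step", "text": body})
    return steps

doc = """1. Unplug the power cable.
2. If the LED stays on, hold the reset button.
3. Wait 30 seconds."""
```

Real pipelines replace such regexes with dependency parses and sequence taggers, but the node-typing logic is the same.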

4. Benchmarks, Evaluation Metrics, and Experimental Results

Large-scale procedural graph extraction is evaluated on benchmarks such as PAGED (3,394 business process documents), OMIn (aviation maintenance), annotated web tutorial corpora, scientific software, and materials science synthesis procedures (Du et al., 2024, Ai et al., 7 Oct 2025, Yang et al., 2019, Haris et al., 2023, Mysore et al., 2017).

  • Metrics: BLEU-based soft F1 for text-span match (Actor, Action, Constraint) (Du et al., 2024); standard F1 for gateways and flows; ROUGE-L for procedure selection (Ai et al., 7 Oct 2025); micro-average accuracy for state prediction (Das et al., 2018).
  • Findings:
    • Sequence-tagging models achieve ∼77.6% F1 (entity extraction, synthesis domain) (Mysore et al., 2017).
    • Rule-based and pipeline baselines perform poorly on flow assembly (F1 < 0.2 for structure) (Du et al., 2024).
    • LLMs outperform baselines on text-based element detection (up to 0.78 F1 for data constraints), but all models remain below 0.6 F1 on non-sequential logic—gateway, parallel, conditional flow (Du et al., 2024).
    • Sequential heuristics dominate reference-linking for strictly sequential domains (e.g., inorganic synthesis procedures, action graphs) (Mysore et al., 2017).
    • KG-augmented RAG pipelines support global sensemaking, but text-chunk RAG slightly outperforms for fine-grained procedural QA (Ai et al., 7 Oct 2025).
| Model | Action F1 | Flow F1 | Gateway F1 |
|---|---|---|---|
| Rule-based | 0.308 | 0.056 | 0.485 |
| Sequence-tagging | 0.744 | 0.478 | 0.554 |
| LLM (FT) | 0.744 | 0.478 | 0.554 |
| Self-Refine LLM | — | +0.14 (OR) | +0.02 (XOR) |
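The span-level soft F1 idea can be sketched with token-overlap scoring; this substitutes plain token precision/recall for the BLEU-based variant of Du et al. (an assumption, and the exact scoring differs), but illustrates how partial span matches are credited.

```python
def token_f1(pred: str, gold: str) -> float:
    """Soft span match: harmonic mean of token-overlap precision and recall."""
    p, g = pred.lower().split(), gold.lower().split()
    if not p or not g:
        return 0.0
    overlap = sum(min(p.count(t), g.count(t)) for t in set(p))
    if overlap == 0:
        return 0.0
    prec, rec = overlap / len(p), overlap / len(g)
    return 2 * prec * rec / (prec + rec)

def soft_f1(preds, golds, threshold=0.5):
    """Count a predicted span correct if its token F1 against some gold span
    exceeds the threshold, then compute micro precision/recall/F1."""
    tp = sum(1 for p in preds if any(token_f1(p, g) > threshold for g in golds))
    prec = tp / len(preds) if preds else 0.0
    rec = tp / len(golds) if golds else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```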

5. Reasoning, Querying, and Practical Applications

Procedural KGs power a spectrum of applications:

  • Automated QA: seed-node retrieval, m-hop expansion, and graph-to-text reconstruction enable high-precision context feeding for LLM answer generation (Ai et al., 7 Oct 2025); example: query “What caused the engine to quit?” yields explicit traversal through HAS_CAUSE and TIME_PERIOD edges.
  • Support/Helpdesk Automation: micrograph schema enables granular chatbot questioning, conditional step execution, and disambiguation by constraint or OS section (Kumar et al., 14 Apr 2025).
  • Process Mining and Compliance: PKO-based graphs supply explicit modifiable process models, execution/event logs, tool reference mapping, and exception handling (Carriero et al., 26 Mar 2025).
  • Dynamic Process Tracking: stateful procedural KGs (KG-MRC) maintain evolving entity-location relations through soft co-reference and neural graph updates, supporting procedural comprehension, commonsense inference, and error detection (Das et al., 2018).
  • Scientific Workflow Extraction: AST-based schema mining from code and article text produces KGs encoding logical data flows, software-method provenance, and results for scholarly meta-analysis (Haris et al., 2023).
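The seed-node retrieval plus m-hop expansion pattern used for automated QA can be sketched as a breadth-first traversal over labeled edges; the toy aviation-style graph below is invented for illustration.

```python
from collections import deque

def m_hop_subgraph(edges, seeds, m):
    """Collect every node reachable from the seed set within m hops,
    treating labeled edges (src, label, dst) as undirected for retrieval."""
    adj = {}
    for s, _, d in edges:
        adj.setdefault(s, set()).add(d)
        adj.setdefault(d, set()).add(s)
    seen = set(seeds)
    frontier = deque((n, 0) for n in seeds)
    while frontier:
        node, hops = frontier.popleft()
        if hops == m:
            continue
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return seen

# Invented edges in the spirit of the "What caused the engine to quit?" example
edges = [
    ("engine_quit", "HAS_CAUSE", "fuel_starvation"),
    ("fuel_starvation", "HAS_CAUSE", "blocked_filter"),
    ("engine_quit", "TIME_PERIOD", "cruise"),
]
```

The retrieved subgraph is then verbalized (graph-to-text) and fed to the LLM as answer context.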

SPARQL, Gremlin, and custom property-graph traversals enable extraction of steps, execution agents, conditional branches, resource usage, and compliance chains (Carriero et al., 26 Mar 2025, Kumar et al., 14 Apr 2025).
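A step-sequencing query of the kind described might look as follows in SPARQL; the pko: namespace IRI is assumed, and the property names follow the PKO description above rather than a verified query from the cited work.

```sparql
# Retrieve each step of a procedure and the step that follows it.
PREFIX pko: <https://w3id.org/pko#>

SELECT ?proc ?step ?next
WHERE {
  ?proc a pko:Procedure ;
        pko:hasStep ?step .
  OPTIONAL { ?step pko:nextStep ?next . }
}
ORDER BY ?proc
```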

6. Limitations, Challenges, and Future Directions

Despite significant advances, procedural knowledge graph extraction faces the following limitations and frontiers:

  • Non-sequential logic extraction: LLMs and neural taggers struggle on gateways, parallel flows, and complex conditional structures (Du et al., 2024).
  • Event/entity segmentation bottleneck: even top models extract only ∼56% of explicit nodes correctly (chemistry synthesis), indicating a need for joint models (Mysore et al., 2017).
  • Small-scale, rule-based generalization issues: hand-crafted rule sets do not scale to heterogeneous procedural domains or idiosyncratic documentation styles (Du et al., 2024, Kumar et al., 14 Apr 2025).
  • Integration of procedural knowledge into pretraining: proposed as a direction to give LLMs innate BPMN-like logic reasoning capabilities (Du et al., 2024).
  • Improved exception handling, execution-vs-specification linking, and reusable workflow discovery: open ontology engineering and graph-mining challenges (Carriero et al., 26 Mar 2025, Yang et al., 2019).
  • Deployment and evaluation in real-world environments: usability, time-savings, error-tolerance, and informativity must be empirically measured beyond F1/ROUGE (Du et al., 2024, Ai et al., 7 Oct 2025).

A plausible implication is that procedural KG extraction will benefit from neuro-symbolic hybrids, large-scale annotated benchmarks (e.g., PAGED), iterative self-refine pipelines, and ongoing ontology-led schema curation. The field is converging on frameworks that balance expressive relational modeling, robust event/entity detection, and scalable integration with modern information-extraction and question-answering systems.
