SOPStruct: Structured SOP Transformation
- SOPStruct is an LLM-driven framework that converts unstructured Standard Operating Procedures into a standardized, DAG-based decision tree representation.
- It segments SOPs into atomic fragments and uses schema-constrained subtask extraction alongside dual (PDDL and LLM) evaluations to ensure soundness and completeness.
- By enabling backtracking, error correction, and automation, SOPStruct enhances workflow management and procedural knowledge digitization.
SOPStruct is an LLM-driven framework for transforming unstructured Standard Operating Procedures (SOPs) into standardized, decision-tree-structured representations. SOPStruct leverages a directed acyclic graph (DAG) formalism and schema-constrained subtask extraction to capture sequential logic, dependencies, and conditional flows in procedural documents. The framework achieves high soundness and completeness, validated through a dual (PDDL-based and LLM-based) evaluation pipeline, and supports automation, backtracking, and error correction by design (Garg et al., 28 Mar 2025).
1. Formal Representation and Graph Structure
SOPStruct encodes a raw SOP as a directed acyclic graph G = (V, E), where each node represents a distinct subtask with a structured attribute payload:
- name: Short identifier.
- description: Free-text description.
- dependencies: Immediate dependency set, i.e., the node's parents in G.
- inputs: Inputs required from the SOP's initial conditions.
- inputs_from_dependencies: Mapping from ancestor outputs to the inputs consumed.
- outputs: Outputs produced.
- category: Category label, chosen from {HumanInput, InfoProc, InfoExtr, Knowledge, Decision}.
Branching for conditionals or decisions is expressed by multivalent outgoing edges from “Decision” nodes, naturally embedding decision-tree logic while maintaining acyclicity.
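The node payload above can be sketched as a small data structure. Field names follow the SOPStruct schema, while the Subtask class itself and the two example tasks are illustrative, not from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class Subtask:
    name: str                                                 # short identifier
    description: str                                          # free-text description
    dependencies: list = field(default_factory=list)          # parent node names
    inputs: list = field(default_factory=list)                # from SOP initial conditions
    inputs_from_dependencies: dict = field(default_factory=dict)  # parent -> consumed vars
    outputs: list = field(default_factory=list)               # produced variables
    category: str = "InfoProc"   # HumanInput | InfoProc | InfoExtr | Knowledge | Decision

# A two-node fragment: a HumanInput node feeding a Decision node.
collect = Subtask("collect_id", "Ask the operator for a ticket ID",
                  inputs=["ticket_system"], outputs=["ticket_id"],
                  category="HumanInput")
triage = Subtask("triage", "Route the ticket by severity",
                 dependencies=["collect_id"],
                 inputs_from_dependencies={"collect_id": ["ticket_id"]},
                 outputs=["route"], category="Decision")

dag = {t.name: t for t in (collect, triage)}
```

The edges of G are implied by each node's dependency list, so acyclicity can be checked with a standard topological sort over the mapping.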
2. Algorithmic Pipeline
The end-to-end SOPStruct process is segmented into three principal stages: segmentation, structure generation, and evaluation.
Pipeline Overview:
```
Input:  raw SOP text P
Output: structured DAG G

1. S ← SegmentSOP(P)
2. H_fragments ← ∅
3. for each s ∈ S do
4.    G_s ← GenerateStructure(s)
5.    H_fragments ← H_fragments ∪ {G_s}
6. G ← MergeFragments(H_fragments)
7. Evaluate(G)
8. return G
```
- Segmentation: The raw SOP is divided into sub-documents by an LLM using few-shot span-labeling prompts. Each segment is sized to fit the LLM context window and to be internally semantically atomic.
- Structure Generation: Every segment is presented to the LLM with instructions and a schema to emit a JSON-conformant list of subtask objects (see schema below). Each subtask object is then parsed to synthesize vertices and edges.
- Fragment Merging: Variables and dependencies across segments are unified through string matching and LLM-verified name alignment, with the global DAG constructed from the union of all nodes and edges.
- Evaluation: The representation undergoes soundness and completeness verification before finalization.
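The fragment-merging stage can be sketched in a few lines. Function and variable names here are illustrative; the paper's pipeline additionally uses LLM-verified name alignment on top of string matching:

```python
# Hedged sketch of MergeFragments: fragments are lists of subtask dicts;
# variables with identical (normalized) names across fragments are unified.

def normalize(name: str) -> str:
    """Syntactic normalization applied before string matching."""
    return name.strip().lower().replace(" ", "_")

def merge_fragments(fragments):
    """Union all nodes into one DAG; align variable names by normalized match."""
    merged = {}
    for frag in fragments:
        for node in frag:
            node = dict(node)  # avoid mutating the caller's fragments
            node["inputs"] = [normalize(v) for v in node["inputs"]]
            node["outputs"] = [normalize(v) for v in node["outputs"]]
            merged[normalize(node["name"])] = node
    return merged

frag_a = [{"name": "Collect ID", "inputs": [], "outputs": ["Ticket ID"]}]
frag_b = [{"name": "Triage", "inputs": ["ticket id"], "outputs": ["route"]}]
g = merge_fragments([frag_a, frag_b])
# "Ticket ID" and "ticket id" now unify as "ticket_id", linking the fragments.
```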
3. Prompt Engineering and Architectural Considerations
SOPStruct is implemented using raw GPT-4 without model fine-tuning, relying entirely on carefully designed prompts and postprocessing pipelines. Three distinct prompting templates are used:
- Segmentation: Few-shot examples for logical process step extraction.
- Subtask Generation: A JSON schema description, with worked examples, mandates that all output conform to the prescribed schema.
- Wrapper/Validation: Directs the LLM to emit valid, schema-checkable JSON.
During execution, post-processing includes JSON syntax validation, cross-referencing of dependency names, and syntactic normalization of identifiers. This prompt-driven approach constrains output variance and mitigates LLM hallucination.
JSON schema for subtasks:
```
{
  "name": "...",
  "description": "...",
  "dependencies": ["name_of_parent", ...],
  "inputs": ["var1", "var2", ...],
  "inputs_from_dependencies": {"parent_name": ["varX", ...], ...},
  "outputs": ["var3", ...],
  "category": "Action|Decision|Knowledge|..."
}
```
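The post-processing described above (JSON syntax validation plus cross-referencing of dependency names) can be illustrated with a minimal validator over this schema; the helper name and error-message format are hypothetical:

```python
import json

# Required keys per the subtask schema above.
REQUIRED = {"name", "description", "dependencies", "inputs",
            "inputs_from_dependencies", "outputs", "category"}

def validate_subtasks(raw: str):
    """Parse LLM output and report schema and dangling-dependency errors."""
    subtasks = json.loads(raw)                  # JSON syntax validation
    names = {t["name"] for t in subtasks}
    errors = []
    for t in subtasks:
        missing = REQUIRED - t.keys()           # schema-conformance check
        if missing:
            errors.append(f"{t.get('name', '?')}: missing {sorted(missing)}")
        for dep in t.get("dependencies", []):   # cross-reference dependency names
            if dep not in names:
                errors.append(f"{t['name']}: unknown dependency {dep!r}")
    return errors

raw = ('[{"name": "a", "description": "", "dependencies": ["b"], "inputs": [], '
       '"inputs_from_dependencies": {}, "outputs": [], "category": "InfoProc"}]')
print(validate_subtasks(raw))   # flags the dangling dependency "b"
```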
4. Verification Methodology: Soundness and Completeness
The validity of the generated DAG is assessed using a dual-evaluation protocol:
A. Deterministic PDDL-based Verification:
The DAG is encoded into a PDDL domain/problem where:
- Predicates: Availability predicates record which variables have been supplied or produced so far.
- Actions: Each node maps to an action that enacts the subtask conditional on its required inputs being available, updating output availabilities on execution.
- Initial/Goal States:
  - The initial state reflects the union of all primary inputs.
  - The goal is satisfaction of all outputs at the leaf nodes.
A PDDL planner is then run to check that the DAG is connected and all dependencies are resolvable. The primary deterministic metric is the StructuredPlanScore, the proportion of generated DAGs for which the planner finds a valid end-to-end plan.
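A minimal sketch of this DAG-to-PDDL encoding follows; the predicate and action names are illustrative, and the paper's exact formulation may differ:

```python
# Each subtask becomes a PDDL action whose precondition is the availability of
# its inputs and whose effect asserts the availability of its outputs.

def to_pddl(nodes, initial_inputs, goal_outputs):
    """Emit (domain, problem) strings for a list of subtask dicts."""
    actions = []
    for n in nodes:
        pre = " ".join(f"(available {v})" for v in n["inputs"])
        eff = " ".join(f"(available {v})" for v in n["outputs"])
        actions.append(
            f"(:action {n['name']}\n"
            f"  :precondition (and {pre})\n"
            f"  :effect (and {eff}))")
    domain = ("(define (domain sop)\n"
              "  (:predicates (available ?v))\n  "
              + "\n  ".join(actions) + ")")
    problem = ("(define (problem sop-run) (:domain sop)\n"
               f"  (:init {' '.join(f'(available {v})' for v in initial_inputs)})\n"
               f"  (:goal (and {' '.join(f'(available {v})' for v in goal_outputs)})))")
    return domain, problem

domain, problem = to_pddl(
    [{"name": "triage", "inputs": ["ticket_id"], "outputs": ["route"]}],
    initial_inputs=["ticket_id"], goal_outputs=["route"])
```

Feeding the emitted domain/problem pair to an off-the-shelf planner then answers the reachability question: a plan exists exactly when every leaf output is derivable from the primary inputs.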
B. Non-Deterministic LLM-Based Assessment:
Aspects not encoded in PDDL—such as alignment between initial/goal states and the SOP text, or comprehensiveness of leaf node outputs—are evaluated by prompting GPT-4, with metrics thresholded at 0.8 similarity/confidence for pass status.
Completeness checks ask whether any critical SOP step or dependency is missing from the generated DAG G.
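The shape of such a check might look like the following; the prompt wording is an assumption, while the 0.8 pass threshold is the one reported above:

```python
import json

def completeness_prompt(sop_text, dag_nodes):
    """Build an audit prompt pairing the raw SOP with the extracted subtasks."""
    return (
        "You are auditing a structured SOP.\n"
        "Original SOP:\n" + sop_text + "\n\n"
        "Extracted subtasks (JSON):\n" + json.dumps(dag_nodes, indent=2) + "\n\n"
        "List any critical SOP step or dependency missing from the subtasks, "
        "and give a confidence score in [0, 1] that the extraction is complete."
    )

def passes(confidence: float) -> bool:
    """Threshold the model's self-reported confidence, per the 0.8 cutoff."""
    return confidence >= 0.8
```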
5. Empirical Results and Benchmarking
SOPStruct was empirically evaluated on three datasets with varying complexity:
- Nestful API (nested, short, low complexity)
- RecipeNLG (culinary, medium complexity)
- Business Process (multi-step enterprise procedures, high complexity)
Metrics included StructuredPlanScore, InitialStateValidation, GoalStateValidation, PlanCompleteness, DependencyScore, and InputsFromDependencyScore.
| Dataset | StructuredPlanScore (SOPStruct) | StructuredPlanScore (Code-Style) | StructuredPlanScore (BPMN) | Completeness (SOPStruct) | Completeness (Code-Style) | Completeness (BPMN) |
|---|---|---|---|---|---|---|
| Business Process | 100% | 66.17% | 62.19% | 94% | 55.65% | 52.31% |
SOPStruct consistently achieved 100% on deterministic graph metrics and ≥93% on LLM-based metrics, significantly outperforming alternative baselines. Segmentation preserved nuanced steps in long SOPs, schema-constrained prompting reduced hallucinations, and PDDL verification provided strong formal correctness guarantees (Garg et al., 28 Mar 2025).
6. Capabilities: Backtracking, Error Correction, and Automation
The DAG structure enables robust operational features:
- Backtracking: On node failure (e.g., invalid human input), only the failed node's ancestors in G need be re-executed; no global rollback is required.
- Error Correction: Runtime checks on node inputs and outputs map inconsistencies to specific upstream nodes for targeted re-evaluation.
- Workflow Automation: The PDDL encoding supports translation to automated planners and robotic controllers. Categorization of subtasks drives UI prompts (HumanInput), backend API dispatches (InfoProc/InfoExtr), and rule engine activations (Decision).
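The backtracking property can be illustrated over a name-to-parents adjacency map (a hypothetical encoding of G):

```python
# When a node fails, walk its dependency edges to collect only the affected
# ancestors, instead of rolling back the whole workflow.

def ancestors(dag, node):
    """Transitive parents of `node` in a {name: [parent names]} DAG."""
    seen, stack = set(), list(dag[node])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(dag[p])
    return seen

dag = {"a": [], "b": ["a"], "c": [], "d": ["b"]}
print(sorted(ancestors(dag, "d")))   # ['a', 'b']; node "c" is untouched
```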
7. Limitations and Prospective Developments
Identified limitations include:
- Complex nested conditions can lead to increased segment count and merging complexity.
- Due to LLM non-determinism, rare, domain-specific steps are occasionally omitted.
- PDDL reliance restricts expressivity (no support for rich temporal/probabilistic dependencies).
Proposed future directions:
- Integration of iterative self-reflection (e.g., “Reflexion” loops) for LLM-based DAG refinement.
- Hybrid fine-tuning on domain-specific SOP corpora for improved step retention.
- Adoption of PDDL2.1 for temporal/resource constraints and cost-based optimization.
- UI-based expert editing to permit user-in-the-loop adjustments before deployment.
- Empirical assessment in operational settings on metrics such as error rates, completion time, and user satisfaction.
SOPStruct exemplifies an end-to-end methodology spanning segmentation, schema-driven structuralization, dual formal/heuristic verification, and downstream optimization/automation, establishing a new paradigm for SOP digitization and procedural knowledge management (Garg et al., 28 Mar 2025).