SOPStruct: Structured SOP Transformation
- SOPStruct is an LLM-driven framework that converts unstructured Standard Operating Procedures into a standardized, DAG-based decision tree representation.
- It segments SOPs into atomic fragments and uses schema-constrained subtask extraction alongside dual (PDDL and LLM) evaluations to ensure soundness and completeness.
- By enabling backtracking, error correction, and automation, SOPStruct enhances workflow management and procedural knowledge digitization.
SOPStruct is an LLM-driven framework for transforming unstructured Standard Operating Procedures (SOPs) into standardized, decision-tree-structured representations. SOPStruct leverages a directed acyclic graph (DAG) formalism and schema-constrained subtask extraction to capture sequential logic, dependencies, and conditional flows in procedural documents. The framework achieves high soundness and completeness, validated through a dual (PDDL-based and LLM-based) evaluation pipeline, and supports automation, backtracking, and error correction by design (Garg et al., 28 Mar 2025).
1. Formal Representation and Graph Structure
SOPStruct encodes a raw SOP as a directed acyclic graph G = (V, E), where each node represents a distinct subtask with a structured attribute payload:
- name: Short identifier.
- description: Free-text description.
- dependencies: Immediate dependency set, i.e., the node's parents in G.
- inputs: Inputs required from the SOP's initial conditions.
- inputs_from_dependencies: Mapping from ancestor outputs to the inputs consumed.
- outputs: Outputs produced.
- category: Category label, chosen from {HumanInput, InfoProc, InfoExtr, Knowledge, Decision}.
Branching for conditionals or decisions is expressed by multivalent outgoing edges from “Decision” nodes, naturally embedding decision-tree logic while maintaining acyclicity.
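The node payload above can be sketched as a small data structure. Field names follow the SOPStruct schema, while the Subtask class itself and the two example tasks are illustrative, not from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class Subtask:
    name: str                                                 # short identifier
    description: str                                          # free-text description
    dependencies: list = field(default_factory=list)          # parent node names
    inputs: list = field(default_factory=list)                # from SOP initial conditions
    inputs_from_dependencies: dict = field(default_factory=dict)  # parent -> consumed vars
    outputs: list = field(default_factory=list)               # produced variables
    category: str = "InfoProc"   # HumanInput | InfoProc | InfoExtr | Knowledge | Decision

# A two-node fragment: a HumanInput node feeding a Decision node.
collect = Subtask("collect_id", "Ask the operator for a ticket ID",
                  inputs=["ticket_system"], outputs=["ticket_id"],
                  category="HumanInput")
triage = Subtask("triage", "Route the ticket by severity",
                 dependencies=["collect_id"],
                 inputs_from_dependencies={"collect_id": ["ticket_id"]},
                 outputs=["route"], category="Decision")

dag = {t.name: t for t in (collect, triage)}
```

The edges of G are implied by each node's dependency list, so acyclicity can be checked with a standard topological sort over the mapping.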
2. Algorithmic Pipeline
The end-to-end SOPStruct process is segmented into three principal stages: segmentation, structure generation, and evaluation.
Pipeline Overview:
```
Input:  raw SOP text P
Output: structured DAG G

1. S ← SegmentSOP(P)
2. H_fragments ← ∅
3. for each s ∈ S do
4.    G_s ← GenerateStructure(s)
5.    H_fragments ← H_fragments ∪ {G_s}
6. G ← MergeFragments(H_fragments)
7. Evaluate(G)
8. return G
```
- Segmentation: The raw SOP is divided into sub-documents by an LLM using few-shot span-labeling prompts. Each segment is sized to fit the LLM context window and to be internally semantically atomic.
- Structure Generation: Every segment is presented to the LLM with instructions and a schema to emit a JSON-conformant list of subtask objects (see schema below). Each subtask object is then parsed to synthesize vertices and edges.
- Fragment Merging: Variables and dependencies across segments are unified through string matching and LLM-verified name alignment, with the global DAG constructed from the union of all nodes and edges.
- Evaluation: The representation undergoes soundness and completeness verification before finalization.
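The fragment-merging stage can be sketched in a few lines. Function and variable names here are illustrative; the paper's pipeline additionally uses LLM-verified name alignment on top of string matching:

```python
# Hedged sketch of MergeFragments: fragments are lists of subtask dicts;
# variables with identical (normalized) names across fragments are unified.

def normalize(name: str) -> str:
    """Syntactic normalization applied before string matching."""
    return name.strip().lower().replace(" ", "_")

def merge_fragments(fragments):
    """Union all nodes into one DAG; align variable names by normalized match."""
    merged = {}
    for frag in fragments:
        for node in frag:
            node = dict(node)  # avoid mutating the caller's fragments
            node["inputs"] = [normalize(v) for v in node["inputs"]]
            node["outputs"] = [normalize(v) for v in node["outputs"]]
            merged[normalize(node["name"])] = node
    return merged

frag_a = [{"name": "Collect ID", "inputs": [], "outputs": ["Ticket ID"]}]
frag_b = [{"name": "Triage", "inputs": ["ticket id"], "outputs": ["route"]}]
g = merge_fragments([frag_a, frag_b])
# "Ticket ID" and "ticket id" now unify as "ticket_id", linking the fragments.
```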
3. Prompt Engineering and Architectural Considerations
SOPStruct is implemented using raw GPT-4 without model fine-tuning, relying entirely on carefully designed prompts and postprocessing pipelines. Three distinct prompting templates are used:
- Segmentation: Few-shot examples for logical process step extraction.
- Subtask Generation: A JSON schema description, with worked examples, mandates that all output conform to the prescribed schema.
- Wrapper/Validation: Directs the LLM to emit valid, schema-checkable JSON.
During execution, post-processing includes JSON syntax validation, cross-referencing of dependency names, and syntactic normalization of identifiers. This prompt-driven approach constrains output variance and mitigates LLM hallucination.
JSON schema for subtasks:
```
{
  "name": "...",
  "description": "...",
  "dependencies": ["name_of_parent", ...],
  "inputs": ["var1", "var2", ...],
  "inputs_from_dependencies": {"parent_name": ["varX", ...], ...},
  "outputs": ["var3", ...],
  "category": "Action|Decision|Knowledge|..."
}
```
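The post-processing described above (JSON syntax validation plus cross-referencing of dependency names) can be illustrated with a minimal validator over this schema; the helper name and error-message format are hypothetical:

```python
import json

# Required keys per the subtask schema above.
REQUIRED = {"name", "description", "dependencies", "inputs",
            "inputs_from_dependencies", "outputs", "category"}

def validate_subtasks(raw: str):
    """Parse LLM output and report schema and dangling-dependency errors."""
    subtasks = json.loads(raw)                  # JSON syntax validation
    names = {t["name"] for t in subtasks}
    errors = []
    for t in subtasks:
        missing = REQUIRED - t.keys()           # schema-conformance check
        if missing:
            errors.append(f"{t.get('name', '?')}: missing {sorted(missing)}")
        for dep in t.get("dependencies", []):   # cross-reference dependency names
            if dep not in names:
                errors.append(f"{t['name']}: unknown dependency {dep!r}")
    return errors

raw = ('[{"name": "a", "description": "", "dependencies": ["b"], "inputs": [], '
       '"inputs_from_dependencies": {}, "outputs": [], "category": "InfoProc"}]')
print(validate_subtasks(raw))   # flags the dangling dependency "b"
```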
4. Verification Methodology: Soundness and Completeness
The validity of the generated DAG is assessed using a dual-evaluation protocol:
A. Deterministic PDDL-based Verification:
The DAG is encoded into a PDDL domain/problem where:
- Predicates: Availability predicates record which variables have been supplied or produced so far.
- Actions: Each node maps to an action that enacts the subtask conditional on its required inputs being available, updating output availabilities on execution.
- Initial/Goal States:
  - The initial state reflects the union of all primary inputs.
  - The goal is satisfaction of all outputs at the leaf nodes.
A PDDL planner is then run to check that the DAG is connected and all dependencies are resolvable. The primary deterministic metric is the StructuredPlanScore, the proportion of generated DAGs for which the planner finds a valid end-to-end plan.
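A minimal sketch of this DAG-to-PDDL encoding follows; the predicate and action names are illustrative, and the paper's exact formulation may differ:

```python
# Each subtask becomes a PDDL action whose precondition is the availability of
# its inputs and whose effect asserts the availability of its outputs.

def to_pddl(nodes, initial_inputs, goal_outputs):
    """Emit (domain, problem) strings for a list of subtask dicts."""
    actions = []
    for n in nodes:
        pre = " ".join(f"(available {v})" for v in n["inputs"])
        eff = " ".join(f"(available {v})" for v in n["outputs"])
        actions.append(
            f"(:action {n['name']}\n"
            f"  :precondition (and {pre})\n"
            f"  :effect (and {eff}))")
    domain = ("(define (domain sop)\n"
              "  (:predicates (available ?v))\n  "
              + "\n  ".join(actions) + ")")
    problem = ("(define (problem sop-run) (:domain sop)\n"
               f"  (:init {' '.join(f'(available {v})' for v in initial_inputs)})\n"
               f"  (:goal (and {' '.join(f'(available {v})' for v in goal_outputs)})))")
    return domain, problem

domain, problem = to_pddl(
    [{"name": "triage", "inputs": ["ticket_id"], "outputs": ["route"]}],
    initial_inputs=["ticket_id"], goal_outputs=["route"])
```

Feeding the emitted domain/problem pair to an off-the-shelf planner then answers the reachability question: a plan exists exactly when every leaf output is derivable from the primary inputs.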
B. Non-Deterministic LLM-Based Assessment:
Aspects not encoded in PDDL—such as alignment between initial/goal states and the SOP text, or comprehensiveness of leaf node outputs—are evaluated by prompting GPT-4, with metrics thresholded at 0.8 similarity/confidence for pass status.
Completeness checks ask whether any critical SOP step or dependency is missing from the generated DAG G.
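The shape of such a check might look like the following; the prompt wording is an assumption, while the 0.8 pass threshold is the one reported above:

```python
import json

def completeness_prompt(sop_text, dag_nodes):
    """Build an audit prompt pairing the raw SOP with the extracted subtasks."""
    return (
        "You are auditing a structured SOP.\n"
        "Original SOP:\n" + sop_text + "\n\n"
        "Extracted subtasks (JSON):\n" + json.dumps(dag_nodes, indent=2) + "\n\n"
        "List any critical SOP step or dependency missing from the subtasks, "
        "and give a confidence score in [0, 1] that the extraction is complete."
    )

def passes(confidence: float) -> bool:
    """Threshold the model's self-reported confidence, per the 0.8 cutoff."""
    return confidence >= 0.8
```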
5. Empirical Results and Benchmarking
SOPStruct was empirically evaluated on three datasets with varying complexity:
- Nestful API (nested, short, low complexity)
- RecipeNLG (culinary, medium complexity)
- Business Process (multi-step enterprise procedures, high complexity)
Metrics included StructuredPlanScore, InitialStateValidation, GoalStateValidation, PlanCompleteness, DependencyScore, and InputsFromDependencyScore.
| Dataset | StructuredPlanScore (SOPStruct) | StructuredPlanScore (Code-Style) | StructuredPlanScore (BPMN) | Completeness (SOPStruct) | Completeness (Code-Style) | Completeness (BPMN) |
|---|---|---|---|---|---|---|
| Business Process | 100% | 66.17% | 62.19% | 94% | 55.65% | 52.31% |
SOPStruct consistently achieved 100% on deterministic graph metrics and ≥93% on LLM-based metrics, significantly outperforming alternative baselines. Segmentation preserved nuanced steps in long SOPs, schema-constrained prompting reduced hallucinations, and PDDL verification provided strong formal correctness guarantees (Garg et al., 28 Mar 2025).
6. Capabilities: Backtracking, Error Correction, and Automation
The DAG structure enables robust operational features:
- Backtracking: On node failure (e.g., invalid human input), only the failed node's ancestors in G need be re-executed; no global rollback is required.
- Error Correction: Runtime checks on node inputs and outputs map inconsistencies to specific upstream nodes for targeted re-evaluation.
- Workflow Automation: The PDDL encoding supports translation to automated planners and robotic controllers. Categorization of subtasks drives UI prompts (HumanInput), backend API dispatches (InfoProc/InfoExtr), and rule engine activations (Decision).
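The backtracking property can be illustrated over a name-to-parents adjacency map (a hypothetical encoding of G):

```python
# When a node fails, walk its dependency edges to collect only the affected
# ancestors, instead of rolling back the whole workflow.

def ancestors(dag, node):
    """Transitive parents of `node` in a {name: [parent names]} DAG."""
    seen, stack = set(), list(dag[node])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(dag[p])
    return seen

dag = {"a": [], "b": ["a"], "c": [], "d": ["b"]}
print(sorted(ancestors(dag, "d")))   # ['a', 'b']; node "c" is untouched
```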
7. Limitations and Prospective Developments
Identified limitations include:
- Complex nested conditions can lead to increased segment count and merging complexity.
- Due to LLM non-determinism, rare, domain-specific steps are occasionally omitted.
- PDDL reliance restricts expressivity (no support for rich temporal/probabilistic dependencies).
Proposed future directions:
- Integration of iterative self-reflection (e.g., “Reflexion” loops) for LLM-based DAG refinement.
- Hybrid fine-tuning on domain-specific SOP corpora for improved step retention.
- Adoption of PDDL2.1 for temporal/resource constraints and cost-based optimization.
- UI-based expert editing to permit user-in-the-loop adjustments before deployment.
- Empirical assessment in operational settings on metrics such as error rates, completion time, and user satisfaction.
SOPStruct exemplifies an end-to-end methodology spanning segmentation, schema-driven structuralization, dual formal/heuristic verification, and downstream optimization/automation, establishing a new paradigm for SOP digitization and procedural knowledge management (Garg et al., 28 Mar 2025).