SQ-BCP: Self-Querying Bidirectional Planning
- The paper presents categorical pullback verification as a central contribution that ensures plan-goal compatibility and reduces resource-violation rates to as low as 5.8%.
- SQ-BCP employs a bidirectional search strategy with systematic self-querying to explicitly resolve unknown preconditions while strictly enforcing hard constraint compliance.
- The framework integrates deterministic constraint checking with autonomous bridging actions to engineer necessary facts, thereby enhancing planning robustness in partially observable domains.
Self-Querying Bidirectional Categorical Planning (SQ-BCP) is a planning framework for inference-time decision-making with LLMs under partial observability and underspecified constraints. The method requires action preconditions to be represented explicitly and resolved systematically, using bidirectional search grounded in category theory. SQ-BCP introduces categorical pullback verification as a certificate of plan-goal compatibility and combines deterministic constraint checking with self-querying and bridging for unknowns. On procedural and recipe-based tasks it substantially reduces resource-violation rates while keeping reference similarity competitive (Qu, 27 Jan 2026).
1. Mathematical Formalism and Core Objects
The planning context in SQ-BCP is formalized via categorical structures:
- Planning States: A state is a four-component tuple consisting of a resource vector, a symbolic structure (graph or propositional representation), a set of logical predicates, and a temporal allocation.
- Morphisms (Operations): Each action is a morphism between states in the category-theoretic sense; applying it updates resources, manipulates the symbolic structure, changes the predicate set, and advances time.
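The state tuple and action morphism described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `State`, `Action`, and `apply`, the field names, and the choice of "no resource may go negative" as the hard constraint are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class State:
    resources: Dict[str, float]   # resource vector (hypothetical encoding)
    structure: FrozenSet[tuple]   # symbolic structure, here a set of graph edges
    predicates: FrozenSet[str]    # logical predicates that currently hold
    time: float                   # temporal allocation consumed so far


@dataclass(frozen=True)
class Action:
    name: str
    resource_delta: Dict[str, float]
    add_edges: FrozenSet[tuple] = frozenset()
    add_predicates: FrozenSet[str] = frozenset()
    del_predicates: FrozenSet[str] = frozenset()
    duration: float = 0.0


def apply(state: State, action: Action) -> Optional[State]:
    """Return the successor state, or None if any resource would go negative."""
    new_resources = dict(state.resources)
    for key, delta in action.resource_delta.items():
        new_resources[key] = new_resources.get(key, 0.0) + delta
        if new_resources[key] < 0:
            return None  # assumed hard resource constraint is violated
    return State(
        resources=new_resources,
        structure=state.structure | action.add_edges,
        predicates=(state.predicates - action.del_predicates) | action.add_predicates,
        time=state.time + action.duration,
    )
```

Composing morphisms then corresponds to calling `apply` repeatedly and stopping at the first `None`.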
A planning problem seeks a morphism chain from initial state to goal such that all transitions uphold hard constraints (resource, logical, and temporal) and, crucially, terminal state-object compatibility is certified via categorical pullback.
Each precondition of a candidate action carries a status label, Sat, Viol, or Unk, denoting certainly satisfied, certainly violated, or unknown, respectively. The unresolved preconditions of a candidate action at a given state are those labelled Unk.
A task-specific heuristic distance metric is employed for scoring and pruning, but correctness is determined exclusively by constraint checks and pullback verification.
Categorical pullback verification operates within the planning category: a candidate plan is accepted only if the forward chain from the initial state and the backward chain from the goal intersect at a state for which a pullback exists, guaranteeing goal compatibility and constraint satisfaction.
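The three-valued labelling can be sketched as follows. The names `Status`, `unresolved`, `is_applicable`, and `is_blocked`, and the example precondition strings, are hypothetical and chosen only for illustration.

```python
from enum import Enum
from typing import Dict, List


class Status(Enum):
    SAT = "Sat"    # certainly satisfied
    VIOL = "Viol"  # certainly violated
    UNK = "Unk"    # unknown, must be resolved before commitment


def unresolved(labels: Dict[str, Status]) -> List[str]:
    """Preconditions still labelled Unk, in sorted order."""
    return sorted(p for p, s in labels.items() if s is Status.UNK)


def is_applicable(labels: Dict[str, Status]) -> bool:
    """An action may be committed only when every precondition is Sat."""
    return all(s is Status.SAT for s in labels.values())


def is_blocked(labels: Dict[str, Status]) -> bool:
    """A single Viol label rules the action out."""
    return any(s is Status.VIOL for s in labels.values())


labels = {
    "has_budget": Status.SAT,
    "has_sander": Status.UNK,
    "legs_cylindrical": Status.UNK,
}
pending = unresolved(labels)
```

Keeping Unk distinct from both Sat and Viol is what forces an explicit resolution step instead of a silent guess.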
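For intuition about what "a pullback exists" means, the standard pullback in the category of finite sets can be computed directly: given maps f: A → C and g: B → C, it is the set of pairs (a, b) with f(a) = g(b). The sketch below is a toy analogy only. Treating states as predicate sets and projecting them onto goal-relevant predicates is an assumption made for illustration and is not the paper's planning category.

```python
from typing import Callable, Hashable, Iterable, List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def pullback(
    left: Iterable[A],
    right: Iterable[B],
    f: Callable[[A], Hashable],
    g: Callable[[B], Hashable],
) -> List[Tuple[A, B]]:
    """Set-theoretic pullback: all pairs (a, b) with f(a) == g(b)."""
    right_items = list(right)
    return [(a, b) for a in left for b in right_items if f(a) == g(b)]


# Hypothetical frontier states, each represented by its predicate set.
forward = [
    frozenset({"wheels_attached"}),
    frozenset({"wheels_attached", "painted"}),
]
backward = [frozenset({"wheels_attached", "painted", "boxed"})]
goal_relevant = frozenset({"wheels_attached", "painted"})


def project(state: frozenset) -> frozenset:
    """Project a state onto the goal-relevant predicates."""
    return state & goal_relevant


meets = pullback(forward, backward, project, project)
```

A non-empty result here plays the role of a compatible meeting point between the forward and backward chains.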
2. Algorithmic Structure and Workflow
SQ-BCP executes a bidirectional search, interleaving hypothesis expansion and explicit unknown resolution:
- Initialization: Separate forward and backward search graphs are seeded at the initial state and the goal state, respectively; planning proceeds in both directions.
- Expansion and Refinement: Nodes are expanded by generating and ranking action hypotheses, with all unknown preconditions resolved before commitment. Unknowns are addressed via a fixed sequence:
  - Bridging Actions: Autonomous proposals aimed at establishing the unknown fact through side-effect actions, subject to a bounded number of attempts.
  - Self-Querying: If bridging fails, explicit queries are issued to an oracle or user for ground-truth resolution.
- Acceptance and Verification: Candidate chains are screened for hard constraint compliance and joined if their distance falls below a threshold. Final acceptance requires that a categorical pullback exists at the meeting state.
- Cycle Detection: Refinement signatures are hashed to prevent infinite looping in bridging or query proposals.
SQ-BCP's algorithmic priorities separate solution ranking (distance-based) from correctness (constraint and categorical checks), yielding robust execution integrity.
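The bridging-then-querying order with cycle detection described in this section can be sketched as a single resolution routine. This is a simplified illustration under assumptions: `resolve_unknown`, `propose_bridge`, `bridge_works`, and `ask_oracle` are hypothetical names, and a Python set of `(precondition, action)` tuples stands in for the hashed refinement signatures.

```python
from typing import Callable, Optional, Set, Tuple


def resolve_unknown(
    precondition: str,
    propose_bridge: Callable[[str, int], Optional[str]],
    bridge_works: Callable[[str], bool],
    ask_oracle: Callable[[str], bool],
    max_bridges: int,
    seen: Set[Tuple[str, str]],
) -> Tuple[bool, str]:
    """Try bounded bridging first, then fall back to an explicit query.

    Returns (precondition_holds, how_it_was_resolved).
    """
    for attempt in range(max_bridges):
        action = propose_bridge(precondition, attempt)
        if action is None:
            break  # the proposer has nothing further to offer
        signature = (precondition, action)
        if signature in seen:
            continue  # cycle detection: never retry a repeated proposal
        seen.add(signature)
        if bridge_works(action):
            return True, f"bridge:{action}"
    # Bridging failed or was exhausted: ask the oracle or user directly.
    return ask_oracle(precondition), "query"
```

Because the loop is bounded by `max_bridges` and repeated signatures are skipped, the routine always terminates with either a bridge or a query answer.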
3. Theoretical Properties and Guarantees
The framework is underpinned by formal guarantees:
- Refinement Terminates: With a bounded number of unknown preconditions and a bounded number of bridging attempts, resolution of unknowns requires only a bounded number of steps; cycle detection ensures bridging proposals do not loop indefinitely.
- Soundness (Correctness): Any plan accepted via resolved preconditions, deterministic hard-constraint checks, and successful pullback verification is categorically compatible with goal requirements.
- Completeness Under Bounded Branching: Given finite branching (a maximum number of hypotheses per state), a bounded number of unknowns, and a limited number of bridging attempts, all valid plans of bounded depth are discoverable within a bounded number of expansions, provided screening does not prune them.
These theoretical results formalize the termination, soundness, and (assumption-bound) completeness of SQ-BCP as a reliable planning procedure for LLMs in partially observable domains.
4. Empirical Evaluation and Results
SQ-BCP's empirical performance is documented across procedural and recipe domains:
- Datasets:
  - WikiHow: Procedure-driven tasks with "Things You'll Need" items treated as latent preconditions.
  - RecipeNLG: Recipe adaptation tasks with latent preconditions (e.g., binding agents).
- k-Reveal Protocol: Controlled hiding of preconditions: of the annotated preconditions for an instance, only k are revealed.
- Oracle Simulation: All approaches employ a simulated perfect oracle for answering queries.
- Metrics:
  - Reference Similarity: ROUGE-1/ROUGE-2 (WikiHow), BLEU (RecipeNLG).
  - Constraint Violations: Proportion of plans violating resource or predicate requirements.
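The k-reveal protocol and the violation metric listed above can be sketched as follows. The function names and the use of seeded uniform sampling to choose which preconditions to reveal are assumptions made for illustration; the source does not state how the revealed subset is selected.

```python
import random
from typing import List, Sequence, Tuple


def k_reveal(
    preconditions: Sequence[str], k: int, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Split distinct annotated preconditions into (revealed, hidden).

    Exactly min(k, n) items are revealed; selection is uniform and seeded
    (an assumption, not the paper's stated procedure).
    """
    rng = random.Random(seed)
    chosen = set(rng.sample(list(preconditions), min(k, len(preconditions))))
    revealed = [p for p in preconditions if p in chosen]
    hidden = [p for p in preconditions if p not in chosen]
    return revealed, hidden


def violation_rate(violations: Sequence[bool]) -> float:
    """Proportion of plans flagged as violating a requirement."""
    if not violations:
        return 0.0
    return sum(violations) / len(violations)


annotated = ["flour", "eggs", "oven", "whisk"]
revealed, hidden = k_reveal(annotated, k=2)
rate = violation_rate([True, False, False, True])
```

The hidden items are the ones a planner has to recover through bridging or querying.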
The following table summarizes results averaged over the evaluated reveal levels. The first three metric columns are for WikiHow and the last two for RecipeNLG; ResViol is the resource-violation rate.
$\begin{array}{l|ccc|cc}
\textbf{Method} & \text{ROUGE-1↑} & \text{ROUGE-2↑} & \text{ResViol↓ (WikiHow)} & \text{BLEU↑} & \text{ResViol↓ (RecipeNLG)} \\ \hline
\text{Direct Prompt} & 46.3 & 42.1 & 78.3\% & 0.897 & 65.7\% \\
\text{CoT} & 48.5 & 44.7 & 83.2\% & 0.900 & 64.1\% \\
\text{ToT} & 52.9 & 45.2 & 94.7\% & 0.892 & 66.5\% \\
\text{ReAct} & 55.8 & 47.4 & 76.9\% & 0.912 & 59.9\% \\
\underline{\text{Self-Ask}} & \underline{56.1} & \underline{47.4} & 26.0\% & \underline{0.913} & 15.7\% \\
\mathbf{SQ\text{-}BCP} & 52.7 & 45.9 & \mathbf{14.9\%} & 0.907 & \mathbf{5.8\%} \\
\end{array}$
SQ-BCP reduces resource-violation rates to 14.9% (WikiHow) and 5.8% (RecipeNLG), down from 26.0% and 15.7% for the best structured question-asking baseline (Self-Ask), while maintaining competitive reference similarity.
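The size of the improvement over Self-Ask can be checked with a short calculation on the table's violation rates. The helper name `relative_reduction` is hypothetical; the input figures are taken from the table.

```python
def relative_reduction(baseline: float, ours: float) -> float:
    """Fraction by which `ours` lowers the `baseline` rate."""
    return (baseline - ours) / baseline


# Resource-violation rates in percent: Self-Ask vs. SQ-BCP.
wikihow = relative_reduction(26.0, 14.9)
recipenlg = relative_reduction(15.7, 5.8)

print(f"WikiHow:   {wikihow:.1%} fewer violations")
print(f"RecipeNLG: {recipenlg:.1%} fewer violations")
```

The relative drop is a little over two fifths on WikiHow and a little over three fifths on RecipeNLG.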
Qualitative traces demonstrate SQ-BCP's intervention: infeasible hypotheses (e.g., lacking budget) are rejected via self-queries, and previously blocked actions are unblocked through structured bridging (e.g., “sand legs into cylinders” before wheel assembly). All accepted chains maintain satisfied preconditions and categorical compatibility at meet-points (Qu, 27 Jan 2026).
5. Advantages, Limitations, and Domain Scope
SQ-BCP offers several methodological advantages:
- Explicit Precondition Labelling: Sat/Viol/Unk semantics guard against precondition hallucination by making every precondition's status explicit.
- Systematic Self-Querying: Infeasible branches are detected and excluded promptly.
- Bridging Mechanism: Autonomous synthesis of supporting actions enables fact engineering at runtime.
- Pullback Verification: Compositional goal compatibility is certified via category-theoretic principles.
Limitations are noted:
- Oracle Realism: Simulation of perfect oracle responses inflates robustness; real users may be noisy, reluctant, or costly.
- LLM Misclassification: Improperly labelled or extraneous preconditions can yield unnecessary querying or premature discards.
- Constraint Coverage: Only hard constraints are modeled; continuous geometric or stochastic feasibility is out of scope.
- Computational Overhead: Bridging and verification at inference time require nontrivial compute, limiting horizon and latency scaling.
- Domain Specificity: Empirical validation is restricted to instructional and recipe domains; generalization to robotics and software remains open.
A plausible implication is that further scaling and adaptation to domains with real-world feedback, continuous constraints, or complex stochastic effects would require substantial extensions or additional modeling components.
6. Context, Research Directions, and Related Frameworks
Self-Querying Bidirectional Categorical Planning is positioned as an evolution beyond traditional prompt-based, chain-of-thought (CoT), and question-asking frameworks in LLM planning. Unlike approaches that rely on implicit or heuristic precondition inference and unstructured self-ask protocols, SQ-BCP formalizes action applicability and plan correctness via explicit precondition labelling and category-theoretic verification. This systematic approach enables predictable and sound planning under partial information, reducing resource and constraint violations without substantial loss of plan reference similarity.
Related frameworks include:
- Self-Ask, which directly queries missing information but lacks structured bridging.
- Chain-of-Thought and Tree-of-Thought, which expand thought paths but do not enforce hard-constraint checks or categorical verification.
- ReAct, which combines reasoning and acting but shows higher resource-violation rates (76.9% on WikiHow and 59.9% on RecipeNLG in the reported results).
SQ-BCP's methodology raises opportunities for integration with continuous planning, active user interaction modeling, and expansion beyond procedural generation to other AI planning domains. Further empirical validation and refinement in noisy or high-dimensional environments are identified as pressing research avenues (Qu, 27 Jan 2026).