- The paper presents SCOPE, a two-stage planning framework that disentangles query-specific reasoning from generic solver generation.
- It achieves high accuracy and robustness, with success rates reaching 100% on meeting planning and significant gains over chain-of-thought methods.
- The framework reduces inference cost and latency while ensuring reusability and scalability of solver code across similar multi-constraint tasks.
Programming over Thinking: Efficient and Robust Multi-Constraint Planning
Motivation and Problem Setting
Multi-constraint sequential planning requires generating candidate solutions that satisfy multiple, sometimes conflicting constraints, as in real-world tasks such as travel itinerary generation and meeting scheduling. Traditional LLM-driven approaches, predominantly those based on text-based chain-of-thought or multi-agent reasoning, exhibit scaling bottlenecks and robustness failures. Long natural-language reasoning chains tend to accumulate errors and lose consistency as constraint structures grow complex or lengthy, while code- or solver-based strategies are typically query-specific, imposing inflexible, non-generalizable execution logic. The probabilistic nature of LLM outputs exacerbates these issues, hindering consistent constraint tracking and driving up inference cost as the solution space expands.
Framework: Scalable COde Planning Engine (SCOPE)
SCOPE introduces a two-stage paradigm that disentangles planning from execution, implemented as a multi-agent LLM workflow. The query-specific reasoning stage formalizes the problem: LLM agents extract a structured representation of combinations (candidate-generation parameters) and constraints (validation logic) from a single example query–solution pair. These structured representations, once optimized via multiple parameter-free refinement agents, define the generic solver abstraction for the problem domain.
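The paper does not prescribe a concrete data format for this stage; the following is a minimal sketch of what the formalized output might look like, with all names (`FormalizedProblem`, the toy meeting-planning fields) purely illustrative assumptions rather than the paper's API.

```python
# Illustrative sketch (not the paper's exact API) of the structured output of
# the query-specific reasoning stage: generation parameters ("combinations")
# plus validation predicates ("constraints").
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class FormalizedProblem:
    # Parameters that span the candidate space, e.g. cities and day budgets
    # for trip planning, or people and time slots for meeting planning.
    combinations: dict[str, Any] = field(default_factory=dict)
    # Each constraint is a predicate evaluated on one candidate plan.
    constraints: list[Callable[[dict], bool]] = field(default_factory=list)

# Hypothetical instantiation for a toy meeting-planning query.
problem = FormalizedProblem(
    combinations={"people": ["Ana", "Bo"], "slots": ["09:00", "10:00", "11:00"]},
    constraints=[
        lambda plan: plan["Ana"] != plan["Bo"],  # meetings must not overlap
        lambda plan: plan["Bo"] >= "10:00",      # Bo is unavailable before 10:00
    ],
)
```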
The second stage — generic solver generation — programmatically synthesizes reusable, deterministic solver functions:
- Combination Function: Exhaustively enumerates candidate plans using the formalized combination parameters, supporting permutation and assignment invariants dictated by the domain.
- Filter Function: Deterministically selects valid plans from candidates based solely on constraint satisfaction, independent of query-specific logic.
- Deliver Function: Formats structured solution outputs as domain-aligned natural language descriptions.
Critically, the solver code is unchanged across queries of the same domain; only the input parameters (the structured combinations and constraints produced by LLM inference) are adapted, as in the sketch below. Solver code refinement is performed autonomously by comparing generated outputs against ground-truth outputs, ensuring the code meets domain requirements without manual prompt engineering or heuristics.
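To make the combine/filter/deliver split concrete, here is a minimal sketch of such a generic solver for the toy meeting-planning domain used above; the function names and parameter format are assumptions for illustration, not the paper's released code.

```python
# Illustrative generic solver: the same three functions serve every query in
# the domain; only `combinations` and `constraints` change per query.
from itertools import product

def combine(combinations):
    """Exhaustively enumerate candidate plans from the generation parameters."""
    people, slots = combinations["people"], combinations["slots"]
    for assignment in product(slots, repeat=len(people)):
        yield dict(zip(people, assignment))  # one slot assigned to each person

def filter_valid(candidates, constraints):
    """Keep only the candidates that satisfy every constraint predicate."""
    return [plan for plan in candidates if all(check(plan) for check in constraints)]

def deliver(plans):
    """Render structured solutions as natural-language descriptions."""
    return ["; ".join(f"meet {p} at {t}" for p, t in plan.items()) for plan in plans]

# Per-query inputs (produced by the LLM stage); the solver code never changes.
combinations = {"people": ["Ana", "Bo"], "slots": ["09:00", "10:00", "11:00"]}
constraints = [lambda p: p["Ana"] != p["Bo"], lambda p: p["Bo"] >= "10:00"]
print(deliver(filter_valid(combine(combinations), constraints)))
# e.g. ['meet Ana at 09:00; meet Bo at 10:00', 'meet Ana at 09:00; meet Bo at 11:00', ...]
```

Because filtering is deterministic and exhaustive over the enumerated candidates, the LLM's probabilistic output affects only the parameters, not the constraint checking itself, which is the robustness argument the framework rests on.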
Experimental Evaluation
Benchmarks and Model Families
SCOPE was evaluated on TravelPlanner [Xie2024TravelPlanner] and Natural Plan [Zheng2024NaturalPlan], canonical multi-constraint planning environments with combinatorial complexity and closed constraint systems. Experiments spanned five proprietary LLMs (GPT-4o, GPT-o3, GPT-5, Gemini-1.5-Pro, Gemini-2.5-Pro) and compared SCOPE against reasoning baselines: direct prompting, Chain-of-Thought [Wei2022ChainOfThought], Tree-of-Thought [Yao2023ToT], EvoAgent [Yuan2025Evoagent], HyperTree Planning [gui2025HTP], and code-based Thought of Search [Liu2024Tos].
Numerical Results
SCOPE achieves strong empirical performance:
- TravelPlanner (GPT-4o): SCOPE succeeds on 93.1% of queries, a 61.6-point gain over CoT (31.5% success).
- Trip Planning (GPT-4o): SCOPE, at 87.1%, far exceeds ToS (12.5%) and CoT (3.9%).
- Meeting Planning (GPT-4o): SCOPE achieves 100% success, while ToS registers 59.8% and CoT 47.4%.
- Efficiency: SCOPE reduces inference cost by up to 1.4× and latency by 4.67× compared to leading baselines, especially as planning horizon or constraint count increases.
- Performance consistency: SCOPE's accuracy drops only minimally as combinatorial or constraint complexity increases, in contrast to baselines, which degrade rapidly.
On stronger models (GPT-5, Gemini-2.5-Pro), SCOPE matches or exceeds baseline performance while scaling significantly better in cost and latency; on weaker models, it substantially closes the gap to the state of the art. The analysis documents robustness gains over error-prone planning horizons and under long-horizon constraint aggregation.
Ablation and Error Analysis
Systematic ablation of SCOPE's components (problem formalization, optimization, refinement) causes severe performance drops, underscoring the necessity of each agentic stage for robust abstraction and generalization. Error analysis indicates that the principal failure modes are the Input Agent's misinterpretation of the query-to-parameter mapping (especially with smaller models) and overgeneralization from demonstrations; failures are rarely attributable to the solver itself.
Theoretical and Practical Implications
SCOPE demonstrates that disentangling natural-language reasoning from execution logic substantially mitigates the fundamental limitations of probabilistic LLM output in planning. The explicit abstraction of combinatorial generation and constraint satisfaction not only yields deterministic, reusable solver logic but also induces strong generalization to unseen queries within a domain. The approach is architecturally orthogonal to existing slow-thinking and multi-agent reasoning paradigms, circumventing the error propagation and scaling bottlenecks intrinsic to text-driven methods.
Practically, SCOPE enables efficient deployment of LLM-based agents in real-world settings requiring robust, cost-effective constraint satisfaction and planning — for example, itinerary generation, high-frequency scheduling, and resource allocation. The independence of solver code from query content supports modular domain adaptation and swift inference.
Theoretically, SCOPE offers a bridge between symbolic planning, combinatorial search, and LLM-based natural language understanding. It provides a pathway for integrating declarative representations and procedural code within LLM workflows, supporting future research into cross-domain code abstraction, interpretable AI planning, and hybrid symbolic–neural reasoning.
Future Directions
Open challenges remain. SCOPE’s solvers generalize only within a domain; domain transfer requires re-formalization and code regeneration. Further, the reliance on the coding competence of proprietary LLMs may limit transferability to open-source or specialized models. Promising future directions include meta-abstraction of solver code across domains, automated benchmarking of solution space and constraint specification, and downstream applications in real-time agentic coordination and multimodal planning.
Conclusion
The SCOPE framework establishes an efficient, robust paradigm for multi-constraint planning with LLMs by separating query-specific formalization from generic solver code execution. The empirical and theoretical analyses show clear superiority in accuracy, scalability, and efficiency, enabling practical deployment of LLM agents for complex planning tasks and inspiring future developments in programmatic AI reasoning (2601.09097).