Proposal-Centric Planner
- Proposal-Centric Planner is an architectural framework that generates, evaluates, and iteratively refines explicit candidate plans to address complex issues in control, symbolic reasoning, and LLM tool use.
- It integrates adaptive proposal generation and evaluation techniques across domains like model-based reinforcement learning, autonomous driving, and symbolic planning to enhance scalability and data efficiency.
- The approach improves performance through iterative refinement and optimal selection strategies, offering robust extensions for hierarchical and multi-agent planning scenarios.
A Proposal-Centric Planner is an algorithmic architecture or framework in which the generation, selection, or iterative refinement of explicit candidate plans ("proposals") is the primary organizing principle for solving complex planning, control, or reasoning problems. This paradigm appears across domains including model-based reinforcement learning, symbolic AI planning, autonomous driving, and tool-augmented LLM systems. By focusing computational resources on generating, evaluating, and improving such proposals, these systems contrast with purely reactive or monolithic planners and can address scalability, tractability, data efficiency, and diversity in high-dimensional or uncertain environments.
1. Core Principles of Proposal-Centric Planning
The proposal-centric approach centers on the explicit construction and evaluation of candidate solution trajectories, action sequences, or plan graphs. At each decision point, a set of proposals is produced; these may be full-length plans, partial sequences, or structured graphs, depending on the problem structure.
- Proposal Generation: Candidate actions, trajectories, or plan stubs are proposed based on learned distributions, symbolic reasoning, policy priors, or combinatorial generators. The proposal mechanism is often adaptive, context-sensitive, and tuned for diversity or optimality.
- Proposal Evaluation and Selection: Proposals are assessed using a suite of metrics: rollouts through learned or symbolic models, reward functions, feasibility checks, or ranking via discriminative models.
- Iterative Refinement: Many frameworks refine proposals over multiple rounds, using feedback from environment models or ranking mechanisms (e.g., contrastive ranking, policy optimization).
- Proposal-anchored Architecture: Proposals may be used as the substrate for downstream feature extraction (e.g., proposal-anchored attention), task decomposition, exploration, or as the units exchanged in multi-agent negotiation.
This principle is agnostic to domain: it operates in continuous control (MPC), symbolic planning (assumption-based reasoning), autonomous driving (trajectory proposal sets), and tool use for LLMs (DAG planning).
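The propose-evaluate-refine loop shared by these instantiations can be sketched generically. The helper names (`propose`, `evaluate`, `refine`) and the elite-quartile heuristic below are illustrative assumptions, not details of any cited system:

```python
import random

def plan(state, propose, evaluate, refine, n_proposals=16, n_rounds=3):
    """Generic proposal-centric planning loop: generate candidate plans,
    score them, and iteratively refine the pool before committing."""
    proposals = [propose(state) for _ in range(n_proposals)]
    for _ in range(n_rounds):
        scored = sorted(proposals, key=lambda p: evaluate(state, p), reverse=True)
        elites = scored[: n_proposals // 4]  # keep the top quartile
        # Refill the pool with perturbed copies of elite proposals.
        proposals = elites + [refine(state, random.choice(elites))
                              for _ in range(n_proposals - len(elites))]
    return max(proposals, key=lambda p: evaluate(state, p))
```

Any concrete planner in this survey can be read as a specialization of this loop: the proposal generator, the evaluator, and the refinement operator are what vary by domain.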
2. Representative Frameworks and Architectures
Several distinct instantiations of proposal-centric planning have been advanced:
| Domain | Characteristic Proposal-Centric Method | Key References |
|---|---|---|
| RL/Continuous Control | Proposal-centric MPC with planner policy warm-start | (Byravan et al., 2021) |
| Symbolic Planning | Assumption-based Planning with conjecture/refutation | (Pellier et al., 2018) |
| Autonomy/Driving | iPad, SPDM: iterative trajectory proposals | (Guo et al., 21 May 2025, Distelzweig et al., 17 Oct 2025) |
| LLM Tool Use | DAG-based plan proposals for global tool execution | (Wei et al., 13 Nov 2025) |
| Symbolic-LM Planning | Action proposals with symbolic simulation, IC/CR | (Xiong et al., 2 May 2025) |
- In model-based RL, a parametric fallback policy generates proposals that seed sampling in MPC planners (e.g., SMC, CEM), biasing search toward high-probability and task-relevant actions. The hybrid execution policy mixes direct policy actions and planner-refined actions. Iterative planner-to-policy distillation amortizes planning knowledge for real-time execution (Byravan et al., 2021).
- In assumption-based planning, proposal-centricity manifests as conjecture (plan+assumption) generation, iteratively refined through refutation and sub-plan offers in a multi-agent setting, seeking minimal-assumption conjectures through prioritized branch-and-bound (Pellier et al., 2018).
- In autonomous driving, iPad (Guo et al., 21 May 2025) maintains iteratively refined trajectory proposals through proposal-anchored attention, with auxiliary mapping and collision-prediction tasks centered on these trajectories. The SPDM algorithm (Distelzweig et al., 17 Oct 2025) generates a diverse set of physically-plausible proposals and selects via rule-based scoring and hard collision screening.
- In tool-augmented LLMs, proposal-centric planning is realized as end-to-end prediction of a global DAG plan of tool uses (nodes, dependencies) before single-shot execution, rather than local incremental decision-making as in ReAct (Wei et al., 13 Nov 2025).
- In symbolic LM planning, frameworks like SymPlanner (Xiong et al., 2 May 2025) leverage LLM policies to propose discrete symbolic actions, with plan validation and iterative correction performed via a deterministic symbolic environment and contrastive ranking.
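The global-DAG style of tool planning can be illustrated with a minimal executor. The plan encoding (`node -> (tool_name, dependency_nodes)`) is an assumed toy representation for this sketch, not the format used by the cited work:

```python
from graphlib import TopologicalSorter

def execute_dag_plan(plan, tools):
    """Single-shot execution of a globally predicted tool-use DAG: nodes are
    tool calls, edges are data dependencies, and execution follows a
    topological order instead of interleaving planning with acting."""
    # graphlib maps each node to its predecessors; static_order() yields
    # every node after all of its dependencies.
    order = TopologicalSorter({n: deps for n, (_, deps) in plan.items()}).static_order()
    results = {}
    for node in order:
        tool_name, deps = plan[node]
        results[node] = tools[tool_name](*[results[d] for d in deps])
    return results
```

The contrast with ReAct-style execution is that the dependency structure is committed to up front, so independent branches could in principle run in parallel.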
3. Detailed Algorithmic Mechanisms
Model-based Control and Amortization
In proposal-centric MPC (Byravan et al., 2021), at each decision epoch:
- With some probability, execute the proposal policy's action directly.
- Otherwise, run a planner subroutine (SMC or CEM) in which all candidate action samples are drawn from the proposal policy.
The planner optimizes expected return over the planning horizon via model rollouts and optional value-function bootstrapping, e.g. maximizing $\mathbb{E}\big[\sum_{t=0}^{H-1} \gamma^t r_t + \gamma^H V(s_H)\big]$ over candidate action sequences. Sequential or iterative proposal update rules (e.g., SMC reweighting, CEM elite fitting) drive convergence, with sample efficiency and tractability enhanced by the proposal policy's expressivity.
Planner behavior is amortized into the proposal policy by minimizing a combined behavioral cloning and off-policy RL loss, such that the distilled policy recovers nearly all planner-induced improvements, particularly in multi-goal contexts.
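A minimal sketch of a CEM planner warm-started from a proposal policy, assuming a generic `rollout_return` model and a `policy_mean` prior; the hyperparameters and function names are illustrative, not those of Byravan et al. (2021):

```python
import numpy as np

def cem_plan(s0, rollout_return, policy_mean, horizon=10, act_dim=2,
             n_samples=64, n_elites=8, n_iters=4, init_std=0.5):
    """CEM planner warm-started from a proposal policy: the initial sampling
    distribution is centered on the policy's action sequence, biasing search
    toward high-probability, task-relevant actions."""
    mean = np.stack([policy_mean(s0, t) for t in range(horizon)])  # (H, A)
    std = np.full_like(mean, init_std)
    for _ in range(n_iters):
        samples = mean + std * np.random.randn(n_samples, horizon, act_dim)
        returns = np.array([rollout_return(s0, a) for a in samples])
        elites = samples[np.argsort(returns)[-n_elites:]]  # top-return samples
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mean[0]  # execute only the first action, MPC-style
```

When the proposal policy is already competent, the warm start lets CEM refine rather than rediscover behavior, which is the core sample-efficiency argument.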
Symbolic and Multi-Agent Planning
Assumption-based planners (Pellier et al., 2018) define proposals as conjectures: action sequences with sets of assumptions ascribed to missing preconditions. A prioritized expansion (minimal assumption first) builds up conjecture trees. Multi-agent teams iteratively exchange, refute, and repair conjectures:
- PROPOSE(X): share current minimal-assumption conjecture
- REFUTE(h): challenge an assumption
- OFFER_PLAN(X'): supply subplan to discharge an assumption
- Search nodes are tracked by an encoding of the current state, the remaining tasks, and the number of outstanding assumptions.
Dialogue converges once all assumptions are either discharged or proved irreparable.
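The PROPOSE/REFUTE/OFFER_PLAN dialogue can be caricatured as a toy repair loop. The `Agent` interface and string-encoded actions below are illustrative assumptions, not the protocol of Pellier et al. (2018):

```python
class Agent:
    """Toy agent that can discharge certain assumptions with sub-plans."""
    def __init__(self, skills):
        self.skills = skills                 # assumption -> sub-plan (list of actions)
    def can_discharge(self, h):
        return h in self.skills
    def offer_plan(self, h):
        return self.skills[h]

def negotiate(conjecture, agents):
    """Repair loop over a conjecture (plan, assumptions): each open assumption
    is either discharged by some agent's sub-plan offer or the conjecture is
    refuted; the dialogue converges when no assumptions remain open."""
    plan, assumptions = list(conjecture[0]), set(conjecture[1])
    while assumptions:
        h = assumptions.pop()
        helper = next((ag for ag in agents if ag.can_discharge(h)), None)
        if helper is None:
            return None                      # REFUTE: assumption irreparable
        plan = helper.offer_plan(h) + plan   # OFFER_PLAN: establish h first
    return plan
```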
Trajectory Proposal and Evaluation in Autonomous Driving
iPad (Guo et al., 21 May 2025) frames all features, attention, and auxiliary predictions around explicit trajectory proposals. ProFormer iteratively refines a set of proposal queries with proposal-anchored deformable attention, extracting feature context directly relevant to each proposal's path. After a fixed number of unrolled refinement iterations, each proposal is scored and the best is executed. Auxiliary tasks for mapping and collision prediction are conditioned on, and provide supervision for, the bundle of proposal trajectories.
SPDM (Distelzweig et al., 17 Oct 2025) explicitly generates a large, diverse proposal set via combinatorial variation of route, offset, and velocity parameters, scoring each with a sum of interpretable cost terms, with learned or perfect predictions used only for collision pruning, not cost shaping. This decouples comfort/progress from prediction accuracy and highlights the generative power of proposal-centric search.
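The SPDM recipe of combinatorial generation, hard collision screening, and interpretable cost summation can be sketched as follows; the parameter names and cost terms are hypothetical stand-ins for the paper's actual components:

```python
import itertools

def generate_proposals(routes, offsets, speeds):
    """Combinatorial proposal set: one candidate per (route, lateral offset,
    target speed) triple, in the style of sampling-based driving planners."""
    return [{"route": r, "offset": o, "speed": v}
            for r, o, v in itertools.product(routes, offsets, speeds)]

def select(proposals, collides, cost_terms):
    """Hard collision screening first, then rule-based scoring: total cost is
    a sum of interpretable terms (comfort, progress, ...); learned predictions
    affect only the collision filter, never the cost shaping."""
    feasible = [p for p in proposals if not collides(p)]
    if not feasible:
        return None
    return min(feasible, key=lambda p: sum(term(p) for term in cost_terms))
```

Keeping predictions out of the cost terms is what decouples comfort/progress trade-offs from prediction accuracy in this design.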
Proposal-Centric Planning in Language and Symbolic Domains
LLM-based planners (Wei et al., 13 Nov 2025, Xiong et al., 2 May 2025) transition from reactive step-by-step tool-calling to up-front, holistic proposal generation. Planner-centric LLMs predict entire DAGs for tool use. SymPlanner (Xiong et al., 2 May 2025) samples candidate action proposals via an LLM policy, immediately simulates them in a symbolic environment, applies correction when invalid, and ranks the resulting plans via a learned scalar discriminator, explicitly separating proposal, verification, correction, and selection.
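The propose-verify-correct-rank separation can be made concrete with a small driver loop; the callback names and the toy integer world in the usage below are assumptions for illustration, not SymPlanner's actual interfaces:

```python
def symbolic_plan(state, goal, propose, simulate, correct, rank,
                  n_candidates=4, max_steps=8):
    """Propose-verify-correct-rank loop in the style of symbolic LM planning:
    candidate actions are proposed, checked in a deterministic symbolic
    environment, repaired when invalid, and complete plans ranked at the end."""
    plans = []
    for _ in range(n_candidates):
        s, plan = state, []
        for _ in range(max_steps):
            if s == goal:
                break
            a = propose(s)
            nxt = simulate(s, a)          # deterministic world model
            if nxt is None:               # invalid action: request a correction
                a = correct(s, a)
                nxt = simulate(s, a)
                if nxt is None:
                    break                 # correction also failed; abandon rollout
            plan.append(a)
            s = nxt
        if s == goal:
            plans.append(plan)
    return max(plans, key=rank) if plans else None
```

The key structural point is that the symbolic simulator, not the LLM, is the arbiter of validity; the learned component only proposes and ranks.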
4. Empirical Performance and Comparative Results
Proposal-centric approaches demonstrate notable advantages in domains demanding high data efficiency, sample diversity, or real-time tractability, particularly under multi-goal, high-dimensional, or under-specified settings.
Key quantitative highlights:
| Domain/Task | Baseline | Proposal-centric Planner | Metric (at 1M steps or benchmark) |
|---|---|---|---|
| Ant GTTP Locomotion (Byravan et al., 2021) | MPO only: 250 | MPC+MPO+BC: 400 | Mean episode return |
| InterPlan (driving) (Distelzweig et al., 17 Oct 2025) | PDM-Closed: 52.8 | SPDM: 63.7 | Closed-loop score |
| NavSim (driving) (Guo et al., 21 May 2025) | DiffusionDrive: 88.1 | iPad: 91.7 | PDMS metric |
| Tool LLM (Wei et al., 13 Nov 2025) | GPT-4o (0.464) | Qwen3-8B (0.659) | Edge F₁ (Hard split) |
| PlanBench symbolic (Xiong et al., 2 May 2025) | ToT: 6.7% | SymPlanner: 54.2% | Plan exact match (by step count) |
- In locomotion, model-based planning with policy-warmstart yields +60% higher returns on multi-goal tasks relative to tuned model-free RL.
- In tool-augmented LLM benchmarks, global DAG proposal methods yield substantial improvements in plan quality (Edge and Node F₁, EM) and end-to-end execution rates over stepwise approaches.
- For driving, proposal-centric planners outperform both learned integrated IPP and diffusion-based planners, particularly in rare, interactive scenarios where diverse, feasible maneuver generation matters.
- In symbolic domains, proposal-centric LM planners (with verification and iterative correction) yield much higher plan validity and diversity compared to pure CoT or RAP baselines.
5. Strengths, Limitations, and Domain-Specific Variation
Strengths
- Sample Efficiency and Scalability: Proposal-centric sampling directs computation to promising regions, making high-dimensional, multi-modal planning tractable in complex tasks (Byravan et al., 2021, Guo et al., 21 May 2025).
- Diversity and Robustness: Generation of multiple, explicit proposals prevents mode-collapse and enables maneuver coverage critical in uncertain or interactive environments (Distelzweig et al., 17 Oct 2025, Xiong et al., 2 May 2025).
- Extensibility: Proposal-anchored architectures extend to multi-agent negotiation, hierarchical decomposition, multi-tool composition, and symbolic/continuous control integration (Pellier et al., 2018, Wei et al., 13 Nov 2025).
Limitations
- Task-Dependent Return on Investment: On narrow, single-goal tasks, proposal-centricity brings limited or no gains, as the baseline policy already achieves near-optimality (Byravan et al., 2021).
- Dependency on Proposal Quality: If proposal generation is too narrow or myopic, overall system capability suffers—no amount of downstream optimization can compensate for missing critical behaviors (Distelzweig et al., 17 Oct 2025).
- Computational Overhead: Some instantiations increase per-decision computational cost (e.g., repeated model rollouts, proposal anchoring, or scoring), which may be mitigated by amortization or partial execution (Guo et al., 21 May 2025).
- Generalization Across Domains: Proposal transfer between source and target distributions can degrade performance if proposal priors are insufficiently adaptive (Byravan et al., 2021).
6. Extensions and Research Trajectories
Ongoing and proposed extensions to the proposal-centric planner paradigm include:
- Hierarchical Proposal Generation: Learning or synthesizing proposals at multiple abstraction levels (e.g., subgoals, macro-actions, or high-level subgraphs) to tackle longer-horizon or scalable planning (Byravan et al., 2021).
- Latent-World Proposals: Applying proposal-centric logic in learned latent state spaces for high-dimensional perceptual input (e.g., pixel-based RL, PlaNet/Dreamer architectures) (Byravan et al., 2021).
- Improved Diversity and Adaptivity: Incorporating diversity-promoting regularizers and adaptive weighting mechanisms in proposal sets to enhance coverage and selectivity (Distelzweig et al., 17 Oct 2025).
- Multi-agent and Multi-planner Coordination: Enabling specialized sub-planners or agent coalitions to propose, compete, and reconcile subplans in parallel, accelerating large-scale, distributed planning (Pellier et al., 2018, Wei et al., 13 Nov 2025).
- Richer Feedback for Selection: Leveraging richer structural, performance, or failure-mode feedback from executors or simulators for more informative proposal evaluation, beyond syntactic checks or top-k likelihood (Wei et al., 13 Nov 2025).
- Learning Proposal Scoring Functions: Using RL or IRL to learn proposal evaluators, while retaining the proposal architecture's generative and tractable nature (Distelzweig et al., 17 Oct 2025).
7. Conclusion
The proposal-centric planner paradigm organizes planning around the explicit generation, refinement, and selection of candidate solutions, unifying mechanisms across control theory, symbolic AI, autonomous driving, and LLM-based tool reasoning. It achieves demonstrable improvements in sample efficiency, robustness, tractability, and quality on complex, high-dimensional, or interactive tasks, provided that proposal generation and adaptation are properly tuned to the domain's characteristics. The flexibility of this approach allows for smooth integration with amortized policies, parallelization, and domain-specific extensions, establishing proposal-centric planning as a core architectural principle in modern decision-making systems (Byravan et al., 2021, Pellier et al., 2018, Wei et al., 13 Nov 2025, Guo et al., 21 May 2025, Distelzweig et al., 17 Oct 2025, Xiong et al., 2 May 2025).