
Proposal-Centric Planner

Updated 5 February 2026
  • Proposal-Centric Planner is an architectural framework that generates, evaluates, and iteratively refines explicit candidate plans to address complex issues in control, symbolic reasoning, and LLM tool use.
  • It integrates adaptive proposal generation and evaluation techniques across domains like model-based reinforcement learning, autonomous driving, and symbolic planning to enhance scalability and data efficiency.
  • The approach improves performance through iterative refinement and optimal selection strategies, offering robust extensions for hierarchical and multi-agent planning scenarios.

A Proposal-Centric Planner is an algorithmic architecture or framework in which the generation, selection, or iterative refinement of explicit candidate plans ("proposals") is the primary organizing principle for solving complex planning, control, or reasoning problems. This paradigm appears across domains including model-based reinforcement learning, symbolic AI planning, autonomous driving, and tool-augmented LLM systems. By focusing computational resources on generating, evaluating, and improving such proposals, these systems contrast with purely reactive or monolithic planners and can address scalability, tractability, data efficiency, and diversity in high-dimensional or uncertain environments.

1. Core Principles of Proposal-Centric Planning

The proposal-centric approach centers around explicit construction and evaluation of candidate solution trajectories, action sequences, or plan graphs. At each decision point, a set of proposals is produced—these may be full-length plans, partial sequences, or structured graphs depending on the problem structure.

  • Proposal Generation: Candidate actions, trajectories, or plan stubs are proposed based on learned distributions, symbolic reasoning, policy priors, or combinatorial generators. The proposal mechanism is often adaptive, context-sensitive, and tuned for diversity or optimality.
  • Proposal Evaluation and Selection: Proposals are assessed using a suite of metrics: rollouts through learned or symbolic models, reward functions, feasibility checks, or ranking via discriminative models.
  • Iterative Refinement: Many frameworks refine proposals over multiple rounds, using feedback from environment models or ranking mechanisms (e.g., contrastive ranking, policy optimization).
  • Proposal-Anchored Architecture: Proposals may be used as the substrate for downstream feature extraction (e.g., proposal-anchored attention), task decomposition, exploration, or as the units exchanged in multi-agent negotiation.

This principle is domain-agnostic: it operates in continuous control (MPC), symbolic planning (assumption-based reasoning), autonomous driving (trajectory proposal sets), and tool use for LLMs (DAG planning).
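
To make the loop concrete, the following is a minimal, domain-agnostic sketch of the propose-evaluate-refine cycle; `generate`, `score`, and `refine` are hypothetical placeholders, not any single paper's operators.

```python
import random

def propose_evaluate_refine(generate, score, refine, n_proposals=16, n_rounds=3):
    """Generic proposal-centric loop: generate a candidate set, evaluate each
    candidate, and iteratively refine the set around the best performers."""
    proposals = [generate() for _ in range(n_proposals)]
    for _ in range(n_rounds):
        scored = sorted(proposals, key=score, reverse=True)
        elites = scored[: n_proposals // 4]  # keep the best quarter
        # Refine: resample new candidates in the neighborhood of the elites.
        proposals = elites + [refine(random.choice(elites))
                              for _ in range(n_proposals - len(elites))]
    return max(proposals, key=score)

# Toy usage: find a scalar "action" that should be close to 3.0.
best = propose_evaluate_refine(
    generate=lambda: random.uniform(-10, 10),
    score=lambda x: -(x - 3.0) ** 2,
    refine=lambda x: x + random.gauss(0, 0.5),
)
```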

2. Representative Frameworks and Architectures

Several distinct instantiations of proposal-centric planning have been advanced:

| Domain | Characteristic Proposal-Centric Method | Reference |
|---|---|---|
| RL / continuous control | Proposal-centric MPC with planner policy warm-start | (Byravan et al., 2021) |
| Symbolic planning | Assumption-based planning with conjecture/refutation | (Pellier et al., 2018) |
| Autonomous driving | iPad, SPDM: iterative trajectory proposals | (Guo et al., 21 May 2025; Distelzweig et al., 17 Oct 2025) |
| LLM tool use | DAG-based plan proposals for global tool execution | (Wei et al., 13 Nov 2025) |
| Symbolic-LM planning | Action proposals with symbolic simulation, iterative correction and contrastive ranking | (Xiong et al., 2 May 2025) |

  • In model-based RL, a parametric fallback policy $\pi_\theta$ generates proposals that seed sampling in MPC planners (e.g., SMC, CEM), biasing search toward high-probability, task-relevant actions. The hybrid policy $\pi_B$ mixes $\pi_\theta$ and planner-refined actions. Iterative planner-to-policy distillation amortizes planning knowledge for real-time execution (Byravan et al., 2021).
  • In assumption-based planning, proposal-centricity manifests as conjecture (plan+assumption) generation, iteratively refined through refutation and sub-plan offers in a multi-agent setting, seeking minimal-assumption conjectures through prioritized branch-and-bound (Pellier et al., 2018).
  • In autonomous driving, iPad (Guo et al., 21 May 2025) maintains $N$ iteratively refined trajectory proposals through proposal-anchored attention, with auxiliary mapping and collision-prediction tasks centered on these trajectories. The SPDM algorithm (Distelzweig et al., 17 Oct 2025) generates a diverse set of physically plausible proposals and selects via rule-based scoring and hard collision screening.
  • In tool-augmented LLMs, proposal-centric planning is realized as end-to-end prediction of a global DAG plan of tool uses (nodes, dependencies) before single-shot execution, rather than local incremental decision-making as in ReAct (Wei et al., 13 Nov 2025).
  • In symbolic LM planning, frameworks like SymPlanner (Xiong et al., 2 May 2025) leverage LLM policies to propose discrete symbolic actions, with plan validation and iterative correction performed via a deterministic symbolic environment, and plan selection via contrastive ranking.

3. Detailed Algorithmic Mechanisms

Model-based Control and Amortization

In proposal-centric MPC (Byravan et al., 2021), at each decision epoch:

  • With probability $1 - p_\text{plan}$, propose $a_t \sim \pi_\theta(\cdot \mid s_t)$.
  • With probability $p_\text{plan}$, perform a planner subroutine (SMC or CEM), with all candidate samples drawn from $\pi_\theta$.

The planner optimizes expected return over horizon $H$ via model rollouts and optional value-function bootstrapping:

$$\max_{a_{0:H-1}} \; J(a_{0:H-1}; s_0) = \sum_{h=0}^{H-1} \gamma^{h}\, r(s_h, a_h) + \gamma^{H} V_\psi(s_H)$$

Sequential or iterative proposal update rules (e.g., SMC reweighting, CEM elite fitting) drive convergence, with sample efficiency and tractability enhanced by the expressivity of $\pi_\theta$.
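
As a concrete illustration, here is a minimal CEM-style planner whose initial action-sequence mean is rolled out from the proposal policy, in the spirit of the warm-started MPC above (Byravan et al., 2021); the `dynamics`, `reward`, `value`, and `policy_sample` callables are assumed stand-ins, and the hyperparameters are illustrative.

```python
import numpy as np

def cem_plan(s0, policy_sample, dynamics, reward, value,
             horizon=10, pop=64, n_elite=8, iters=4, gamma=0.99):
    # Warm start: roll the proposal policy through the model to seed the mean.
    mean, s = [], s0
    for _ in range(horizon):
        a = policy_sample(s)
        mean.append(a)
        s = dynamics(s, a)
    mean = np.asarray(mean, dtype=float)
    std = np.ones_like(mean)

    def ret(seq):
        """Model-rollout return over the horizon, bootstrapped with V_psi."""
        s, total = s0, 0.0
        for h in range(len(seq)):
            total += gamma ** h * reward(s, seq[h])
            s = dynamics(s, seq[h])
        return total + gamma ** len(seq) * value(s)

    for _ in range(iters):
        samples = mean + std * np.random.randn(pop, *mean.shape)
        scores = np.array([ret(seq) for seq in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3  # elite fitting
    return mean[0]  # MPC: execute only the first action, then replan
```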

Planner behavior is amortized into $\pi_\theta$ by minimizing a weighted sum of behavioral-cloning and off-policy RL losses:

$$J(\theta) = \alpha\, L_\text{MPO}(\theta) + \beta\, L_\text{BC}(\theta)$$

such that the distilled policy recovers nearly all planner-induced improvements, particularly in multi-goal contexts.
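
A schematic sketch of this objective, assuming a Gaussian policy head; the MPO term is abstracted behind a placeholder `rl_loss_fn`, since the full off-policy machinery is beyond a short example.

```python
import torch

def distillation_loss(policy, states, planner_actions, rl_loss_fn,
                      alpha=1.0, beta=1.0):
    """J(theta) = alpha * L_MPO(theta) + beta * L_BC(theta).

    L_BC is the negative log-likelihood of planner-chosen actions under the
    current policy; rl_loss_fn stands in for the off-policy MPO loss."""
    mean, std = policy(states)                 # assumed Gaussian policy head
    dist = torch.distributions.Normal(mean, std)
    l_bc = -dist.log_prob(planner_actions).sum(-1).mean()
    l_rl = rl_loss_fn(policy, states)          # placeholder for L_MPO
    return alpha * l_rl + beta * l_bc
```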

Symbolic and Multi-Agent Planning

Assumption-based planners (Pellier et al., 2018) define proposals as conjectures: action sequences with sets of assumptions ascribed to missing preconditions. A prioritized expansion (minimal assumption first) builds up conjecture trees. Multi-agent teams iteratively exchange, refute, and repair conjectures:

  • PROPOSE(X): share current minimal-assumption conjecture
  • REFUTE(h): challenge an assumption
  • OFFER_PLAN(X'): supply subplan to discharge an assumption
  • Nodes are tracked by tuples $(E_i, A_i, w_i)$ encoding state, tasks, and number of assumptions.

Dialogue converges once all assumptions are either discharged or proved irreparable.
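
As a rough single-agent sketch (omitting the dialogue moves PROPOSE, REFUTE, and OFFER_PLAN), conjectures can be kept in a priority queue keyed on the assumption count $w_i$, so that minimal-assumption conjectures are expanded first; the action encoding below is assumed for illustration.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Conjecture:
    weight: int                                    # w_i: open assumption count
    plan: tuple = field(compare=False)             # action sequence so far
    state: frozenset = field(compare=False)        # E_i: achieved facts
    assumptions: frozenset = field(compare=False)  # A_i: assumed facts

def assumption_based_search(init, goal, actions, max_assumptions=3):
    """Prioritized expansion: minimal-assumption conjectures first.
    Each action is an assumed (name, preconditions, add-effects) triple."""
    queue = [Conjecture(0, (), frozenset(init), frozenset())]
    seen = set()
    while queue:
        c = heapq.heappop(queue)
        key = (c.state, c.assumptions)
        if key in seen:
            continue
        seen.add(key)
        if goal <= c.state:
            return c          # plan plus assumptions to be discharged later
        for name, pre, add in actions:
            missing = pre - c.state   # missing preconditions become assumptions
            merged = c.assumptions | missing
            if len(merged) <= max_assumptions:
                heapq.heappush(queue, Conjecture(
                    len(merged), c.plan + (name,), c.state | add, merged))
    return None
```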

Trajectory Proposal and Evaluation in Autonomous Driving

iPad (Guo et al., 21 May 2025) frames all features, attention, and auxiliary predictions around $N$ explicit trajectory proposals. ProFormer iteratively refines the proposal queries $Q_k$ with proposal-anchored deformable attention, extracting feature context directly relevant to each proposal's path. After $K$ unrolled iterations, each proposal is scored and the best is executed. Auxiliary tasks for mapping and collision prediction are conditioned on, and supervise, bundles of proposal trajectories.
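
At a high level (and ignoring the actual deformable-attention machinery), the refine-then-select pattern looks like the following numpy sketch, where `feature_at` stands in for proposal-anchored feature extraction and `score` for the learned scoring head; neither reflects ProFormer's real architecture.

```python
import numpy as np

def refine_and_select(proposals, feature_at, score, K=3, step=0.5):
    """proposals: (N, T, 2) candidate trajectories (T planar waypoints).
    feature_at(p) -> (T, 2) update direction anchored on trajectory p."""
    for _ in range(K):
        # Proposal-anchored extraction: each proposal only gathers context
        # along its own path, then refines its waypoints accordingly.
        proposals = np.stack([p + step * feature_at(p) for p in proposals])
    scores = np.array([score(p) for p in proposals])
    return proposals[np.argmax(scores)]   # execute the best-scored proposal
```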

SPDM (Distelzweig et al., 17 Oct 2025) explicitly generates a large, diverse proposal set via combinatorial variation of route, offset, and velocity parameters, and scores each proposal with a sum of interpretable cost terms; learned or perfect predictions are used only for collision pruning, not cost shaping. This decouples comfort and progress objectives from prediction accuracy and highlights the generative power of proposal-centric search.
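
A minimal sketch of this combinatorial generate-screen-score pattern, assuming hypothetical `rollout`, `collides`, and cost-term callables; the parameter grids and weights are illustrative, not SPDM's.

```python
import itertools

def generate_and_select(routes, offsets, velocities, rollout,
                        collides, cost_terms):
    """Combinatorial proposal set with hard collision screening and
    rule-based scoring (weighted sum of interpretable cost terms)."""
    best, best_cost = None, float("inf")
    for route, off, vel in itertools.product(routes, offsets, velocities):
        traj = rollout(route, off, vel)   # physically plausible trajectory
        if collides(traj):                # hard screen: predictions used only here
            continue
        cost = sum(w * term(traj) for w, term in cost_terms)
        if cost < best_cost:
            best, best_cost = traj, cost
    return best

# Illustrative usage with toy grids (assumed, not from the paper):
# generate_and_select(routes=[...], offsets=[-1.0, 0.0, 1.0],
#                     velocities=[5.0, 10.0, 15.0], rollout=..., collides=...,
#                     cost_terms=[(1.0, progress_cost), (0.5, comfort_cost)])
```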

Proposal-Centric Planning in Language and Symbolic Domains

LLM-based planners (Wei et al., 13 Nov 2025, Xiong et al., 2 May 2025) transition from reactive step-by-step tool-calling to up-front, holistic proposal generation. Planner-centric LLMs predict entire DAGs for tool use. SymPlanner (Xiong et al., 2 May 2025) samples candidate action proposals via a policy $\pi$, immediately simulates them in a symbolic environment, applies correction when invalid, and ranks the resulting plans via a learned scalar discriminator, explicitly separating proposal, verification, correction, and selection.
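
The explicit separation of proposal, verification, correction, and selection can be sketched as below, with `llm_propose`, `llm_correct`, and `rank` as hypothetical stand-ins for the LLM policy and learned discriminator, and `simulate` as any deterministic symbolic transition function that returns `None` for invalid actions.

```python
def plan_with_verification(state, goal, llm_propose, llm_correct,
                           simulate, rank, n_candidates=8, max_fixes=2):
    """Propose candidate plans, verify each step in a deterministic symbolic
    environment, correct invalid steps, then select via a learned ranker."""
    valid_plans = []
    for _ in range(n_candidates):
        plan, s = llm_propose(state, goal), state   # proposal
        steps, ok = [], True
        for action in plan:
            for _ in range(max_fixes + 1):
                nxt = simulate(s, action)           # verification
                if nxt is not None:
                    break
                action = llm_correct(s, action)     # iterative correction
            else:
                ok = False                          # still invalid: discard plan
                break
            steps.append(action)
            s = nxt
        if ok and goal(s):                          # goal is a state predicate
            valid_plans.append(steps)
    return max(valid_plans, key=rank) if valid_plans else None  # selection
```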

4. Empirical Performance and Comparative Results

Proposal-centric approaches demonstrate notable advantages in domains demanding high data efficiency, sample diversity, or real-time tractability, particularly under multi-goal, high-dimensional, or under-specified settings.

Key quantitative highlights:

| Domain/Task | Baseline | Proposal-Centric Planner | Metric |
|---|---|---|---|
| Ant GTTP locomotion (Byravan et al., 2021) | MPO only: 250 | MPC+MPO+BC: 400 | Mean episode return at 1M steps |
| InterPlan driving (Distelzweig et al., 17 Oct 2025) | PDM-Closed: 52.8 | SPDM: 63.7 | Closed-loop score |
| NavSim driving (Guo et al., 21 May 2025) | DiffusionDrive: 88.1 | iPad: 91.7 | PDMS |
| Tool-use LLM (Wei et al., 13 Nov 2025) | GPT-4o: 0.464 | Qwen3-8B: 0.659 | Edge F₁ (Hard split) |
| PlanBench symbolic (Xiong et al., 2 May 2025) | ToT: 6.7% | SymPlanner: 54.2% | Plan exact match (by step count) |

  • In locomotion, model-based planning with policy warm-start yields roughly 60% higher returns on multi-goal tasks than tuned model-free RL (Byravan et al., 2021).
  • In tool-augmented LLM benchmarks, global DAG proposal methods yield substantial improvements in plan quality (Edge and Node F₁, exact match) and end-to-end execution rates over stepwise approaches; a minimal sketch of the Edge F₁ metric follows this list.
  • For driving, proposal-centric planners outperform both learned integrated prediction-and-planning (IPP) and diffusion-based planners, particularly in rare, interactive scenarios where diverse, feasible maneuver generation matters.
  • In symbolic domains, proposal-centric LM planners (with verification and iterative correction) yield much higher plan validity and diversity compared to pure CoT or RAP baselines.
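
For reference, Edge F₁ can be understood as set-overlap F₁ over the DAG's dependency edges; a minimal sketch under the assumption that edges are (source tool, target tool) pairs. The benchmark's exact matching rules may differ.

```python
def edge_f1(pred_edges, gold_edges):
    """F1 over DAG dependency edges, treating each edge as a set element."""
    pred, gold = set(pred_edges), set(gold_edges)
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# Example: one spurious and one missing edge yield F1 = 0.5.
print(edge_f1({("search", "summarize"), ("summarize", "email")},
              {("search", "summarize"), ("search", "email")}))
```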

5. Strengths, Limitations, and Domain-Specific Variation

Strengths

  • Data Efficiency and Diversity: Concentrating computation on an explicit set of high-probability, task-relevant candidates improves returns per environment step in multi-goal RL (Byravan et al., 2021) and maneuver coverage in driving (Distelzweig et al., 17 Oct 2025).
  • Tractability via Amortization: Planner behavior can be distilled into a fast parametric policy, retaining most planner-induced improvements at real-time execution cost (Byravan et al., 2021).
  • Verifiability: Explicit proposals can be individually scored, screened for feasibility, or symbolically validated before execution (Distelzweig et al., 17 Oct 2025, Xiong et al., 2 May 2025).

Limitations

  • Task-Dependent Return on Investment: On narrow, single-goal tasks, proposal-centricity brings limited or no gains, as the baseline policy already achieves near-optimality (Byravan et al., 2021).
  • Dependency on Proposal Quality: If proposal generation is too narrow or myopic, overall system capability suffers—no amount of downstream optimization can compensate for missing critical behaviors (Distelzweig et al., 17 Oct 2025).
  • Computational Overhead: Some instantiations increase per-decision computational cost (e.g., repeated model rollouts, proposal anchoring, or scoring), which may be mitigated by amortization or partial execution (Guo et al., 21 May 2025).
  • Generalization Across Domains: Proposal transfer between source and target distributions can degrade performance if proposal priors are insufficiently adaptive (Byravan et al., 2021).

6. Extensions and Research Trajectories

Ongoing and proposed extensions to the proposal-centric planner paradigm include:

  • Hierarchical Proposal Generation: Learning or synthesizing proposals at multiple abstraction levels (e.g., subgoals, macro-actions, or high-level subgraphs) to tackle longer-horizon or scalable planning (Byravan et al., 2021).
  • Latent-World Proposals: Applying proposal-centric logic in learned latent state spaces for high-dimensional perceptual input (e.g., pixel-based RL, PlaNet/Dreamer architectures) (Byravan et al., 2021).
  • Improved Diversity and Adaptivity: Incorporating diversity-promoting regularizers and adaptive weighting mechanisms in proposal sets to enhance coverage and selectivity (Distelzweig et al., 17 Oct 2025).
  • Multi-agent and Multi-planner Coordination: Enabling specialized sub-planners or agent coalitions to propose, compete, and reconcile subplans in parallel, accelerating large-scale, distributed planning (Pellier et al., 2018, Wei et al., 13 Nov 2025).
  • Richer Feedback for Selection: Leveraging richer structural, performance, or failure-mode feedback from executors or simulators for more informative proposal evaluation, beyond syntactic checks or top-k likelihood (Wei et al., 13 Nov 2025).
  • Learning Proposal Scoring Functions: Using RL or IRL to learn proposal evaluators, while retaining the proposal architecture's generative and tractable nature (Distelzweig et al., 17 Oct 2025).

7. Conclusion

The proposal-centric planner paradigm organizes planning around the explicit generation, refinement, and selection of candidate solutions, unifying mechanisms across control theory, symbolic AI, autonomous driving, and LLM-based tool reasoning. It achieves demonstrable improvements in sample efficiency, robustness, tractability, and quality on complex, high-dimensional, or interactive tasks, provided that proposal generation and adaptation are properly tuned to the domain's characteristics. The flexibility of this approach allows for smooth integration with amortized policies, parallelization, and domain-specific extensions, establishing proposal-centric planning as a core architectural principle in modern decision-making systems (Byravan et al., 2021, Pellier et al., 2018, Wei et al., 13 Nov 2025, Guo et al., 21 May 2025, Distelzweig et al., 17 Oct 2025, Xiong et al., 2 May 2025).
