Planning-Execution Module (PEM)
- A Planning-Execution Module (PEM) is a computational framework that combines symbolic, continuous, and hybrid planning with real-time execution to manage dynamic tasks.
- It integrates methods such as heuristic-guided search, motion planning, task decomposition, and LLM-based strategies to overcome real-world uncertainties.
- PEMs emphasize continuous feedback, monitoring, and adaptive replanning to improve robustness and efficiency in domains like robotics and autonomous systems.
A Planning-Execution Module (PEM) is a computational architecture that interleaves the construction of symbolic or hybrid plans with their realization in a physical or simulated environment. PEMs are central to modern AI agents, robotics, autonomous systems, and interactive LLM-based agents, enabling the continuous, adaptive coupling between planning algorithms and the real-world constraints encountered during execution. The PEM paradigm subsumes a range of algorithmic styles, including symbolic task planning, continuous motion planning, heuristic-guided search, experience-based adaptation, and concurrent metareasoning. The PEM label is not tied to one canonical architecture, but encompasses a spectrum of tightly integrated systems that jointly manage action selection, plan refinement, monitoring, and recovery in real or uncertain domains.
1. Formal Definitions and Foundational Models
The core of a PEM is a closed loop that interleaves planning and execution phases. In classical AI planning, this starts from a well-defined problem such as a fully observable Markov Decision Process (MDP) ⟨S, A, T, R, γ⟩, where S is the state space, A is the set of actions, T is the transition model, R is the reward or cost function, and γ is the discount factor (Dearden et al., 2013). In robotic task and motion planning, the domain consists of symbolic state spaces, continuous configuration manifolds, and underactuated or sensor-limited constraints (Pan et al., 2024). The PEM in this setting must reason over:
- Discrete symbolic action models ⟨F, A, C⟩, where F are propositional fluents, A are actions, and C is a set of constraints.
- Continuous spaces for robot configurations and the feasible paths connecting them.
- External modules for execution, including behaviors that correspond to closed-loop policies bridging grounding gaps.
PEMs in stochastic or uncertain environments extend these models to account for partial observability, stochasticity in dynamics, and the acquisition of new constraints through execution failures or sensory feedback (Vemula et al., 2020, Pan et al., 2024).
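The closed planning-execution loop over such an MDP can be sketched as follows. This is a minimal illustration only: the toy chain domain, the function names, and the replan-on-divergence policy are assumptions for exposition, not any cited system's implementation.

```python
# Minimal PEM loop over an MDP <S, A, T, R, gamma>: plan on the model,
# act in the real environment, monitor for divergence, and replan.
# The chain domain (states 0..4, goal 4) is an illustrative assumption.

STATES = list(range(5))          # S: states 0..4, goal at 4
ACTIONS = [-1, +1]               # A: move left or right
GAMMA = 0.9                      # discount factor

def T_model(s, a):               # T: the planner's transition model
    return min(max(s + a, 0), 4)

def R(s, a):                     # R: reward for reaching the goal
    return 1.0 if T_model(s, a) == 4 else -0.01

def value_iteration(iters=50):
    """Plan on the *model* by computing state values V."""
    V = {s: 0.0 for s in STATES}
    for _ in range(iters):
        V = {s: max(R(s, a) + GAMMA * V[T_model(s, a)] for a in ACTIONS)
             for s in STATES}
    return V

def pem_loop(real_step, s0=0, horizon=20):
    """Interleave planning and execution: act greedily on planned
    values, observe the real outcome, and replan on divergence."""
    V = value_iteration()
    s, trace = s0, [s0]
    for _ in range(horizon):
        if s == 4:
            break
        a = max(ACTIONS, key=lambda act: R(s, act) + GAMMA * V[T_model(s, act)])
        s_next = real_step(s, a)         # realized transition
        if s_next != T_model(s, a):      # execution monitoring
            V = value_iteration()        # adaptive replanning (sketch)
        s = s_next
        trace.append(s)
    return trace
```

When the real environment matches the model (`real_step` equals `T_model`), the loop walks straight to the goal; a mismatched `real_step` triggers the monitoring branch instead.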
2. Principal PEM Architectures and Algorithms
PEM design is instantiated in several algorithmic frameworks. A non-exhaustive typology includes:
- Depth-limited Expectimax Search: The planner performs a limited expansion in the decision tree, uses a heuristic at the leaves, selects the best action, then executes and observes the environment (Dearden et al., 2013). A cache stores values for visited states, and pruning (utility or expectation cuts) reduces expansion.
- Experience-Adaptive Planning (e.g., CMAX/CMAX++): PEMs detect mismatches between their model's predicted transitions and the transitions actually observed during execution, maintain a set of penalized transitions, and adapt their planning strategies online by increasing costs for observed errors or learning Q-value estimates for mis-modeled regions (Vemula et al., 2020, Vemula et al., 2020).
- Task and Motion Hybrid Planning: PEMs in robotic systems combine task-level planning with geometric/kinodynamic motion planning; actions that cannot be fully grounded due to lack of information are deferred to a closed-loop behavior module, and failures feed back as symbolic constraints to force replanning (Pan et al., 2024).
- LLM-based Global Planning with Hierarchical Execution: In advanced LLM agents, the PEM consists of a global, continuously updated plan and a hierarchical executor which decomposes plan steps into “skills” (e.g., searching, coding, writing) and collects observations to drive dynamic plan refinement (Chen et al., 23 Apr 2025).
- Concurrent and Metareasoning-based PEMs: In time-pressured settings, PEMs formally reason over the allocation of resources to concurrent planning processes and committed actions, often framed as a metareasoning or MDP problem where the policy selects between “think” (allocate planning time) and “act” (commit to next plan prefix), balancing the risk of irreversible action against the opportunity for plan improvement (Elboher et al., 2023).
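The first of these styles, depth-limited expectimax with a value cache and heuristic leaf evaluation, can be sketched as below. The toy stochastic domain, the linear heuristic, and all names are illustrative assumptions in the spirit of the description above, not the cited planner's code.

```python
# Depth-limited expectimax sketch: expand the decision tree to a fixed
# depth, evaluate leaves with a heuristic, and cache values for visited
# (state, depth) pairs. Domain: a toy stochastic counter (assumption).

ACTIONS = ("safe", "risky")

def outcomes(s, a):
    """Stochastic transition model: list of (prob, next_state, reward)."""
    if a == "safe":
        return [(1.0, s + 1, 1.0)]
    return [(0.5, s + 3, 3.0), (0.5, s - 1, -1.0)]

def heuristic(s):
    return float(s)              # crude leaf estimate of future value

def expectimax(s, depth, cache=None):
    """Value of state s under the best action, expanded to `depth`."""
    cache = {} if cache is None else cache
    if depth == 0:
        return heuristic(s)
    key = (s, depth)
    if key not in cache:         # cache stores values for visited states
        cache[key] = max(
            sum(p * (r + expectimax(s2, depth - 1, cache))
                for p, s2, r in outcomes(s, a))
            for a in ACTIONS)
    return cache[key]

def best_action(s, depth=5):
    """Select the action maximizing expected value at the root."""
    return max(ACTIONS, key=lambda a: sum(
        p * (r + expectimax(s2, depth - 1))
        for p, s2, r in outcomes(s, a)))
```

Pruning (utility or expectation cuts) would additionally skip expanding branches whose bounds cannot change the root decision; it is omitted here for brevity.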
3. Execution Monitoring, Feedback, and Adaptation
Key to PEM operation is tight execution monitoring and dynamic feedback. During execution, PEMs:
- Monitor realized world states for divergence from predicted models.
- Detect and penalize mis-modeled transitions or ungroundable actions via online learning or constraint augmentation (Vemula et al., 2020, Pan et al., 2024).
- Defer unresolvable planning gaps to reactive, closed-loop controllers (behaviors) and monitor their outcomes for success or failure (Pan et al., 2024).
- In LLM-agent PEMs, utilize observations from hierarchical skills as feedback to update the global plan (Chen et al., 23 Apr 2025).
- Trigger replanning upon failure, typically adding new symbolic constraints to avoid previously unsuccessful branches (Pan et al., 2024).
- In metareasoning PEMs, maintain and update stochastic process models (e.g., node expansions, plan-found time distributions) for each plan fragment to optimize the allocation of planning and execution time (Elboher et al., 2023).
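The penalize-and-replan pattern from the second and fifth bullets can be sketched as follows, in the spirit of CMAX: plan with Dijkstra on the model, and when an executed transition disagrees with the model's prediction, add that (state, action) pair to a penalized set so later plans route around it. The toy graph and all names are illustrative assumptions, not the paper's implementation.

```python
import heapq

PENALTY = 1e6  # cost inflation for penalized (mis-modeled) transitions

def plan(start, goal, neighbors, penalized):
    """Dijkstra over the model, inflating costs of penalized edges."""
    dist, prev, pq = {start: 0.0}, {}, [(0.0, start)]
    while pq:
        d, s = heapq.heappop(pq)
        if s == goal:
            break
        if d > dist.get(s, float("inf")):
            continue
        for a, s2, cost in neighbors(s):
            c = PENALTY if (s, a) in penalized else cost
            if d + c < dist.get(s2, float("inf")):
                dist[s2], prev[s2] = d + c, (s, a)
                heapq.heappush(pq, (d + c, s2))
    path, node = [], goal
    while node != start:         # reconstruct the (state, action) path
        s, a = prev[node]
        path.append((s, a))
        node = s
    return list(reversed(path))

def run(start, goal, neighbors, model_step, real_step):
    """Execute the plan's first step, penalize mismatches, replan."""
    penalized, s = set(), start
    while s != goal:
        _, a = plan(s, goal, neighbors, penalized)[0]
        s2 = real_step(s, a)
        if s2 != model_step(s, a):   # model/reality mismatch detected
            penalized.add((s, a))    # avoid this transition from now on
        s = s2
    return penalized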
4. Integration with Perception, Control, and Learning
State-of-the-art PEMs tightly couple with perception pipelines (RGB-D, tactile, external sensor modalities), low-level control (hybrid velocity, force, and visual-servo controllers), and learning algorithms. For example:
- In robotics PEMs, sensor streams inform the estimation of feasible disassembly spaces and object poses, and parameterize local skill primitives with admittance or IBVS control laws (Friedrich et al., 25 Aug 2025).
- Model-based planners are integrated with function approximation (RBF networks, neural value functions) for scalable adaptation in large state spaces—critical when the true environment departs from the model (Vemula et al., 2020, Vemula et al., 2020).
- Skill decomposition and parameterization are performed at runtime, mapping high-level manipulation primitives to sequences of adaptive, sensor-driven control commands (Friedrich et al., 25 Aug 2025).
- In LLM-based PEMs, high-level actions are mapped to external tool API calls, code execution, or structured data queries, with the feedback loop maintained through explicit history tracking and policy updates (Chen et al., 23 Apr 2025).
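The skill-dispatch pattern in the last bullet can be sketched as a registry that routes plan steps to handlers and tracks observations for plan refinement. The skill names, the `refine` callback, and the class itself are illustrative assumptions, not GoalAct's actual API.

```python
from typing import Callable

class SkillExecutor:
    """Sketch of hierarchical execution: plan steps (skill, argument)
    are routed to registered handlers, and each observation is logged
    to an explicit history that a refinement hook may inspect."""

    def __init__(self):
        self.skills: dict[str, Callable[[str], str]] = {}
        self.history: list[tuple[str, str, str]] = []

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self.skills[name] = fn

    def run(self, plan, refine=None):
        """Execute plan steps; after each observation, optionally let
        `refine` rewrite the remaining plan (dynamic plan refinement)."""
        plan = list(plan)
        while plan:
            skill, arg = plan.pop(0)
            obs = self.skills[skill](arg)           # tool/API call
            self.history.append((skill, arg, obs))  # history tracking
            if refine is not None:
                plan = refine(plan, self.history)
        return self.history
```

In a real agent, the registered handlers would wrap external tool APIs, code execution, or structured data queries, and `refine` would query the LLM with the accumulated history.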
5. Empirical Evaluation and Domains of Application
PEMs are validated across diverse domains:
| Paper/Module | Domain | Evaluation Highlights |
|---|---|---|
| (Dearden et al., 2013) | Stochastic AI planning | Near-optimal reward with depth-5, policy agreement >99.5% |
| (Mo et al., 20 May 2025) SPlanner | Mobile GUI agents | +28.8pp success over VLM-only on AndroidWorld; interpretable plans |
| (Friedrich et al., 25 Aug 2025) | Maintenance robotics | 80–100% autonomous success; operation time, behavior counts |
| (Pan et al., 2024) | Real-world TAMP robotics | 100% success (with PEM) in partially groundable tasks; faster, fewer actions than baseline |
| (Vemula et al., 2020, Vemula et al., 2020) | Model-mismatched robotics | Finite-time guarantees; empirical robustness; asymptotic optimality (CMAX++) |
| (Chen et al., 23 Apr 2025) GoalAct | LLM-based legal agents | +12.22% success over prior SOTA; ablations quantify module roles |
PEMs have demonstrated efficacy in domains characterized by real-world uncertainty, incomplete domain models, dynamic task requirements, and necessity for robust execution strategies.
6. Limitations, Theoretical Results, and Research Directions
PEMs are subject to both computational and practical limitations:
- Computational intractability: general PEM metareasoning is NP-hard, and even special cases (e.g., with bounded plan prefixes or equal slack) require careful pseudo-polynomial dynamic programming (Elboher et al., 2023).
- Model dependence and abstraction error: Performance critically depends on the quality of learned or hand-crafted heuristics, the granularity of symbolic abstraction, and the coverage of the training domain (Dearden et al., 2013, Vemula et al., 2020).
- Execution-monitoring coverage: Behavior module design, partial grounding semantics, and constraint management must be robust to unexpected environmental deviations, perception noise, and actuator failures (Pan et al., 2024).
- Scalability challenges: Large symbolic and continuous spaces require scalable learning-based representations and efficient hybrid planning algorithms (Vemula et al., 2020, Friedrich et al., 25 Aug 2025).
Ongoing research explores:
- Adaptive resource allocation (for planning depth, model refinement, or behavior invocation)
- More expressive partial observability and stochasticity modeling (POMDP-based PEMs)
- Learned symbolic/concrete behavior synthesis and repair
- Integrated belief and uncertainty reasoning in both planning and execution (Pan et al., 2024).
7. Synthesis and Outlook
The Planning-Execution Module framework offers a unifying abstraction for systems that must bridge the gap between high-level reasoning and real-time action in complex, dynamic, and imperfectly modeled environments. Its variants—ranging from classical heuristic planners (Dearden et al., 2013), concurrent metareasoning algorithms (Elboher et al., 2023), model-adaptive controllers (Vemula et al., 2020, Vemula et al., 2020), robotic TAMP systems (Pan et al., 2024, Friedrich et al., 25 Aug 2025), to next-generation LLM-driven agents (Chen et al., 23 Apr 2025, Mo et al., 20 May 2025)—demonstrate broad applicability and inspire ongoing innovation in AI and robotics research. The PEM design pattern effectively operationalizes integrated planning, acting, monitoring, and repair, informing the construction of robust, adaptive autonomous systems.