Expressive Plan Languages
- Expressive plan languages are formal systems designed to encode complex, multi-modal plans using control-flow constructs, parameterized actions, and strict constraints.
- They support advanced features like recursion, dynamic error recovery, and stochastic branching to manage uncertain and adaptive planning tasks.
- Their application in AI planning and agent systems enhances plan conciseness, efficiency, and robustness through integrated multi-modal data and decision-theoretic approaches.
An expressive plan language is a formalism or programming system for specifying complex, richly structured, and often multi-modal or stochastic plans for agents, decision systems, or interactive applications. Such languages go beyond classical planning representations by supporting conditionals, loops, recursion, rich action schemas, constraints, and, frequently, multi-modal annotations or decision-theoretic constructs. The design of expressive plan languages is central to fields such as AI planning, agent systems, decision theory, and code-based interactive reasoning.
1. Formal Foundations and Syntax
Expressive plan languages are characterized by their capacity to compactly encode plans or strategies of high structural or computational complexity. Syntax varies considerably across systems, but typical elements include:
- Parameterized actions: Actions with objects and typed parameters, often with associated preconditions and effects.
- Control-flow constructs: Conditional branching (if-then-else), loops (e.g., for, while), and sequential composition (the ; operator).
- Constraint specification: Logical, temporal, and outcome constraints, such as allergen restrictions or process duration bounds.
- Data and media attachments: Integration of multi-modal data (e.g., images, videos) linked to plan objects or actions.
For example, the R3 language for recipes is defined as a five-component tuple of objects (ingredients, tools), parameterized actions, temporal relations, constraints (including failure rules), and multi-modal attachments. The BNF includes constructs for steps with preconditions, failures, and media annotations (Pallagani et al., 2022).
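As a rough illustration, the five components could be modelled in Python as follows. The class, field, and method names (`RecipePlan`, `Action`, `violated_constraints`, `no_peanuts`) are hypothetical and are not taken from the R3 specification; this is a minimal sketch under that assumption.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Action:
    """A parameterized action: a name plus the objects it operates on."""
    name: str
    params: tuple[str, ...]


@dataclass
class RecipePlan:
    """Hypothetical container mirroring the five components listed above."""
    objects: set[str]                    # ingredients and tools
    actions: list[Action]                # parameterized actions
    temporal: list[tuple[int, int]]      # (i, j): action i precedes action j
    constraints: dict[str, Callable[["RecipePlan"], bool]]     # name -> check
    media: dict[str, list[str]] = field(default_factory=dict)  # attachments

    def violated_constraints(self) -> list[str]:
        """Return the names of constraints that do not hold for this plan."""
        return [name for name, check in self.constraints.items()
                if not check(self)]


def no_peanuts(plan: RecipePlan) -> bool:
    """Example allergen constraint (illustrative only)."""
    return "peanut" not in plan.objects


plan = RecipePlan(
    objects={"flour", "egg", "peanut", "bowl"},
    actions=[Action("mix", ("flour", "egg", "bowl")), Action("top", ("peanut",))],
    temporal=[(0, 1)],
    constraints={"allergen:peanut": no_peanuts},
    media={"mix": ["mix_step.jpg"]},
)
print(plan.violated_constraints())  # ['allergen:peanut']
```

Keeping constraints as named predicates over the whole plan makes it straightforward to report which rule a candidate plan breaks, rather than only that it is invalid.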
In the REPL-Plan language for code-augmented planning with LLMs, plans are sequences of Python-like statements and definitions (assignments, control flow, function calls), extended with REPL primitives (act, answer, subtask) and dynamic subplan spawning. Operational semantics are directly tied to REPL interaction and automatic error-driven decomposition (Liu et al., 2024).
POAPS, built for adaptive POMDP-based planning, extends a Lisp-like core with a choose construct for optimized stochastic branching, embedding adaptive choice points that are compiled into POMDP action sets (Lin et al., 2016).
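The idea of an optimized choice point can be sketched in Python. This is not POAPS syntax (which is Lisp-like), and it replaces POMDP policy optimization with a single-step expected-value comparison; the `choose` signature, the `value` table, and the labelling task are all assumptions made for illustration.

```python
from typing import Callable, Mapping, Sequence

State = str
Belief = Mapping[State, float]  # probability assigned to each hidden state


def choose(options: Sequence[str],
           value: Callable[[str, State], float],
           belief: Belief) -> str:
    """Pick the option with the highest expected value under the belief.

    A one-step stand-in for the policy a POMDP solver would compute.
    """
    def expected(option: str) -> float:
        return sum(p * value(option, s) for s, p in belief.items())

    return max(options, key=expected)


# Hypothetical task: commit to a label, or pay a small cost to ask again.
def value(option: str, state: State) -> float:
    table = {
        ("answer_yes", "yes"): 1.0, ("answer_yes", "no"): -1.0,
        ("answer_no", "yes"): -1.0, ("answer_no", "no"): 1.0,
        ("ask_again", "yes"): 0.2, ("ask_again", "no"): 0.2,
    }
    return table[(option, state)]


options = ["answer_yes", "answer_no", "ask_again"]
confident = choose(options, value, {"yes": 0.9, "no": 0.1})
uncertain = choose(options, value, {"yes": 0.55, "no": 0.45})
print(confident, uncertain)  # answer_yes ask_again
```

The program author only lists the options; which one is taken depends on the current belief, which is the sense in which the branch is adaptive.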
2. Semantics and Execution Models
Expressive plan languages typically provide operational and denotational semantics that enable execution, verification, and optimization.
- State transition models: Execution is modeled as transitions over world states, often parameterized by effects of actions, temporal relations, and (in stochastic settings) transition and observation functions.
- Constraint enforcement: Constraints (e.g., process, allergen, outcome) must hold globally or during plan execution. In R3, any plan that violates a constraint is invalid (Pallagani et al., 2022).
- Plan composition: Sequential and nested composition enables hierarchical or recursive planning (e.g., function calls, subplans, recursive invocations).
- Advanced constructs: Decision-theoretic semantics, as in POAPS, compile plan structure and expert-written stochastic primitives into POMDPs, enabling belief-state tracking and policy optimization (Lin et al., 2016). The language-based decision framework interprets plans as functions on constructed outcome spaces, mediated by selection functions for underspecified actions (Bjorndahl et al., 2023).
- Stream/dataflow execution: THESEUS executes plans as streaming dataflows, where operators consume and emit data tokens in parallel, supporting maximal concurrency and responsiveness (Barish et al., 2011).
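The state-transition and constraint-enforcement views above can be combined in a small interpreter. The sketch below assumes a STRIPS-style encoding in which a state is a set of facts; the names `Step` and `execute` and the cooking example are hypothetical and do not correspond to any of the cited systems.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence


@dataclass(frozen=True)
class Step:
    name: str
    pre: frozenset     # facts that must hold before the step
    add: frozenset     # facts the step makes true
    delete: frozenset  # facts the step makes false


def execute(state: frozenset,
            plan: Sequence[Step],
            constraints: Mapping[str, Callable[[frozenset], bool]]) -> frozenset:
    """Apply steps in order; fail on an unmet precondition or broken constraint."""
    for step in plan:
        if not step.pre <= state:
            missing = sorted(step.pre - state)
            raise ValueError(f"{step.name}: unmet preconditions {missing}")
        state = (state - step.delete) | step.add
        for name, holds in constraints.items():
            if not holds(state):  # constraints are checked after every step
                raise ValueError(f"{step.name}: violates constraint {name!r}")
    return state


plan = [
    Step("boil", frozenset({"water"}), frozenset({"boiling"}), frozenset()),
    Step("add_pasta", frozenset({"boiling", "pasta"}),
         frozenset({"cooking"}), frozenset({"pasta"})),
]
start = frozenset({"water", "pasta"})
constraints = {"nothing_burnt": lambda s: "burnt" not in s}

final = execute(start, plan, constraints)
print(sorted(final))  # ['boiling', 'cooking', 'water']
```

Checking constraints after each transition corresponds to the "during plan execution" reading; a purely global constraint would instead be checked once on the final state.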
3. Key Language Features and Expressive Power
Expressive plan languages are defined by several crucial dimensions:
| Feature | Example Language / Implementation | Expressivity Role |
|---|---|---|
| Conditional Effects | STRIPS+cond. effects, R3, THESEUS | Parallel state updates, efficient encoding |
| Boolean Preconditions | STRIPS+B, R3, POAPS | Enables compact checks over complex states |
| Recursion/Subplans | THESEUS, POAPS, REPL-Plan | Compactly encode unbounded tasks, hierarchies |
| Rich Data & Media | R3 (multi-modal images/videos) | Enables multi-modal and context-rich queries |
| Error Handling/Failures | R3 (failure rules), REPL-Plan | Adaptive recovery, context-aware assistance |
| Stochastic Branching | POAPS (choose), language-based decisions | Optimized decision-making under uncertainty |
Conditional effects and Boolean preconditions are strict expressivity enhancers: they cannot be compiled away without super-linear increase in plan size under the "compilability" framework (Nebel, 2011).
4. Case Studies and Representative Languages
R3 for recipes encodes the full process of food preparation in a PDDL-like graph, enriched with knowledge for allergen-aware substitutions, multi-modal annotations, and failure-tolerant tips. It enables advanced query, matching, and customization functionality beyond plain-text recipes (Pallagani et al., 2022).
POAPS abstracts away POMDP technicalities, letting users write adaptive, recursive planning programs that are compiled into POMDPs. The expressivity is sufficient to encode arbitrary finite POMDPs with factored state spaces, and supports reuse and modularity through expert-supplied primitives (Lin et al., 2016).
REPL-Plan leverages the interactive REPL paradigm for hierarchical, code-driven planning with LLMs. Its Turing-completeness (in a Pythonic subset) and dynamic, error-driven subplan management enable handling of long-horizon tasks and ambiguous or dynamic requirements (Liu et al., 2024).
THESEUS provides a streaming dataflow plan language for software agents, integrating RA, XML, and control operators with streaming recursion and asynchrony. This model supports complex, real-world data and web automation tasks, and delivers multiplicative performance gains over serial or non-streaming approaches (Barish et al., 2011).
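The streaming idea can be approximated with Python generators, where each operator emits a record as soon as it is ready instead of waiting for its whole input. This is only an analogy: generators give pipelined, pull-based execution on one thread, not the parallel operator execution described for THESEUS, and the operator names, URLs, and fake "price" payload below are invented for illustration.

```python
from typing import Iterable, Iterator

fetched: list[str] = []  # records which sources have been touched so far


def fetch(urls: Iterable[str]) -> Iterator[dict]:
    """Source operator: emit one record per URL (fake payload, no network)."""
    for url in urls:
        fetched.append(url)
        yield {"url": url, "price": len(url)}


def select(records: Iterable[dict], max_price: int) -> Iterator[dict]:
    """Relational-style selection: pass through records under a price bound."""
    for record in records:
        if record["price"] <= max_price:
            yield record


def project(records: Iterable[dict], key: str) -> Iterator[object]:
    """Relational-style projection: keep a single attribute."""
    for record in records:
        yield record[key]


urls = ["a.example", "bb.example", "cccccc.example"]
pipeline = project(select(fetch(urls), max_price=10), "url")

first = next(pipeline)             # first result is available immediately
touched_after_first = list(fetched)
rest = list(pipeline)              # drain the remaining results

print(first, touched_after_first, rest)
```

After the first result is produced, only the first source has been fetched, which is the responsiveness benefit that streaming execution aims for.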
Language-based decisions represent sequential actions as language expressions, generalizing the action-to-outcome mapping paradigm of classical decision theory. This framework supports unbounded action sequencing, conditionals, and underspecified effects, with a representation theorem showing equivalence to SEU maximization over constructed states and outcomes subject to selection functions (Bjorndahl et al., 2023).
5. Expressivity Hierarchies and Compilability
Expressive plan languages are formally compared using compilation schemes, which measure how concisely different planning problems can be specified across formalisms (Nebel, 2011). Notable results include:
- Conditional effects (C) strictly increase the conciseness of encodings: they cannot be compiled away while keeping plan size linear.
- Boolean formulae in preconditions (B) further increase expressivity and cannot be linearly compiled into conditional effects.
- Features such as partial states or negation in preconditions can be compiled away with only constant or linear plan-size overhead.
Consequently, a more expressive language subsumes a narrower one directly, whereas simulating the richer features within the narrower language may come at a high cost in plan length and search complexity.
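To see why removing conditional effects is costly, consider the naive expansion below, which keeps plan length unchanged but replaces one action carrying k conditional effects with 2^k unconditional variants. This is a simple illustrative construction, not one of the compilation schemes analysed by Nebel; the function name, the "not fact" encoding of negative preconditions, and the briefcase example are assumptions.

```python
from itertools import product


def expand_conditional_effects(name, pre, effects, cond_effects):
    """Naively replace one action with conditional effects by unconditional variants.

    cond_effects is a list of (condition_fact, effect_fact) pairs. Each variant
    fixes every condition as true or false in its precondition, so exactly one
    variant is applicable in any given state.
    """
    variants = []
    for mask in product([True, False], repeat=len(cond_effects)):
        v_pre = set(pre)
        v_eff = set(effects)
        for holds, (cond, eff) in zip(mask, cond_effects):
            v_pre.add(cond if holds else f"not {cond}")
            if holds:
                v_eff.add(eff)  # the effect fires only when its condition holds
        suffix = "".join("1" if h else "0" for h in mask)
        variants.append((f"{name}_{suffix}", frozenset(v_pre), frozenset(v_eff)))
    return variants


variants = expand_conditional_effects(
    "move_briefcase",
    pre={"at_home"},
    effects={"at_office"},
    cond_effects=[("paycheck_in", "paycheck_at_office"),
                  ("laptop_in", "laptop_at_office")],
)
print(len(variants))  # 2 conditional effects -> 4 unconditional actions
```

The exponential growth in the number of actions is the price of preserving plan length here; keeping the domain description small instead requires longer plans.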
6. Multi-modal, Adaptive, and Interactive Plan Languages
Modern expressive plan languages are increasingly multi-modal, interactive, and adaptive:
- Multi-modality: R3 attaches images or videos at the object/action level, enabling recipe retrieval using text or image queries. These multi-modal links are leveraged by TREAT to support retrieval, explanation, and user interaction (Pallagani et al., 2022).
- Error recovery and failure management: Both R3 and REPL-Plan include explicit constructs for error detection and adaptive tip provision, such as context-aware suggestions when steps fail or intermediate states are undesirable (Pallagani et al., 2022, Liu et al., 2024).
- Interactive decomposition: REPL-Plan allows dynamic subplan spawning at runtime, enabling agents (especially LLM-driven ones) to flexibly decompose tasks and handle unexpected situations interactively (Liu et al., 2024).
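Error-driven decomposition of the kind described above can be sketched as a recursive runner that spawns a subplan whenever a step fails. This is not the REPL-Plan implementation or API: `run_plan`, `StepFailed`, the signatures of `act` and `decompose`, and the web-search steps are hypothetical, and `decompose` is a lookup table standing in for an LLM proposing finer-grained steps.

```python
from typing import Callable


class StepFailed(Exception):
    """Raised by a step that cannot be completed directly."""


def run_plan(steps: list[str],
             act: Callable[[str], str],
             decompose: Callable[[str], list[str]],
             depth: int = 0,
             max_depth: int = 3) -> list[str]:
    """Run steps in order; on failure, spawn a subplan for the failing step."""
    log: list[str] = []
    for step in steps:
        try:
            log.append(act(step))
        except StepFailed:
            if depth >= max_depth:
                raise  # give up rather than recurse without bound
            substeps = decompose(step)
            log.extend(run_plan(substeps, act, decompose, depth + 1, max_depth))
    return log


# Hypothetical environment: only primitive steps succeed directly.
PRIMITIVE = {"open_site", "type_query", "click_search", "click_first_result"}


def act(step: str) -> str:
    if step not in PRIMITIVE:
        raise StepFailed(step)
    return f"done:{step}"


def decompose(step: str) -> list[str]:
    table = {"search_item": ["type_query", "click_search"]}
    if step not in table:
        raise StepFailed(step)  # no known way to break this step down
    return table[step]


result = run_plan(["open_site", "search_item", "click_first_result"], act, decompose)
print(result)
```

The depth limit is a design choice of this sketch: it bounds recursion when neither the step nor any of its proposed subplans can be executed.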
7. Empirical and Theoretical Impact
Expressive plan languages have demonstrated both empirical advantage and theoretical import:
- R3, instantiated in TREAT, achieves capabilities such as querying over plan structure, enforcing dietary or allergen preferences, and supporting image-based reasoning not possible with purely textual recipe databases (Pallagani et al., 2022).
- REPL-Plan achieves state-of-the-art performance on long-horizon web automation and interaction tasks, outperforming or matching prior systems and demonstrating the critical importance of hierarchical code expressivity and dynamic error recovery (Liu et al., 2024).
- THESEUS's plan language and executor achieve multiplicative speedups over non-streaming baselines on information-gathering benchmarks, while handling tasks inexpressible in standard network query languages (Barish et al., 2011).
- Compilability results clarify when language features are mandatory for conciseness, and when attempted simulation within weaker languages is impractical due to super-linear plan blowup (Nebel, 2011).
The formal and practical impact of expressive plan languages continues to expand, supporting domains from robust agent automation to sophisticated interactive and decision-theoretic applications.