Multimodal PDDL: Hybrid & Temporal Extensions

Updated 30 September 2025

Multimodal PDDL formalization is a framework extending traditional PDDL by integrating discrete, continuous, temporal, numeric, and perceptual constructs for comprehensive planning.
It leverages hybrid automata mappings, durative actions, and process-event semantics to model real-world dynamic systems with both symbolic and quantitative decision-making.
Key challenges include computational complexity, numeric precision, and integrating sensor-driven perception to ensure accurate and robust plan validation.

A multimodal PDDL formalization is the theoretical and practical framework for extending the Planning Domain Definition Language (PDDL) to systematically model planning domains characterized by multiple interacting modalities: discrete (logical), continuous (dynamical), temporal, numeric, and, more recently, perceptual or structural information. The approach encompasses both foundational extensions to PDDL syntax and semantics (notably PDDL2.1 and PDDL+) and the design of supporting formalisms and interfaces, such as hybrid automata mappings, hierarchical and contingent constructs, logical embeddings, and integration with perception and model-based engineering environments. The core goal is to create a robust, expressive, and analyzable planning substrate that can represent realistic domains that are temporally extended and resource- and process-intensive, supporting both symbolic and quantitative decision-making.

1. Foundations: Syntax and Semantics of Multimodal PDDL

Classical PDDL, grounded in STRIPS, models planning domains as sets of logical fluents and instantaneous actions. Multimodal extensions in PDDL2.1 introduced:

Durative Actions: Actions with durations, whose conditions are temporally qualified as “at start”, “at end”, or “over all” and whose discrete effects occur “at start” or “at end” (Definition 1–27) (Fox et al., 2011).
Numeric and Continuous Effects: Numeric fluents are updated either discretely (using assignments) or continuously, e.g., (decrease (fuel-level ?p) (* #t (fuel-consumption-rate ?p))), where #t denotes the time elapsed since the durative action started (Fox et al., 2011).
State–Transition Semantics: States become tuples $(S, \mathbf{v})$, where $S$ is the logical part and $\mathbf{v}$ is the vector of numeric fluent values. Continuous effects are formally captured by differential equations, e.g., $d(fc)/dt = g(fc, \ldots)$, which combine the contributions of all concurrently active processes (Fox et al., 2011, Fox et al., 2011).
Concurrency and Validation: The "no moving targets" rule forbids two actions from occurring at the same time point if one of them updates a fluent that the other reads or updates, ensuring semantic determinacy but potentially limiting expressiveness (Fox et al., 2011).
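To make the interplay of temporal qualifiers and #t concrete, the following Python sketch applies a single durative action to a state $(S, \mathbf{v})$. The action `fly`, the `State` record, and the `samples` parameter for discretizing the over-all check are illustrative assumptions rather than the PDDL2.1 reference semantics; because the continuous effect here is linear, checking the end point alone would already suffice.

```python
from dataclasses import dataclass


@dataclass
class State:
    """A PDDL2.1-style state (S, v): ground atoms S and numeric fluent values v."""
    atoms: set
    values: dict


def fly(state: State, plane: str, src: str, dst: str,
        duration: float, rate: float, samples: int = 10) -> State:
    """Apply a hypothetical durative action 'fly' (illustrative, not normative).

    at start  : requires (at plane src), then deletes it
    over all  : (>= (fuel-level plane) 0), checked at sampled values of #t
    at end    : adds (at plane dst)
    continuous: (decrease (fuel-level plane) (* #t rate))
    """
    if ("at", plane, src) not in state.atoms:
        raise ValueError("at-start precondition (at plane src) is false")
    atoms = set(state.atoms) - {("at", plane, src)}  # at-start delete effect
    fuel0 = state.values[("fuel-level", plane)]
    # #t is the time elapsed since the action started; sample it over [0, duration].
    for k in range(samples + 1):
        t_local = duration * k / samples
        if fuel0 - rate * t_local < 0:
            raise ValueError(f"over-all invariant violated at #t = {t_local}")
    values = dict(state.values)
    values[("fuel-level", plane)] = fuel0 - rate * duration  # value at #t = duration
    atoms.add(("at", plane, dst))  # at-end add effect
    return State(atoms, values)


s0 = State({("at", "p1", "a")}, {("fuel-level", "p1"): 100.0})
s1 = fly(s0, "p1", "a", "b", duration=4.0, rate=20.0)
print(sorted(s1.atoms), s1.values)
```

Calling `fly(s0, "p1", "a", "b", duration=6.0, rate=20.0)` instead raises a `ValueError`, because the fuel level would drop below zero before the action ends.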

PDDL+ further expanded multimodal expressivity by adding:

Processes: Autonomous, continuously evolving changes governed by differential equations; they run whenever their preconditions hold, outside direct agent control.
Events: Instantaneous, exogenous transitions that fire as soon as their preconditions become true, with additional state variables enforcing that time cannot slip past a triggered event (Fox et al., 2011).
Semantics via Hybrid Automata: Each PDDL+ instance maps to a hybrid automaton, with discrete locations for logical states and continuous variables for numeric fluents. Actions/events become jump transitions; processes define location flows. Valid plans correspond to accepting traces in the induced labeled transition system (LTS) (Fox et al., 2011).
Decidability Transfer: Plan existence corresponds to reachability in the induced automaton, so for restricted hybrid automata subclasses (e.g., Timed Automata) whose reachability problem is decidable, plan existence may be decidable as well (Fox et al., 2011).
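The process/event interplay can be illustrated with a water-heating model. The Python sketch below is a toy, assuming constant heating rates so that the event time can be computed in closed form; the function `simulate`, its parameters, and the choice that the "boil" event switches the heater off are hypothetical modelling decisions, not part of the PDDL+ specification. It shows additive process contributions and an event that fires exactly when its precondition first holds.

```python
from typing import List, Tuple

Trace = List[Tuple[float, str]]


def simulate(temp0: float, rates: List[float], horizon: float,
             threshold: float = 100.0) -> Tuple[float, str, Trace]:
    """Run a toy PDDL+-style water-heating model up to time `horizon`.

    Process 'heat-water' (location 'heating'): active while the heater is on;
    d(temp)/dt is the sum of the rates of all active heat sources.
    Event 'boil': fires as soon as temp >= threshold. Time may not slip past a
    triggered event, so the flow is stopped exactly at the crossing time.
    Assumption: the event switches the heater off, ending the process.
    """
    location = "heating"
    flow = sum(rates)  # additive contributions of concurrent heat sources
    trace: Trace = []
    # Earliest time at which the event precondition holds (constant flow).
    if temp0 >= threshold:
        t_event = 0.0
    elif flow > 0:
        t_event = (threshold - temp0) / flow
    else:
        t_event = float("inf")
    if t_event <= horizon:
        trace.append((t_event, "boil"))  # jump transition at t_event
        location = "boiling"             # no flow after the jump (assumed)
        return temp0 + flow * t_event, location, trace
    return temp0 + flow * horizon, location, trace


print(simulate(20.0, [5.0, 3.0], horizon=30.0))
print(simulate(20.0, [5.0], horizon=10.0))
```

With two sources of rates 5 and 3, the temperature reaches 100 at time 10 and the event fires there; with a single source, the horizon is reached first and no event occurs.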

2. Modeling Power and Multimodal Expressivity

PDDL2.1 and PDDL+ support highly expressive modeling across multiple dimensions (“modalities”):

Discrete and Continuous Change: Formalisms can express complex dynamics—stepwise logical effects (e.g., location change), continuous flows (e.g., fuel depletion, temperature increase), and hybrid combinations (flight with concurrent refueling; water heating with additive heat sources) (Fox et al., 2011, Fox et al., 2011).
Temporal and Numeric Constraints: Durations can be fixed or specified by inequalities; invariants that must hold throughout an action's execution (via "over all") and temporally scoped conditional effects are supported (Fox et al., 2011).
Autonomy and Exogeneity: Processes can now model world-driven dynamics, not just agent-initiated actions, and events can encode spontaneous discrete transitions (Fox et al., 2011).
Hierarchical and Hybrid Extensions: HDDL2.1 extends these capabilities to the HTN (Hierarchical Task Network) paradigm, incorporating durative (temporally-extended) methods and explicit temporal/numeric constraints across task decompositions, enabling real-world coordination and concurrency (Pellier et al., 2022).
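Duration inequalities define an interval from which the planner may choose. The Python sketch below intersects constraints of the forms (= ?duration k), (<= ?duration k), and (>= ?duration k) into a closed interval. It assumes the bounds have already been evaluated to numbers in the state where the action starts and that durations are non-negative; the function name `feasible_durations` is hypothetical.

```python
import math
from typing import List, Optional, Tuple


def feasible_durations(
    constraints: List[Tuple[str, float]]
) -> Optional[Tuple[float, float]]:
    """Intersect duration constraints into a closed interval [lo, hi].

    Each constraint is (op, bound) with op in {"=", "<=", ">="}.
    Returns None if no duration satisfies all constraints.
    """
    lo, hi = 0.0, math.inf  # assumption: durations are non-negative
    for op, bound in constraints:
        if op == "=":
            lo, hi = max(lo, bound), min(hi, bound)
        elif op == "<=":
            hi = min(hi, bound)
        elif op == ">=":
            lo = max(lo, bound)
        else:
            raise ValueError(f"unsupported comparison: {op!r}")
    return (lo, hi) if lo <= hi else None


# A 'fly' action lasting at least 2 time units that cannot outlast its fuel:
# (and (>= ?duration 2)
#      (<= ?duration (/ (fuel-level ?p) (fuel-consumption-rate ?p))))
fuel_level, consumption_rate = 100.0, 20.0
print(feasible_durations([(">=", 2.0), ("<=", fuel_level / consumption_rate)]))
```

Here any duration in [2, 5] is admissible; a fixed duration is simply the degenerate case of an `=` constraint.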

The table below summarizes core constructs:

Modality	PDDL2.1 Construct	PDDL+ or Later Extension
Discrete	Instantaneous Action	Action, Event
Continuous	Numeric Fluent w/ #t	Process (Differential Flow)
Temporal	Durative Action	Autonomous Process/Event
Exogenous	-	Event
Hierarchical	-	HDDL2.1 Methods

This expressivity enables faithful modeling across domains ranging from planetary landers and robotic manipulation to aerospace manufacturing and real-time resource allocation (Fox et al., 2011, Fox et al., 2011, Pellier et al., 2022).

3. Logical and Hybrid Automata Semantics

PDDL2.1’s semantics are state–transition based, with explicit grounding, normalization, and application rules (Fox et al., 2011). PDDL+ introduces a formal mapping to hybrid automata (Fox et al., 2011):

Discrete Locations: Represent logical state (ground atoms).
Continuous Flows: Governed by vector-valued ODEs for each numeric fluent, subject to invariants and process activation.
Jump Transitions: Ground actions and events produce discrete updates. Actions fire when the plan applies them, whereas an event fires at the instant its precondition becomes true, because time-slip enforcement forbids time from advancing past a triggered event.
Plan Validation: Plan traces correspond to time-stamped sequences whose acceptance by the induced LTS verifies plan correctness (Fox et al., 2011).
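Plan validation as trace acceptance can be sketched on a deliberately tiny hybrid automaton with one continuous variable and constant flows. The tank model, its locations, guards, and bounds below are hypothetical; the construction for PDDL+ handles many fluents, general flows, and events, all of which this Python toy omits.

```python
from typing import Callable, Dict, List, Tuple

# Toy one-variable hybrid automaton (hypothetical tank model).
# Each location: (constant flow dx/dt, invariant bounds (lo, hi) on x).
LOCATIONS: Dict[str, Tuple[float, Tuple[float, float]]] = {
    "filling": (2.0, (0.0, 10.0)),
    "draining": (-1.0, (0.0, 10.0)),
}
# Jump transitions: (source, label) -> (guard on x, target). Resets are identity.
JUMPS: Dict[Tuple[str, str], Tuple[Callable[[float], bool], str]] = {
    ("filling", "stop-fill"): (lambda x: x >= 8.0, "draining"),
    ("draining", "start-fill"): (lambda x: x <= 2.0, "filling"),
}


def accepts(loc: str, x: float, trace: List[Tuple[float, str]]) -> bool:
    """Return True if the time-stamped trace is a run of the automaton from (loc, x)."""
    now = 0.0
    for t, label in trace:
        if t < now or (loc, label) not in JUMPS:
            return False  # time must not run backwards; label must be enabled here
        flow, (lo, hi) = LOCATIONS[loc]
        x_end = x + flow * (t - now)  # continuous evolution during [now, t]
        # With constant flow, x is monotone on [now, t], so the invariant holds
        # on the whole interval iff it holds at both end points.
        if not (lo <= x <= hi and lo <= x_end <= hi):
            return False
        guard, target = JUMPS[(loc, label)]
        if not guard(x_end):
            return False
        loc, x, now = target, x_end, t  # discrete jump
    return True


print(accepts("filling", 0.0, [(4.5, "stop-fill"), (11.5, "start-fill")]))
print(accepts("filling", 0.0, [(4.5, "stop-fill"), (11.0, "start-fill")]))
```

The first trace is accepted; the second is rejected because at time 11 the level is 2.5, so the guard of "start-fill" is not yet satisfied.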

Recent advances include logical semantics grounding multimodal PDDL+ in a version of the situation calculus (Batusov et al., 2021):

State Evolution Axioms (SEAs): Generalize successor state axioms by encoding continuous evolution as logical temporal fluents, e.g.,

$f(\bar{x}, t, s) = y \Leftrightarrow [\Phi(\bar{x}, y, t, s) \vee (y = f(\bar{x}, start(s), s) \wedge \neg \Psi(\bar{x}, t, s))].$

Mapping to Situation Calculus: Each PDDL+ type/predicate/function/action maps into a corresponding logic construct, facilitating formal reasoning, causal analysis, and theorem proving (Batusov et al., 2021).
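To read the SEA schema operationally, the Python sketch below evaluates a toy instance for a single fluent temp with no object arguments $\bar{x}$. When the heating context $\Psi$ holds in situation $s$, $\Phi$ fixes the value to grow linearly from $\mathit{start}(s)$; otherwise the inertial disjunct keeps the value it had at $\mathit{start}(s)$. The `Situation` record, the heating rate, and the linear law are illustrative assumptions (here $\Psi$ happens not to depend on $t$); in the situation calculus these are logical axioms, not a program.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    """Hypothetical situation s for a single temporal fluent temp(t, s)."""
    start: float          # start(s): the time at which s begins
    temp_at_start: float  # temp(start(s), s)
    heating: bool         # the context Psi: is the heating process active in s?
    rate: float = 2.0     # assumed constant heating rate


def temp(t: float, s: Situation) -> float:
    """Evaluate temp(t, s) according to a toy State Evolution Axiom."""
    if t < s.start:
        raise ValueError("t precedes start(s)")
    if s.heating:
        # Phi case (Phi includes Psi as a conjunct):
        # y = temp(start(s), s) + rate * (t - start(s))
        return s.temp_at_start + s.rate * (t - s.start)
    # Inertial case: y = temp(start(s), s) and not Psi
    return s.temp_at_start


print(temp(5.0, Situation(start=2.0, temp_at_start=20.0, heating=True)))   # 26.0
print(temp(5.0, Situation(start=2.0, temp_at_start=20.0, heating=False)))  # 20.0
```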

This logical/mathematical formalization links multimodal PDDL to established continuous/discrete hybrid systems analysis and verification machinery.

4. Challenges, Validation, and Limitations

Multimodal PDDL formalization presents several open challenges:

Computational Complexity: Validation requires not only checking satisfiability at discrete points but also reasoning over temporal intervals; for example, invariants must be verified over continuous segments, which demands solving ODEs or applying interval analysis (Fox et al., 2011). Complexity increases further with nonlinear, non-additive updates.
Numeric Precision: Real systems (and validation tools) work with finite-precision arithmetic, which requires tolerances in invariant checking and leaves a formal gap between semantics and implementation (Fox et al., 2011).
Concurrency and Safety: Conservative concurrency controls (no moving targets) may limit optimality, but relaxing them risks semantic ambiguity (Fox et al., 2011).
Incompleteness and Sensing: Standard formalisms are extended (via “:unknown-literals”, “:observe”) to enable contingent planning with incomplete knowledge and non-deterministic sensing outcomes (Carreno et al., 2022).
Combinatorics and Grounding: Modeling with complex data types (sets, arrays, records) requires flattening to collections of Boolean variables, often causing grounding explosions that must be mitigated by auxiliary actions and parameter grouping (Elahi et al., 2022).
Vision and Perception Bottlenecks: In multimodal settings involving perceptual input, vision-language models (VLMs) often fail to extract a complete and correct set of object relations, which hampers reliable formalization (He et al., 25 Sep 2025).
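Why invariants need interval reasoning and tolerances can be seen on a fluent whose trajectory between two happenings is assumed to be a known quadratic in time. In the Python sketch below, checking only the end points would accept the example, while the exact interval minimum rejects it. The quadratic form, the function names, and the absolute tolerance `eps` are illustrative assumptions, not values prescribed by any particular validator.

```python
from typing import Callable


def quad_min(a: float, b: float, c: float, t0: float, t1: float) -> float:
    """Exact minimum of f(t) = a*t**2 + b*t + c over the closed interval [t0, t1]."""
    f: Callable[[float], float] = lambda t: a * t * t + b * t + c
    candidates = [t0, t1]
    if a != 0:
        vertex = -b / (2 * a)  # stationary point of the parabola
        if t0 <= vertex <= t1:
            candidates.append(vertex)
    return min(f(t) for t in candidates)


def invariant_holds(a: float, b: float, c: float, t0: float, t1: float,
                    bound: float, eps: float = 1e-9) -> bool:
    """Check the invariant f(t) >= bound on [t0, t1] with absolute tolerance eps."""
    return quad_min(a, b, c, t0, t1) >= bound - eps


# f(t) = t^2 - 4t + 5 on [0, 4]: f(0) = f(4) = 5, but f(2) = 1.
print(invariant_holds(1, -4, 5, 0, 4, bound=2.0))  # end points pass, interior fails
print(invariant_holds(1, -4, 5, 0, 4, bound=1.0))
```

The tolerance lets a value that misses the bound by a rounding-sized amount still count as satisfying it, which is exactly where the gap between exact semantics and floating-point implementation shows up.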

5. Integrations: MBSE, Perception, and Learning

Contemporary directions position multimodal PDDL formalization as a nexus connecting system modeling, perception, and learning-driven symbolic reasoning.

Model-Based Systems Engineering (MBSE): SysML models, enriched with PDDL stereotypes and formal OCL constraints aligned with BNF grammars, enable direct, automated PDDL generation from engineering models, maintaining consistency between design and planning (Nabizada et al., 15 Aug 2024, Nabizada et al., 7 Jun 2025).
Perceptual and Multimodal Formalization: Vision-language models (VLMs) are deployed as formalizers, translating multi-view images and goals into structured PDDL problem files (object enumeration, initial states, goal conditions) to leverage symbolic planners, with performance currently limited by vision-based relational recall (He et al., 25 Sep 2025).
LLM-Driven Synthesis and Feedback Loops: LLMs are used iteratively, with environment interaction and exploration-walk metrics to refine candidate PDDL files—uniting natural language, simulation feedback, and formal planning in a loop (Mahdavi et al., 17 Jul 2024).
Hierarchical, Temporal, and Data-Type Extensions: HDDL2.1 (HTN with time), situation-calculus embedding, and complex data-type Boolean reductions extend the reach of multimodal PDDL to hierarchical coordination, logical verification, and rich software/system modeling (Pellier et al., 2022, Batusov et al., 2021, Elahi et al., 2022).
Learning Logical Classifiers: First-Order Temporal Logic (FTL) classifiers, learned via MaxSAT on planning traces, yield interpretable multimodal formulas that generalize agent behaviors across structurally diverse domains, supporting formal explanation and verification (Lequen, 13 Oct 2024).
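In perception-driven formalization, once objects and relations have been extracted, the remaining step is mechanical: serializing them as a PDDL problem and scoring the extraction against ground truth. The Python sketch below assumes a hypothetical intermediate format (a typed object dictionary plus lists of ground atoms) and a simple set-based recall; it is not the prompting pipeline or the evaluation protocol used by He et al.

```python
from typing import Dict, List, Tuple

Atom = Tuple[str, ...]


def fmt(atom: Atom) -> str:
    """Render a ground atom such as ('on', 'a', 'b') as '(on a b)'."""
    return "(" + " ".join(atom) + ")"


def to_problem(name: str, domain: str, objects: Dict[str, str],
               init: List[Atom], goal: List[Atom]) -> str:
    """Assemble a PDDL problem from (hypothetical) formalizer output."""
    objs = " ".join(f"{o} - {t}" for o, t in sorted(objects.items()))
    init_block = "\n    ".join(fmt(a) for a in init)
    goal_block = " ".join(fmt(a) for a in goal)
    return (f"(define (problem {name})\n"
            f"  (:domain {domain})\n"
            f"  (:objects {objs})\n"
            f"  (:init\n    {init_block})\n"
            f"  (:goal (and {goal_block})))\n")


def relation_recall(predicted: List[Atom], gold: List[Atom]) -> float:
    """Fraction of ground-truth initial-state relations that were extracted."""
    gold_set = set(gold)
    if not gold_set:
        return 1.0
    return len(gold_set & set(predicted)) / len(gold_set)


gold_init = [("on", "a", "b"), ("ontable", "b"), ("clear", "a")]
predicted_init = [("on", "a", "b"), ("clear", "a")]  # one relation missed
print(to_problem("stack-1", "blocksworld", {"a": "block", "b": "block"},
                 predicted_init, [("on", "b", "a")]))
print(relation_recall(predicted_init, gold_init))
```

A missed relation such as (ontable b) silently produces an incomplete initial state, which is why relational recall, rather than syntax, tends to be the limiting factor.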

6. Applications and Use Cases

Multimodal PDDL formalisms underlie planning in domains including:

Robotics: Integrated task-motion-belief planning in partially observable settings (e.g., navigating with continuous odometry, battery constraints, and probabilistic state estimation) (Thomas et al., 2019).
Space and Logistics: Models of planetary landers, satellites with variable calibration/visibility intervals, midair refueling, and complex coordination (Fox et al., 2011, Pellier et al., 2022).
Industrial Manufacturing: Automated PDDL derivation within MBSE integrates product and system models (3DExperience, MSOSA), yielding executable plans for tasks such as robotic screwing actions with interchangeable end-effectors (Nabizada et al., 15 Aug 2024, Nabizada et al., 7 Jun 2025).
Contingent and Temporal Planning: TraCE compiles incomplete knowledge and non-deterministic sensing into FOND planning; time-awareness and numeric constraints govern scheduling and resource-limited operations (Carreno et al., 2022).
Formal Verification and Probabilistic Grounding: Automaton-based controllers extracted from pretrained models, grounded and verified with perception, support safety-critical sequential decision-making with probabilistic guarantees (Yang et al., 2023).

7. Prospects and Research Directions

Identified research avenues for multimodal PDDL formalization include:

Integrating Spontaneous Processes: Extending models to support fully autonomous, world-triggered (rather than planner-triggered) processes, as in the PDDL+ “start–process–stop” model and beyond (Fox et al., 2011, Fox et al., 2011).
Improved Plan Validation: Develop automatic tools and hybrid solvers to validate plans over increasingly complex function classes and under finite precision, potentially using “fuzzy” automata or interval analysis (Fox et al., 2011).
Closing the Perception–Formalizer Gap: Enhance VLMs and structured prompting (e.g., Caption-P, SG-P, AP-SG-P (He et al., 25 Sep 2025)) for exhaustive, accurate object-relation extraction and robust mapping from raw observation to symbolic state.
Scalable Automated Synthesis: Beyond LLMs’ zero-shot capabilities, incorporate iterative environment interaction, exploration metrics, and complex example retrieval to refine domain generation and align symbolic planning specifications with real-world semantics (Mahdavi et al., 17 Jul 2024, Huang et al., 17 Sep 2025).
Learning and Generalization: Expand approaches for interpretable, multimodal classifier synthesis—from few examples to broad generalization across instance sizes, problem variants, and data modalities (Lequen, 13 Oct 2024).
Human-in-the-Loop and MBSE-Planner Integration: Involve domain experts directly in annotation and verification of multimodal models, and maintain synchrony between evolving engineering models and PDDL-based automation (Nabizada et al., 15 Aug 2024, Nabizada et al., 7 Jun 2025).

A plausible implication is that as multimodal PDDL formalization techniques mature across these interconnected axes, they will enable broader and more reliable deployment of AI planning systems capable of both symbolic rigor and real-world adaptability in complex, temporally and numerically rich environments.