
Simulation-to-Rules (VLMFP)

Updated 6 February 2026
  • Simulation-to-Rules (VLMFP) is a method that translates visual and language inputs into formal, rule-based PDDL representations for automated planning.
  • It employs a dual-model framework where SimVLM encodes scenarios and GenVLM generates and refines PDDL files through iterative feedback.
  • The approach demonstrates robust generalization across unseen visual environments, reducing the need for human input in extracting planning rules.

Simulation-to-Rules (VLMFP) describes a vision–language model (VLM) guided paradigm for autonomously formalizing visual planning scenarios into symbolic, rule-based representations. The approach is instantiated in the dual-VLM framework VLMFP, which reliably translates image- and language-conditioned tasks into Planning Domain Definition Language (PDDL) files, enabling formal symbolic planning directly from visual inputs. The method addresses the challenge of automatically extracting all necessary planning rules, including the general domain file as well as the problem-specific file, rather than relying on human input or environment access, and demonstrates generalized planning and simulation capabilities across diverse domains (Hao et al., 3 Oct 2025).

1. Formal Problem Setting and Notation

Simulation-to-Rules (VLMFP) is designed to convert visual and linguistic domain descriptions into executable formal planning rules. A visual planning scenario is specified by:

  • A natural language description n_d (defining rules, actions, and constraints)
  • An image i_p encoding the spatial configuration (e.g., a grid layout)

The target output consists of:

  • f_d: the PDDL domain file (defining predicates, actions, and transition rules)
  • f_p: the PDDL problem file (specifying initial state and goal conditions)

Optimal solutions are formal plans π such that a symbolic planner operating on (f_d, f_p) computes a valid action sequence to achieve the designated goal state. The central technical challenge lies in generating both f_d and f_p from n_d and i_p, including accurate symbolic representations of general rules, not just problem instances.
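The inputs and outputs above can be captured in a small typed container; a minimal Python sketch (field names are illustrative, not from the paper):

```python
from dataclasses import dataclass

@dataclass
class Scenario:          # inputs (n_d, i_p)
    nl_description: str  # n_d: rules, actions, and constraints in English
    image_path: str      # i_p: rendered spatial configuration, e.g. a grid

@dataclass
class PDDLPair:          # outputs (f_d, f_p)
    domain_file: str     # f_d: predicates, actions, transition rules
    problem_file: str    # f_p: objects, initial state, goal condition

task = Scenario("Reach the goal tile without stepping on holes.", "lake.png")
```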

The VLMFP architecture introduces two distinct VLMs:

  • SimVLM: A vision–language model specialized for simulating environment dynamics, describing scenarios, and judging goal reachability from the current state and an action sequence.
  • GenVLM: A generative vision–language model (e.g., GPT-4o) that produces and refines the PDDL files interactively, using feedback from discrepancies between symbolic and simulated outcomes.

2. Architecture and Workflow

VLMFP employs a Dual-VLM workflow orchestrated through four key stages:

  1. Scenario Encoding: SimVLM generates a concise natural language description summarizing spatial relationships and object configurations derived from the image i_p and the description n_d.
  2. Initial PDDL Generation: GenVLM uses this scenario description and n_d to synthesize initial candidate PDDL files (f_d, f_p).
  3. Simulation Consistency Check and Feedback Collection: Random action sequences are sampled; SimVLM reports simulated transition outcomes while the planner executes the same sequences on (f_d, f_p). Discrepancies between simulated and symbolic outcomes are collected as feedback.
  4. Iterative Refinement: GenVLM receives the mismatch feedback and updates the files. The loop continues until perfect alignment is reached (measured by the EW score, see Section 4), a valid plan is found, or the maximum number of iterations is reached.

Hao et al. (3 Oct 2025) give pseudocode formalizing this process.
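The four stages can be sketched as a simple generate/simulate/refine loop. A minimal Python sketch follows, with stub classes standing in for the two VLMs and the symbolic planner (all class and method interfaces here are hypothetical, not the paper's API):

```python
import random

class SimVLM:
    """Stub for the simulator VLM: describes the scene and predicts
    the outcome of action sequences from the image."""
    def describe(self, image, nl_rules):
        return "3x3 grid; agent at (0,0); goal at (2,2); hole at (1,1)"
    def simulate(self, image, actions):
        return ["ok"] * len(actions)  # per-action simulated outcomes

class GenVLM:
    """Stub for the generator VLM (e.g., GPT-4o)."""
    def generate(self, description, nl_rules):
        return {"domain": "(define (domain grid))",
                "problem": "(define (problem g1))"}
    def refine(self, pddl, feedback):
        return pddl  # a real model would rewrite the PDDL files here

def pddl_execute(pddl, actions):
    """Stand-in for executing the sequence via a symbolic planner."""
    return ["ok"] * len(actions)

def vlmfp(image, nl_rules, max_iters=5, n_walks=4, walk_len=3):
    sim, gen = SimVLM(), GenVLM()
    desc = sim.describe(image, nl_rules)          # 1. scenario encoding
    pddl = gen.generate(desc, nl_rules)           # 2. initial PDDL generation
    for _ in range(max_iters):                    # 3-4. check and refine
        feedback = []
        for _ in range(n_walks):
            walk = [random.choice("UDLR") for _ in range(walk_len)]
            if sim.simulate(image, walk) != pddl_execute(pddl, walk):
                feedback.append(walk)             # record the discrepancy
        if not feedback:                          # simulation and PDDL agree
            return pddl
        pddl = gen.refine(pddl, feedback)
    return pddl

files = vlmfp("frozenlake.png", "Reach the gift; holes end the episode.")
```

Because the stubs always agree, the sketch returns after one consistency check; with real models, the mismatch feedback drives repeated refinement of the PDDL pair.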

Termination occurs when no mismatches are observed and the symbolic planner can solve the formalized problem, or after a preset iteration cap.

3. Output Representation: PDDL Domain and Problem Files

The PDDL domain file f_d encodes type signatures, predicates, and action schemas, whereas the problem file f_p introduces instance-specific objects, initial state predicates, and goal formulations.

An illustrative example is the "FrozenLake" environment:

  • Domain file (f_d): defines the move action over grid tiles, with predicates for the agent's position, tile adjacency, and tile safety.
  • Problem file (f_p): enumerates the grid tiles, the initial agent position and tile properties, and the goal condition.

The same domain file f_d generalizes across all instances for a given problem class, while f_p is adapted to the particular state configuration generated from the visual input.
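For concreteness, a hedged sketch of what such a pair of files might look like for a small FrozenLake grid (all predicate, action, and object names are illustrative, not the paper's actual output):

```pddl
;; Illustrative domain file f_d (names are hypothetical)
(define (domain frozenlake)
  (:requirements :strips :typing)
  (:types tile)
  (:predicates (at ?t - tile)
               (safe ?t - tile)
               (adjacent ?a - tile ?b - tile))
  (:action move
    :parameters (?from - tile ?to - tile)
    :precondition (and (at ?from) (adjacent ?from ?to) (safe ?to))
    :effect (and (at ?to) (not (at ?from)))))

;; Illustrative problem file f_p for a 2x2 grid with one hole at t10
(define (problem lake-2x2)
  (:domain frozenlake)
  (:objects t00 t01 t10 t11 - tile)
  (:init (at t00) (safe t00) (safe t01) (safe t11)
         (adjacent t00 t01) (adjacent t01 t00)
         (adjacent t00 t10) (adjacent t10 t00)
         (adjacent t01 t11) (adjacent t11 t01)
         (adjacent t10 t11) (adjacent t11 t10))
  (:goal (at t11)))
```

A standard STRIPS planner given this pair would return the plan (move t00 t01), (move t01 t11), since the hole tile t10 is never marked safe.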

4. Evaluation Metrics and Empirical Results

Multiple quantitative indicators are established for benchmarking VLMFP (Hao et al., 3 Oct 2025):

  • SimVLM Metrics:
    • Task description accuracy
    • Execution reason accuracy
    • Execution result accuracy
    • Goal-reaching judgment accuracy
    • For both seen and unseen appearances, SimVLM achieves high accuracy, with rates ranging from 82.4% to 95.5% across these metrics.
  • Planning Validity:

The success rate is defined as the proportion of tested instances in which the planner, operating on the generated (f_d, f_p), reaches the designated goal. VLMFP with GPT-4o achieves planning validity of 70.0% for seen appearances and 54.1% for unseen ones, markedly above the CodePDDL baseline (30.7% seen, 32.3% unseen).

  • EW (Exploration Walk) Score:

EW quantifies the bidirectional agreement between SimVLM simulations and PDDL execution across sampled action sequences. It is computed from the average expected valid-sequence rates under the SimVLM and PDDL models, respectively; a high EW score indicates close alignment between the simulated and formalized domains.
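One plausible instantiation of such a bidirectional agreement score, sketched in Python (this exact functional form is an assumption for illustration, not the paper's definition):

```python
def ew_score(sim_valid_rates, pddl_valid_rates):
    """Agreement between per-walk valid-sequence rates reported by
    SimVLM and by PDDL execution; 1.0 means perfect alignment.
    The absolute-difference form is an assumed stand-in."""
    assert len(sim_valid_rates) == len(pddl_valid_rates)
    n = len(sim_valid_rates)
    gap = sum(abs(s - p) for s, p in zip(sim_valid_rates, pddl_valid_rates))
    return 1.0 - gap / n

print(ew_score([1.0, 0.5], [1.0, 0.5]))  # identical rates -> 1.0
```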

Metric                        Seen (%)   Unseen (%)
SimVLM TaskDesc                 95.5       92.6
SimVLM ExecResult               85.5       87.8
SimVLM GoalReach                82.4       85.6
VLMFP Planning Validity         70.0       54.1
CodePDDL Baseline Validity      30.7       32.3

This demonstrates robust generalization both to novel visual environments and to previously unseen instance configurations.

5. Generalization and System Limitations

The generalization capabilities of Simulation-to-Rules (VLMFP) are evidenced at multiple levels:

  • Visual generalization: SimVLM accuracy remains above 82% when evaluated on domains rendered in unseen visual styles.
  • Rule generalization: On novel FrozenLake rule variants (e.g., teleportation, skip-action after hazard), SimVLM's reasoning and execution accuracy falls between 59% and 99% in most cases. Rules requiring multi-step state resets remain a systematic challenge: the skip-action rule shows correct textual reasoning (71%) but 0% execution fidelity due to state-tracking errors.
  • Scalability: The same f_d supports all instances within a domain, with f_p automatically adapted, illustrating broad intra-domain generalization.

Limitations observed include:

  • Occasional omission of required predicates (e.g., directional constraints) in the generated problem file, leading to incomplete or unexecutable plans.
  • In complex domains requiring many object types and intricate preconditions (e.g., Sokoban, Printer), initial generation may lack necessary constraints, with planning accuracy collapsing in the absence of iterative refinement.
  • SimVLM’s state-tracking capacity limits plan fidelity in rules requiring non-local state resets or novel dynamics.

A plausible implication is that while VLMFP can formalize and generalize many classes of planning tasks, reliance on vision–language models for environment simulation imposes an upper bound on achievable logical fidelity when rules depart structurally from training data.

6. Relation to Prior Approaches

VLMFP responds directly to prior hybrid neuro-symbolic planning approaches that use VLMs to convert visual problems into PDDL for downstream symbolic planning but still require human-authored domain files or extensive environment interaction for verification. Unique to VLMFP is the end-to-end automation of both domain and problem file synthesis, and the closure of the formalization loop through iterative simulation-based refinement mediated by a pair of specialized VLMs (Hao et al., 3 Oct 2025).

Simulation-to-Rules stands in contrast to classical model-free data-driven simulation in computational mechanics (Ciftci et al., 2021), where transition rules are constructed from structured (often physical) data and classification of behavioral regimes rather than extracted from visual-linguistic input. The unification of simulation and symbolic formalization within VLMFP reflects a general trend toward integrative, data-efficient neuro-symbolic planning frameworks.

7. Significance and Future Directions

Simulation-to-Rules (VLMFP) demonstrates viable automatic formalization of complex visual planning domains with generalization to new visual styles, object configurations, and varied rule sets. The approach moves beyond instance-specific reasoning, enabling symbolic execution at scale directly from perception data.

Open challenges and future potential include:

  • Enhancing VLM reasoning for rule sets requiring non-local state dependencies and memory of complex transitions
  • Robust extraction of implicit constraints in domains with high combinatorial complexity
  • Scaling to more general classes of planning problems beyond grid-based environments

Continued development of dual-model simulation-to-rule pipelines promises to increase the autonomy, transferability, and transparency of visual-to-symbolic planning systems.
