
MMPlanner: Zero-Shot Multimodal Procedural Planning with Chain-of-Thought Object State Reasoning (2509.21662v1)

Published 25 Sep 2025 in cs.LG

Abstract: Multimodal Procedural Planning (MPP) aims to generate step-by-step instructions that combine text and images, with the central challenge of preserving object-state consistency across modalities while producing informative plans. Existing approaches often leverage LLMs to refine textual steps; however, visual object-state alignment and systematic evaluation are largely underexplored. We present MMPlanner, a zero-shot MPP framework that introduces Object State Reasoning Chain-of-Thought (OSR-CoT) prompting to explicitly model object-state transitions and generate accurate multimodal plans. To assess plan quality, we design LLM-as-a-judge protocols for planning accuracy and cross-modal alignment, and further propose a visual step-reordering task to measure temporal coherence. Experiments on RECIPEPLAN and WIKIPLAN show that MMPlanner achieves state-of-the-art performance, improving textual planning by +6.8%, cross-modal alignment by +11.9%, and visual step ordering by +26.7%.
