Improving the LLM Orchestrator for Embodied Robotics

Determine effective methods for improving the large language model (LLM) orchestrator in hierarchical robotic systems, where the orchestrator performs high-level planning and social interaction and a Vision Language Action (VLA) executor carries out the resulting actions, so that the orchestrator achieves higher practical intelligence on real-world tasks such as those evaluated by Butter-Bench.

Background

The Butter-Bench paper evaluates LLM orchestrators in a hierarchical robotic architecture where the LLM handles high-level reasoning, planning, and social behavior, and a Vision Language Action (VLA) model executes low-level control. Butter-Bench is introduced to assess the practical intelligence of the LLM orchestrator, including spatial reasoning and social understanding, independently of the executor.
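The split between orchestrator and executor can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `Orchestrator`, `ScriptedExecutor`, `plan_fn`, and `toy_planner` are hypothetical, and replacing the VLA with a scripted stand-in is only one assumed way to study the orchestrator's decisions separately from low-level control.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Protocol


class Executor(Protocol):
    """Low-level control interface (the role a VLA model would play)."""

    def execute(self, command: str) -> str: ...


@dataclass
class ScriptedExecutor:
    """Hypothetical stand-in for a VLA: logs each command and reports success."""

    log: List[str] = field(default_factory=list)

    def execute(self, command: str) -> str:
        self.log.append(command)
        return f"done: {command}"


@dataclass
class Orchestrator:
    """High-level loop: ask the planner for a command, hand it to the executor."""

    plan_fn: Callable[[str, List[str]], str]  # hypothetical wrapper around an LLM call
    executor: Executor
    max_steps: int = 10

    def run(self, task: str) -> List[str]:
        history: List[str] = []
        for _ in range(self.max_steps):
            command = self.plan_fn(task, history)
            if command == "STOP":  # planner signals that the task is finished
                break
            history.append(self.executor.execute(command))
        return history


def toy_planner(task: str, history: List[str]) -> str:
    """Fixed plan used in place of a real LLM, for illustration only."""
    steps = ["navigate to kitchen", "pick up butter", "deliver butter"]
    return steps[len(history)] if len(history) < len(steps) else "STOP"


if __name__ == "__main__":
    executor = ScriptedExecutor()
    orchestrator = Orchestrator(plan_fn=toy_planner, executor=executor)
    for observation in orchestrator.run("pass the butter"):
        print(observation)
```

Because the executor here always succeeds, any failure in such a setup would be attributable to the planner, which is the kind of isolation the benchmark description calls for.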

Empirically, the authors find that Gemini 2.5 Pro outperforms Gemini Robotics Embodied Reasoning 1.5 (Gemini ER 1.5) on Butter-Bench, which suggests that current embodied fine-tuning approaches do not significantly improve practical intelligence or social capabilities. This raises the explicit open question of how to improve the orchestrator. The authors note that training on the type of robotics data used for Gemini ER did not yield better performance, and they propose real-world deployments as one potential path for collecting data on social behavior.

References

How to improve the orchestrator remains a question for future research, but the fact that Gemini ER 1.5 is not better than Gemini 2.5 Pro suggests social capabilities are not improved by training on the type of robotic data Gemini ER is trained on.

— Butter-Bench: Evaluating LLM Controlled Robots for Practical Intelligence (2510.21860 - Sharrock et al., 23 Oct 2025) in Section Future Work
