Embodiment-aware functional reasoning in VLMs
Determine how to achieve reliable, inclusive embodiment-aware functional reasoning in Vision-Language Models (VLMs) across diverse agent profiles, specifically Adult, Child, and Wheelchair user, so that the performance differences between these embodiments, which currently persist even after tasks are decomposed into atomic actions in 3D scenes, are closed.
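One way to make "performance differences across embodiments" concrete is to compute a per-profile accuracy and report the largest gap between any two profiles. The sketch below is only an illustration under stated assumptions: the record format, the function names `per_profile_accuracy` and `max_gap`, and the toy data are all hypothetical and are not taken from the SceneTeract paper or its evaluation protocol.

```python
from collections import defaultdict


def per_profile_accuracy(records):
    """Return accuracy per agent profile.

    `records` is an iterable of (profile, correct) pairs, where `correct`
    is a bool saying whether the model's judgment matched the reference.
    This record format is an assumption made for illustration.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for profile, correct in records:
        totals[profile] += 1
        hits[profile] += int(correct)
    return {profile: hits[profile] / totals[profile] for profile in totals}


def max_gap(accuracy):
    """Largest accuracy difference between any two profiles."""
    values = list(accuracy.values())
    return max(values) - min(values)


# Toy, made-up outcomes; these are NOT results from any paper.
records = [
    ("Adult", True), ("Adult", True), ("Adult", False), ("Adult", True),
    ("Child", True), ("Child", False), ("Child", False), ("Child", True),
    ("Wheelchair user", True), ("Wheelchair user", False),
    ("Wheelchair user", False), ("Wheelchair user", False),
]

accuracy = per_profile_accuracy(records)
gap = max_gap(accuracy)
print(accuracy)
print(gap)
```

Under this framing, the open problem amounts to driving the gap toward zero without lowering accuracy for any single profile. A max-minus-min gap is only one possible summary; other disparity measures could be substituted.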
References
Finally, inclusivity remains non-trivial across all systems, as performance differences across Adult, Child, and Wheelchair user profiles persist even after decomposition, highlighting that embodiment-aware functional reasoning remains an open challenge.
— SceneTeract: Agentic Functional Affordances and VLM Grounding in 3D Scenes
(2603.29798 - Maillard et al., 31 Mar 2026) in Section 4.3, Results Overview