Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task (2404.08424v2)

Published 12 Apr 2024 in cs.RO, cs.AI, and cs.HC

Abstract: Human intention-based systems enable robots to perceive and interpret user actions to interact with humans and adapt to their behavior proactively. Therefore, intention prediction is pivotal in creating a natural interaction with social robots in human-designed environments. In this paper, we examine using LLMs to infer human intention in a collaborative object categorization task with a physical robot. We propose a novel multimodal approach that integrates user non-verbal cues, like hand gestures, body poses, and facial expressions, with environment states and user verbal cues to predict user intentions in a hierarchical architecture. Our evaluation of five LLMs shows the potential for reasoning about verbal and non-verbal user cues, leveraging their context-understanding and real-world knowledge to support intention prediction while collaborating on a task with a social robot.
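As an illustration of the pipeline described in the abstract, the sketch below (Python, not the authors' code) shows one way that non-verbal cues, the environment state, and the user's utterance could be serialized into a single prompt for an LLM to infer the intended object and category. The names ObservedCues, build_intention_prompt, and predict_intention are hypothetical, and the OpenAI-style chat client and model name are assumptions rather than details taken from the paper.

```python
# Minimal sketch (not the authors' implementation): serialize multimodal
# cues into a text prompt and query a chat-style LLM for the user's
# intended object and category. All names are illustrative placeholders.
from dataclasses import dataclass, field


@dataclass
class ObservedCues:
    """Snapshot of perception outputs at one decision point."""
    gesture: str                      # e.g. "pointing at the banana"
    body_pose: str                    # e.g. "leaning toward the table"
    facial_expression: str            # e.g. "smiling"
    utterance: str                    # transcribed speech, may be empty
    objects_on_table: list = field(default_factory=list)


def build_intention_prompt(cues: ObservedCues) -> str:
    """Combine verbal and non-verbal cues into a single prompt string."""
    return (
        "You are assisting a robot in a collaborative object "
        "categorization task.\n"
        f"Objects on the table: {', '.join(cues.objects_on_table)}\n"
        f"User gesture: {cues.gesture}\n"
        f"User body pose: {cues.body_pose}\n"
        f"User facial expression: {cues.facial_expression}\n"
        f"User said: \"{cues.utterance}\"\n"
        "Which object does the user most likely want the robot to handle, "
        "and into which category should it be sorted? "
        "Answer with the object name and category only."
    )


def predict_intention(llm_client, cues: ObservedCues) -> str:
    """Query an OpenAI-compatible chat endpoint (model name is a placeholder)."""
    response = llm_client.chat.completions.create(
        model="gpt-4o",  # placeholder; the paper evaluates five different LLMs
        messages=[{"role": "user", "content": build_intention_prompt(cues)}],
        temperature=0.0,
    )
    return response.choices[0].message.content.strip()
```

In practice, the cue strings would come from upstream perception modules (gesture, pose, and face recognition plus speech transcription), and the LLM's answer would be parsed and passed to the robot's manipulation stack.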

Authors (3)
  1. Hassan Ali (24 papers)
  2. Philipp Allgeuer (33 papers)
  3. Stefan Wermter (157 papers)
Citations (1)