- The paper demonstrates that LLMs, pre-trained on extensive text corpora, can serve as zero-shot predictors of human behavior in HRI.
- It employs benchmark datasets like MANNERS-DB, Trust-Transfer, and SocialIQA to evaluate LLM performance against specialized models.
- The study highlights limitations in spatial reasoning and prompt sensitivity, suggesting the need for integration with additional learning models.
LLMs as Zero-Shot Human Models for Human-Robot Interaction
This essay provides a detailed exploration of the use of LLMs as zero-shot human models in Human-Robot Interaction (HRI), focusing on their capability to function as predictive models without additional training. The research demonstrates both the potential and limitations of LLMs in this domain through empirical studies and practical experiments.
Introduction
The research investigates the application of LLMs, traditionally used in NLP, as human models in HRI. These models, pre-trained on extensive datasets, have not been explicitly configured for human modeling in robotic contexts, yet they exhibit significant promise as zero-shot predictors for human behavior in social interactions.
Figure 1: This study explores the application of LLMs as zero-shot human models in HRI, evaluating their effectiveness with benchmark datasets and demonstrating their use in trust-based scenarios.
Evaluation of LLMs on Social Datasets
Three prominent datasets were utilized to evaluate the predictive capabilities of LLMs: MANNERS-DB, Trust-Transfer, and SocialIQA. These datasets cover different aspects of human social behavior and interaction.
LLMs such as FLAN-T5 and a variant of GPT-3.5 were compared against specialized models designed for these tasks under a zero-shot learning framework. Results indicate that, with no additional training data, LLMs can perform comparably to these specialized models on predictive tasks, albeit with certain limitations.
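To make the zero-shot framing concrete, the sketch below builds a MANNERS-DB-style appropriateness prompt and parses a Likert rating from a model's free-text reply. The prompt wording and the names `build_manners_prompt`, `parse_likert`, and `fake_llm` are illustrative assumptions, not the paper's exact prompts or API; the stub stands in for a real call to a model such as FLAN-T5 or GPT-3.5.

```python
import re

def build_manners_prompt(scene: str, action: str) -> str:
    """Compose a MANNERS-DB-style zero-shot prompt: describe the scene,
    then ask for an appropriateness rating on a 1-5 Likert scale."""
    return (
        f"Scene: {scene}\n"
        f"How appropriate is it for the robot to {action}?\n"
        "Answer with a single number from 1 (very inappropriate) "
        "to 5 (very appropriate)."
    )

def parse_likert(reply: str) -> int:
    """Pull the first 1-5 digit out of the model's free-text reply."""
    match = re.search(r"[1-5]", reply)
    if match is None:
        raise ValueError(f"no rating found in reply: {reply!r}")
    return int(match.group())

def predict_appropriateness(scene: str, action: str, llm) -> int:
    """Zero-shot prediction: a single prompt, no task-specific training."""
    return parse_likert(llm(build_manners_prompt(scene, action)))

# Stub standing in for a real LLM call (e.g. FLAN-T5 or GPT-3.5):
fake_llm = lambda prompt: "I would rate this a 4 out of 5."
print(predict_appropriateness("A living room where two adults are talking.",
                              "vacuum the floor", fake_llm))  # → 4
```

The point of the sketch is that the only task-specific engineering is the prompt itself; swapping in a different dataset means swapping the prompt template, not retraining a model.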
Figure 2: Datasets and example prompts used in the prediction experiments, illustrating how LLM-based models are applied across diverse HRI datasets.
Limitations of LLMs
The study identifies specific limitations of LLMs:
- Spatial and Numerical Reasoning: LLMs show deficiencies in tasks requiring spatial awareness or numerical computation, limiting their applicability to HRI tasks that depend on such reasoning.
- Prompt Sensitivity: Performance variability based on prompt design highlights that careful formulation is necessary to leverage LLMs effectively.
These weaknesses underscore the need to integrate LLMs with complementary models that handle spatial and numerical reasoning more robustly.
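The prompt-sensitivity limitation can be quantified with a simple probe: rate the same item under several paraphrased templates and measure the spread of the numeric answers. The sketch below is hypothetical; the `prompt_sensitivity` function, the templates, and the canned `fake_llm` (whose answer deliberately drifts with wording) are invented for illustration, not taken from the study.

```python
from statistics import pstdev

def prompt_sensitivity(item: str, templates, llm) -> float:
    """Rate the same item under several paraphrased templates and return
    the population standard deviation of the numeric answers; a large
    spread means the model is sensitive to prompt wording."""
    ratings = [int(next(ch for ch in llm(t.format(item=item)) if ch.isdigit()))
               for t in templates]
    return pstdev(ratings)

templates = [
    "Rate the appropriateness of: {item}. Answer 1-5.",
    "On a scale of 1-5, how socially acceptable is: {item}?",
    "{item} -- give an appropriateness score from 1 to 5.",
]

def fake_llm(prompt: str) -> str:
    # Stub whose answer drifts with the wording, mimicking prompt sensitivity.
    return "4" if prompt.startswith("Rate") else "2" if prompt.startswith("On") else "3"

print(prompt_sensitivity("vacuuming near a sleeping baby", templates, fake_llm))  # → ~0.82
```

A spread near zero across paraphrases would indicate a robust prompt; in practice, probes like this motivate the careful prompt formulation the study calls for.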
Planning for HRI with LLMs
The study progresses from prediction to the use of LLMs in planning for HRI scenarios, presenting results from two case studies: a table-clearing experiment and a utensil-passing experiment.
- Table-Clearing Experiment: In this trust-oriented task, LLMs were integrated with planners to determine robot actions in a shared environment, closely replicating prior experimental results.
- Utensil-Passing Experiment: Designed to probe trust dynamics further, this experiment highlighted LLMs' ability to adjust strategies based on human intervention and trust levels, showcasing their potential in active planning for HRI.
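A minimal sketch of how an LLM-backed human model could drive trust-aware action selection in a table-clearing-style task is shown below. It assumes the human model returns the human's belief that the robot will succeed on each object; the function names, the 0.5 threshold, and the stubbed belief values are assumptions for illustration, not the study's actual planner.

```python
def plan_next_action(objects, predict_success_belief, trust_threshold=0.5):
    """Greedy trust-aware planner sketch: query a (hypothetical)
    LLM-backed human model for the human's belief that the robot can
    handle each object, act on the most trusted object, and defer to
    the human when every belief falls below the threshold."""
    beliefs = {obj: predict_success_belief(obj) for obj in objects}
    best = max(beliefs, key=beliefs.get)
    if beliefs[best] < trust_threshold:
        return ("ask_human", best)   # hand over rather than risk eroding trust
    return ("pick", best)

# Stubbed human model: fragile items carry lower predicted trust.
beliefs = {"plastic cup": 0.9, "wine glass": 0.3, "plate": 0.7}
print(plan_next_action(list(beliefs), beliefs.get))  # → ('pick', 'plastic cup')
```

The design choice mirrors the experiments' theme: the planner's behavior depends on a predicted human state (trust), so a better human model directly yields more appropriate robot actions.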

Figure 3: Utensil-passing experiment setup, showing the physical context in which LLM-based planning was tested.
Figure 4: Illustration of success and intentional failure conditions in the utensil-passing task to mitigate over-trust issues.
Conclusion
The introduction of LLMs as zero-shot human models in HRI settings represents a significant stride toward more adaptive and human-centric robotic systems. Despite their observed limitations in spatial reasoning and sensitivity to prompt design, LLMs offer considerable strength in capturing the latent states and behaviors of humans, crucial for effective HRI.
While LLMs alone may not suffice as comprehensive human models due to their current limitations, their integration with other detailed models offers fertile ground for future research, potentially leading to more sophisticated and contextually aware robotic systems in human environments.