LaMI: Large Language Models for Multi-Modal Human-Robot Interaction (2401.15174v4)

Published 26 Jan 2024 in cs.RO and cs.HC

Abstract: This paper presents an innovative LLM-based robotic system for enhancing multi-modal human-robot interaction (HRI). Traditional HRI systems relied on complex designs for intent estimation, reasoning, and behavior generation, which were resource-intensive. In contrast, our system empowers researchers and practitioners to regulate robot behavior through three key aspects: providing high-level linguistic guidance, creating "atomic actions" and expressions the robot can use, and offering a set of examples. Implemented on a physical robot, it demonstrates proficiency in adapting to multi-modal inputs and determining the appropriate manner of action to assist humans with its arms, following researchers' defined guidelines. Simultaneously, it coordinates the robot's lid, neck, and ear movements with speech output to produce dynamic, multi-modal expressions. This showcases the system's potential to revolutionize HRI by shifting from conventional, manual state-and-flow design methods to an intuitive, guidance-based, and example-driven approach. Supplementary material can be found at https://hri-eu.github.io/Lami/
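The three inputs named in the abstract (high-level guidance, a set of atomic actions, and a set of examples) can be pictured as data handed to an LLM, plus a dispatcher for the action the model picks. The following is a minimal sketch under stated assumptions, not the authors' implementation: `InteractionConfig`, `atomic_action`, `build_prompt`, `dispatch`, the two sample actions, and the guidance and example strings are all hypothetical names and values chosen for illustration, and no real LLM or robot API is called.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class InteractionConfig:
    """Hypothetical container for the three inputs named in the abstract."""
    guidance: str                                   # high-level linguistic guidance
    examples: List[str] = field(default_factory=list)
    actions: Dict[str, Callable[..., str]] = field(default_factory=dict)

    def atomic_action(self, fn: Callable[..., str]) -> Callable[..., str]:
        """Decorator that registers a function as an atomic action."""
        self.actions[fn.__name__] = fn
        return fn

    def build_prompt(self) -> str:
        """Assemble guidance, action descriptions, and examples into one text."""
        lines = ["Guidance:", self.guidance, "Atomic actions:"]
        lines += [f"- {name}: {fn.__doc__}" for name, fn in sorted(self.actions.items())]
        lines += ["Examples:"] + [f"- {ex}" for ex in self.examples]
        return "\n".join(lines)

    def dispatch(self, name: str, **kwargs: str) -> str:
        """Run the action a model selected; reject names that were never registered."""
        if name not in self.actions:
            raise KeyError(f"unknown atomic action: {name}")
        return self.actions[name](**kwargs)


# Illustrative values only; the real guidance and examples are not given in the abstract.
config = InteractionConfig(
    guidance="Help only when a person asks or clearly struggles.",
    examples=["Person asks for the cup -> hand_over(item='cup', person='Ann')"],
)


@config.atomic_action
def speak(text: str) -> str:
    """Say a sentence aloud."""
    return f"speak: {text}"  # stand-in for a real speech command


@config.atomic_action
def hand_over(item: str, person: str) -> str:
    """Pass an object to a person."""
    return f"hand_over: {item} -> {person}"  # stand-in for a real arm command


print(config.build_prompt())
print(config.dispatch("hand_over", item="cup", person="Ann"))
```

In this sketch, changing the robot's behavior means editing the guidance string, the registered actions, or the examples rather than a hand-written state machine, which is the shift the abstract describes.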

Authors (8)
  1. Chao Wang (555 papers)
  2. Stephan Hasler (11 papers)
  3. Daniel Tanneberg (16 papers)
  4. Felix Ocker (10 papers)
  5. Frank Joublin (11 papers)
  6. Antonello Ceravola (11 papers)
  7. Joerg Deigmoeller (7 papers)
  8. Michael Gienger (33 papers)
Citations (12)