Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
80 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Proactive Human-Robot Interaction using Visuo-Lingual Transformers (2310.02506v1)

Published 4 Oct 2023 in cs.RO and cs.AI

Abstract: Humans possess the innate ability to extract latent visuo-lingual cues to infer context through human interaction. During collaboration, this enables proactive prediction of the underlying intention of a series of tasks. In contrast, robotic agents collaborating with humans naively follow elementary instructions to complete tasks or use specific hand-crafted triggers to initiate proactive collaboration when working towards the completion of a goal. Endowing such robots with the ability to reason about the end goal and proactively suggest intermediate tasks will engender a much more intuitive method for human-robot collaboration. To this end, we propose a learning-based method that uses visual cues from the scene, lingual commands from a user and knowledge of prior object-object interaction to identify and proactively predict the underlying goal the user intends to achieve. Specifically, we propose ViLing-MMT, a vision-language multimodal transformer-based architecture that captures inter and intra-modal dependencies to provide accurate scene descriptions and proactively suggest tasks where applicable. We evaluate our proposed model in simulation and real-world scenarios.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (1)
  1. Pranay Mathur (7 papers)