
AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding (2406.13807v2)

Published 19 Jun 2024 in cs.CV, cs.AI, and cs.CL

Abstract: AI personal assistants deployed via robots or wearables require embodied understanding to collaborate with humans effectively. However, current Vision-Language Models (VLMs) primarily focus on third-person view videos, neglecting the richness of egocentric perceptual experience. To address this gap, we propose three key contributions. First, we introduce the Egocentric Video Understanding Dataset (EVUD) for training VLMs on video captioning and question answering tasks specific to egocentric videos. Second, we present AlanaVLM, a 7B parameter VLM trained using parameter-efficient methods on EVUD. Finally, we evaluate AlanaVLM's capabilities on OpenEQA, a challenging benchmark for embodied video question answering. Our model achieves state-of-the-art performance, outperforming open-source models including strong Socratic models using GPT-4 as a planner by 3.6%. Additionally, we outperform Claude 3 and Gemini Pro Vision 1.0 and showcase competitive results compared to Gemini Pro 1.5 and GPT-4V, even surpassing the latter in spatial reasoning. This research paves the way for building efficient VLMs that can be deployed in robots or wearables, leveraging embodied video understanding to collaborate seamlessly with humans in everyday tasks, contributing to the next generation of Embodied AI.

Authors (8)
  1. Alessandro Suglia (25 papers)
  2. Claudio Greco (5 papers)
  3. Katie Baker (1 paper)
  4. Jose L. Part (4 papers)
  5. Arash Eshghi (23 papers)
  6. Ioannis Konstas (40 papers)
  7. Oliver Lemon (39 papers)
  8. Ioannis Papaioannou (8 papers)