LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living (2406.09390v2)

Published 13 Jun 2024 in cs.CV and cs.LG

Abstract: Current Large Language Vision Models (LLVMs) trained on web videos perform well in general video understanding but struggle with fine-grained details, complex human-object interactions (HOI), and view-invariant representation learning, all of which are essential for Activities of Daily Living (ADL). This limitation stems from a lack of specialized ADL video instruction-tuning datasets and from insufficient modality integration to capture discriminative action representations. To address this, we propose a semi-automated framework for curating ADL datasets, creating ADL-X, a multiview, multimodal RGBS instruction-tuning dataset. Additionally, we introduce LLAVIDAL, an LLVM integrating videos, 3D skeletons, and HOIs to model the complex spatiotemporal relationships in ADL. For training LLAVIDAL, a simple joint alignment of all modalities yields suboptimal results; we therefore propose a Multimodal Progressive (MMPro) training strategy that incorporates modalities in stages following a curriculum. We also establish ADL MCQ and video description benchmarks to assess LLVM performance on ADL tasks. Trained on ADL-X, LLAVIDAL achieves state-of-the-art performance across ADL benchmarks. Code and data will be made publicly available at: https://adl-x.github.io/.
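The abstract says only that MMPro incorporates modalities in stages following a curriculum, as opposed to a simple joint alignment of all modalities. The Python sketch below illustrates that general idea of a staged modality curriculum; it is not the authors' implementation. The modality order (video, then skeleton, then HOI), the per-modality projector names, and the choice to train only the newly added projector in each stage are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Stage:
    """One training stage of a hypothetical modality curriculum."""
    name: str
    modalities: Tuple[str, ...]  # modality inputs fed to the language model in this stage
    trainable: Tuple[str, ...]   # modules whose weights are updated in this stage


def progressive_schedule(order: Sequence[str]) -> List[Stage]:
    """Staged curriculum: stage k adds modality order[k] to those already seen.

    Assumption (not from the paper): only the projector of the newly added
    modality is trained; projectors introduced in earlier stages stay frozen.
    """
    stages = []
    for k, new in enumerate(order):
        stages.append(
            Stage(
                name=f"stage{k + 1}_{new}",
                modalities=tuple(order[: k + 1]),  # cumulative set of modalities
                trainable=(f"{new}_projector",),   # hypothetical module name
            )
        )
    return stages


def joint_schedule(order: Sequence[str]) -> List[Stage]:
    """Baseline: align every modality at once in a single stage."""
    return [
        Stage(
            name="joint",
            modalities=tuple(order),
            trainable=tuple(f"{m}_projector" for m in order),
        )
    ]


if __name__ == "__main__":
    # Print the hypothetical three-stage curriculum.
    for stage in progressive_schedule(["video", "skeleton", "hoi"]):
        print(stage.name, stage.modalities, stage.trainable)
```

Passing the same modality list to `joint_schedule` yields the single-stage alternative that the abstract reports as suboptimal; the progressive version differs only in exposing the model to one additional modality per stage.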

Authors (8)
  1. Rajatsubhra Chakraborty (4 papers)
  2. Arkaprava Sinha (3 papers)
  3. Dominick Reilly (8 papers)
  4. Manish Kumar Govind (3 papers)
  5. Pu Wang (83 papers)
  6. Francois Bremond (114 papers)
  7. Srijan Das (35 papers)
  8. Le Xue (23 papers)
Citations (2)