Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

EgoEnv: Human-centric environment representations from egocentric video (2207.11365v3)

Published 22 Jul 2022 in cs.CV

Abstract: First-person video highlights a camera-wearer's activities in the context of their persistent environment. However, current video understanding approaches reason over visual features from short video clips that are detached from the underlying physical space and capture only what is immediately visible. To facilitate human-centric environment understanding, we present an approach that links egocentric video and the environment by learning representations that are predictive of the camera-wearer's (potentially unseen) local surroundings. We train such models using videos from agents in simulated 3D environments where the environment is fully observable, and test them on human-captured real-world videos from unseen environments. On two human-centric video tasks, we show that models equipped with our environment-aware features consistently outperform their counterparts with traditional clip features. Moreover, despite being trained exclusively on simulated videos, our approach successfully handles real-world videos from HouseTours and Ego4D, and achieves state-of-the-art results on the Ego4D NLQ challenge. Project page: https://vision.cs.utexas.edu/projects/ego-env/

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Tushar Nagarajan (33 papers)
  2. Santhosh Kumar Ramakrishnan (12 papers)
  3. Ruta Desai (21 papers)
  4. James Hillis (3 papers)
  5. Kristen Grauman (136 papers)
Citations (13)

Summary

We haven't generated a summary for this paper yet.