Estimating more camera poses for ego-centric videos is essential for VQ3D (2211.10284v1)

Published 18 Nov 2022 in cs.CV

Abstract: Visual queries 3D localization (VQ3D) is a task in the Ego4D Episodic Memory Benchmark. Given an egocentric video, the goal is to answer queries of the form "Where did I last see object X?", where the query object X is specified as a static image and the answer is a 3D displacement vector pointing to object X. However, current techniques estimate the camera poses of video frames naively, resulting in a low query-with-pose (QwP) ratio and thus a poor overall success rate. In this work, we design a new pipeline for the challenging problem of camera pose estimation in egocentric video. Moreover, we revisit the current VQ3D framework and optimize it in terms of performance and efficiency. As a result, we achieve the top-1 overall success rate of 25.8% on the VQ3D leaderboard, roughly three times the 8.7% reported by the baseline.
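
The link the abstract draws between the QwP ratio and the overall success rate can be illustrated with a minimal sketch. It assumes that a query whose frames have no estimated camera pose cannot be answered and counts as a failure, so the overall success rate can never exceed the QwP ratio. The `QueryResult` type, the function names, and the toy numbers are hypothetical and are not taken from the paper or from the Ego4D evaluation code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryResult:
    """Outcome of one visual query (hypothetical record, for illustration only)."""
    has_pose: bool  # a camera pose was estimated for the frames this query needs
    success: bool   # the predicted 3D displacement was judged correct


def qwp_ratio(results: List[QueryResult]) -> float:
    """Fraction of all queries that have a usable camera pose."""
    return sum(r.has_pose for r in results) / len(results)


def overall_success(results: List[QueryResult]) -> float:
    """Fraction of all queries answered correctly.

    Assumption: a query without a pose counts as a failure.
    """
    return sum(r.has_pose and r.success for r in results) / len(results)


def success_given_pose(results: List[QueryResult]) -> float:
    """Success rate restricted to the queries that do have a pose."""
    posed = [r for r in results if r.has_pose]
    return sum(r.success for r in posed) / len(posed) if posed else 0.0


# Toy data: 10 queries, only 4 with a pose, 3 of those answered correctly.
toy = (
    [QueryResult(has_pose=True, success=True)] * 3
    + [QueryResult(has_pose=True, success=False)]
    + [QueryResult(has_pose=False, success=False)] * 6
)

print(f"QwP ratio:          {qwp_ratio(toy):.2f}")
print(f"Success given pose: {success_given_pose(toy):.2f}")
print(f"Overall success:    {overall_success(toy):.2f}")
```

In this toy setting the localization step is right on three of the four posed queries, yet the overall rate stays low because six queries never receive a pose. Under the stated assumption, estimating poses for more frames raises the ceiling on overall success.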

Authors (5)
  1. Jinjie Mai (12 papers)
  2. Chen Zhao (249 papers)
  3. Abdullah Hamdi (28 papers)
  4. Silvio Giancola (47 papers)
  5. Bernard Ghanem (256 papers)
Citations (4)