
GroundNLQ @ Ego4D Natural Language Queries Challenge 2023 (2306.15255v1)

Published 27 Jun 2023 in cs.CV and cs.CL

Abstract: In this report, we present our champion solution for the Ego4D Natural Language Queries (NLQ) Challenge at CVPR 2023. Accurate grounding in a video requires both an effective egocentric feature extractor and a powerful grounding model. Motivated by this, we leverage a two-stage pre-training strategy to train egocentric feature extractors and the grounding model on video narrations, and further fine-tune the model on annotated data. In addition, we introduce a novel grounding model, GroundNLQ, which employs a multi-modal multi-scale grounding module for effective video and text fusion and for handling various temporal intervals, especially in long videos. On the blind test set, GroundNLQ achieves 25.67 and 18.18 for R1@IoU=0.3 and R1@IoU=0.5, respectively, surpassing all other teams by a noticeable margin. Our code will be released at https://github.com/houzhijian/GroundNLQ.
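
The R1@IoU=0.3 and R1@IoU=0.5 figures quoted in the abstract are top-1 recall scores at a temporal IoU threshold. A minimal sketch of how such a score can be computed is given below. The function names, the `(start, end)` interval representation in seconds, and the use of `>=` at the threshold are illustrative assumptions, not the official Ego4D evaluation code.

```python
from typing import Sequence, Tuple

# Hypothetical representation: a temporal window as (start_sec, end_sec).
Interval = Tuple[float, float]


def temporal_iou(pred: Interval, gt: Interval) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(
    top1_preds: Sequence[Interval],
    gts: Sequence[Interval],
    iou_threshold: float,
) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold.

    Assumption: a prediction counts as a hit when IoU >= threshold.
    """
    if len(top1_preds) != len(gts) or not gts:
        raise ValueError("need one top-1 prediction per ground-truth window")
    hits = sum(
        temporal_iou(p, g) >= iou_threshold for p, g in zip(top1_preds, gts)
    )
    return 100.0 * hits / len(gts)
```

For instance, with three queries whose top-1 predictions have IoU 0.6, 0.375, and 0 against their ground-truth windows, this sketch counts two hits at threshold 0.3 and one hit at threshold 0.5.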

Authors (10)
  1. Zhijian Hou (6 papers)
  2. Lei Ji (33 papers)
  3. Difei Gao (32 papers)
  4. Wanjun Zhong (49 papers)
  5. Kun Yan (23 papers)
  6. Chao Li (429 papers)
  7. Wing-Kwong Chan (11 papers)
  8. Chong-Wah Ngo (55 papers)
  9. Nan Duan (172 papers)
  10. Mike Zheng Shou (165 papers)
Citations (13)