Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
126 tokens/sec
GPT-4o
11 tokens/sec
Gemini 2.5 Pro Pro
47 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
3 tokens/sec
DeepSeek R1 via Azure Pro
33 tokens/sec
2000 character limit reached

2rd Place Solutions in the HC-STVG track of Person in Context Challenge 2021 (2106.07166v1)

Published 14 Jun 2021 in cs.CV

Abstract: In this technical report, we present our solution to localize a spatio-temporal person in an untrimmed video based on a sentence. We achieve the second vIOU(0.30025) in the HC-STVG track of the 3rd Person in Context(PIC) Challenge. Our solution contains three parts: 1) human attributes information is extracted from the sentence, it is helpful to filter out tube proposals in the testing phase and supervise our classifier to learn appearance information in the training phase. 2) we detect humans with YoloV5 and track humans based on the DeepSort framework but replace the original ReID network with FastReID. 3) a visual transformer is used to extract cross-modal representations for localizing a spatio-temporal tube of the target person.

Citations (6)

Summary

We haven't generated a summary for this paper yet.