Tracking with Human-Intent Reasoning (2312.17448v1)

Published 29 Dec 2023 in cs.CV

Abstract: Advances in perception modeling have significantly improved the performance of object tracking. However, current methods specify the target object in the initial frame either by 1) providing a box or mask template, or by 2) giving an explicit language description. Both approaches are cumbersome and leave the tracker with no self-reasoning ability. This work therefore proposes a new tracking task, Instruction Tracking, in which the tracker receives an implicit tracking instruction and must reason out what to track before tracking it across video frames. To achieve this, we investigate integrating the knowledge and reasoning capabilities of a Large Vision-Language Model (LVLM) into object tracking. Specifically, we propose TrackGPT, a tracker capable of complex reasoning-based tracking. TrackGPT first uses the LVLM to understand the tracking instruction and condense the cues about the target into referring embeddings; a perception component then generates the tracking results from those embeddings. To evaluate TrackGPT, we construct an instruction tracking benchmark, InsTrack, which contains over one thousand instruction-video pairs for instruction tuning and evaluation. Experiments show that TrackGPT achieves competitive performance on referring video object segmentation benchmarks, including a new state-of-the-art of 66.5 $\mathcal{J}\&\mathcal{F}$ on Refer-DAVIS, and demonstrates superior instruction-tracking performance under the new evaluation protocols. The code and models are available at https://github.com/jiawen-zhu/TrackGPT.

Authors (8)
  1. Jiawen Zhu
  2. Zhi-Qi Cheng
  3. Jun-Yan He
  4. Chenyang Li
  5. Bin Luo
  6. Huchuan Lu
  7. Yifeng Geng
  8. Xuansong Xie
Citations (4)

