Long-term Tracking in the Wild: A Benchmark (1803.09502v3)

Published 26 Mar 2018 in cs.CV

Abstract: We introduce the OxUvA dataset and benchmark for evaluating single-object tracking algorithms. Benchmarks have enabled great strides in the field of object tracking by defining standardized evaluations on large sets of diverse videos. However, these works have focused exclusively on sequences that are just tens of seconds in length and in which the target is always visible. Consequently, most researchers have designed methods tailored to this "short-term" scenario, which is poorly representative of practitioners' needs. Aiming to address this disparity, we compile a long-term, large-scale tracking dataset of sequences with average length greater than two minutes and with frequent target object disappearance. The OxUvA dataset is much larger than the object tracking datasets of recent years: it comprises 366 sequences spanning 14 hours of video. We assess the performance of several algorithms, considering both the ability to locate the target and to determine whether it is present or absent. Our goal is to offer the community a large and diverse benchmark to enable the design and evaluation of tracking methods ready to be used "in the wild". The project website is http://oxuva.net

Summary

The paper presents the OxUvA benchmark that evaluates tracking algorithms in long-term scenarios with frequent target disappearance.
It employs a comprehensive evaluation framework, including true positive and true negative metrics, to challenge conventional short-term trackers.
The study compares multiple trackers and motivates the advancement of resilient methods for continuous object tracking in real-world conditions.

Long-term Visual Object Tracking: The OxUvA Benchmark

The paper "Long-term Tracking in the Wild: A Benchmark" introduces the OxUvA dataset, a benchmark for evaluating single-object tracking algorithms over extended durations and varied conditions. Existing trackers are often designed for "short-term" scenarios in which sequences are short and the target remains visible throughout, which does not reflect real-world applications where targets may disappear or move out of frame. To close this gap, the authors compile a dataset of long sequences, with an average length of more than two minutes and frequent target disappearance, comprising 366 sequences and 14 hours of video.

Dataset Compilation and Evaluation

The OxUvA dataset differs from existing benchmarks in two main respects: its sequences are substantially longer, and the target is not always visible. It comprises a diverse set of sequences, which makes it more representative of tracking "in the wild." The dataset is split into development and test data, and the evaluation uses metrics that include the true positive rate and the true negative rate, so it measures how a method behaves when the target is absent as well as when it is present.
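As an illustration of the true positive and true negative rates mentioned above, the following is a minimal sketch rather than the benchmark's official evaluation code. It assumes that a ground-truth or predicted box of `None` means "target absent", that boxes are `(xmin, ymin, xmax, ymax)` tuples, and that a prediction counts as a true positive when its intersection-over-union (IoU) with the ground truth is at least 0.5. The function names `iou` and `tpr_tnr` and the threshold value are assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple

# Assumed box format for this sketch: (xmin, ymin, xmax, ymax).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tpr_tnr(
    ground_truth: Sequence[Optional[Box]],
    predictions: Sequence[Optional[Box]],
    iou_threshold: float = 0.5,  # assumed threshold, not taken from the text
) -> Tuple[float, float]:
    """Return (true positive rate, true negative rate) over annotated frames.

    A frame with ground truth None is an "absent" frame; a prediction of
    None means the tracker reports the target as absent.
    """
    true_pos = true_neg = n_present = n_absent = 0
    for gt, pred in zip(ground_truth, predictions):
        if gt is None:
            n_absent += 1
            if pred is None:
                true_neg += 1
        else:
            n_present += 1
            if pred is not None and iou(gt, pred) >= iou_threshold:
                true_pos += 1
    tpr = true_pos / n_present if n_present else float("nan")
    tnr = true_neg / n_absent if n_absent else float("nan")
    return tpr, tnr


if __name__ == "__main__":
    gt = [(0, 0, 10, 10), (0, 0, 10, 10), None, None]
    pred = [(0, 0, 10, 10), (20, 20, 30, 30), None, (0, 0, 5, 5)]
    print(tpr_tnr(gt, pred))
```

Under these assumptions, a tracker that never reports absence can still reach a high true positive rate, but its true negative rate is zero, which is why the two rates are reported together.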

Methodological Expansion

The OxUvA benchmark requires researchers to go beyond trackers that rely solely on local neighborhood search. Instead, methods must detect when the target is absent and re-detect it when it reappears, while remaining efficient over long durations. The authors' evaluation framework assesses a tracker on two capabilities: localization accuracy and the ability to determine whether the object is present or absent.
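To make the idea of absence detection and re-detection concrete, here is a hypothetical sketch that is not taken from the paper or from any specific tracker. It assumes a tracker produces two confidence scores per frame, one from a local search and one from a global re-detection search, and that a single threshold decides presence. The function name `presence_decisions`, the two score lists, and the threshold of 0.5 are all assumptions made for illustration.

```python
from typing import List, Sequence


def presence_decisions(
    local_scores: Sequence[float],
    global_scores: Sequence[float],
    threshold: float = 0.5,  # assumed value, chosen only for illustration
) -> List[bool]:
    """Decide per frame whether the target is reported as present.

    While the target is considered present, the local-search score is used.
    Once it is considered lost, the global (re-detection) score is used
    until that score exceeds the threshold again.
    """
    present = True  # the target is visible in the first frame by assumption
    decisions = []
    for local, global_ in zip(local_scores, global_scores):
        score = local if present else global_
        present = score >= threshold
        decisions.append(present)
    return decisions


if __name__ == "__main__":
    local = [0.9, 0.2, 0.9, 0.9]
    global_ = [0.1, 0.1, 0.3, 0.8]
    print(presence_decisions(local, global_))
```

In this sketch the local score in the third frame is ignored because the target was declared lost in the second frame, so only a confident global re-detection (fourth frame) brings the tracker back to the "present" state.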

Comparative Analysis with Existing Trackers

Several contemporary trackers are evaluated on the OxUvA benchmark to gauge their robustness in a long-term setting. Notably, trackers such as SiamFC, TLD, and MDNet perform competitively when the target disappears and reappears. The paper shows that trackers are affected, and can degrade, when sequences extend beyond the conditions for which they were originally designed. The benchmark thus exposes both the opportunity and the need for trackers that cope with the challenges of long-term sequences.

Practical and Theoretical Implications

The implications of the OxUvA benchmark are twofold. Practically, it steers tracking algorithms toward real-world conditions in which targets appear intermittently because of occlusion or leaving the frame. Theoretically, it encourages algorithmic advances toward more resilient and general tracking solutions. Going forward, researchers are encouraged to relax the short-term assumptions of earlier benchmarks and to develop methods that operate continuously without losing accuracy over long tracking durations.

Future Outlook

The OxUvA dataset is likely to shape future tracking algorithms by calling for methods that handle long-term scenarios. Algorithms need to go beyond mere detection to approaches that incorporate re-detection, persistent tracking, and absence prediction. The benchmark supports the community's efforts to bridge theory and practice, so that algorithms perform well not only under ideal conditions but also under the changing conditions of practical deployment. The paper thus marks a shift in single-object tracking and sets a direction for future work.