GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild (1810.11981v3)

Published 29 Oct 2018 in cs.CV

Abstract: We introduce here a large tracking database that offers an unprecedentedly wide coverage of common moving objects in the wild, called GOT-10k. Specifically, GOT-10k is built upon the backbone of WordNet structure and it populates the majority of over 560 classes of moving objects and 87 motion patterns, magnitudes wider than the most recent similar-scale counterparts. The contributions of this paper are summarized in the following: (1) GOT-10k offers over 10,000 video segments with more than 1.5 million manually labeled bounding boxes, enabling unified training and stable evaluation of deep trackers. (2) GOT-10k is by far the first video trajectory dataset that uses the semantic hierarchy of WordNet to guide class population. (3) For the first time, GOT-10k introduces the one-shot protocol for tracker evaluation, where the training and test classes are zero-overlapped. The protocol avoids biased evaluation results towards familiar objects and it promotes generalization in tracker development. (4) We conduct extensive tracking experiments with 39 typical tracking algorithms on GOT-10k and analyze their results in this paper. (5) Finally, we develop a comprehensive platform for the tracking community that offers full-featured evaluation toolkits, an online evaluation server, and a responsive leaderboard. The annotations of GOT-10k's test data are kept private to avoid tuning parameters on it. The database, toolkits, evaluation server and baseline results are available at http://got-10k.aitestunion.com.

Summary

The paper presents a large tracking dataset with over 10,000 video segments and more than 1.5 million manually annotated bounding boxes, offering far wider class diversity than earlier tracking benchmarks.
It introduces a one-shot evaluation protocol in which training and test classes do not overlap, reducing evaluation bias toward familiar objects.
Experiments with 39 representative trackers show that even the best performers achieve an mAO below 0.5, leaving substantial room for future research.

The paper "GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild" presents a dataset aimed at several long-standing problems in object tracking: limited class diversity in existing benchmarks, the lack of a large unified training set for deep trackers, and evaluation results biased toward object classes already seen during training. GOT-10k (Generic Object Tracking) addresses these problems through its diversity, its scale, and its evaluation protocol.

Dataset Construction

GOT-10k stands out for its scale and diversity. It contains over 10,000 video segments with more than 1.5 million manually annotated bounding boxes, spanning 563 object classes and 87 motion patterns, a class coverage far wider than that of comparable large-scale benchmarks such as LaSOT and TrackingNet. Class selection was guided by the hierarchical structure of WordNet, which helps spread the chosen classes broadly across semantic categories instead of concentrating them in a few familiar ones. A quality-control pipeline applied during both data collection and annotation is intended to keep the labels reliable.
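The role of a class hierarchy in guiding class selection can be illustrated with a toy example. The sketch below is a hypothetical illustration, not the authors' pipeline: it uses a small hand-written hypernym tree (the names `TOY_TREE`, `leaf_classes`, and `balanced_population` are invented here) in place of the real WordNet graph, collects the leaf classes under a root, and caps how many leaves are taken from each top-level branch so that no single subtree dominates the class list.

```python
# Hypothetical toy hypernym tree standing in for WordNet; each key maps a
# category to its direct subcategories, and nodes with no entry are leaves.
TOY_TREE = {
    "object": ["animal", "vehicle"],
    "animal": ["mammal", "bird"],
    "mammal": ["dog", "horse"],
    "bird": ["duck", "eagle"],
    "vehicle": ["car", "bicycle"],
}


def leaf_classes(tree, root):
    """Return all leaf categories reachable from `root`, sorted alphabetically."""
    children = tree.get(root, [])
    if not children:
        return [root]  # a node without subcategories is itself a candidate class
    leaves = []
    for child in children:
        leaves.extend(leaf_classes(tree, child))
    return sorted(leaves)


def balanced_population(tree, root, per_branch):
    """Take at most `per_branch` leaf classes from each direct child of `root`,
    so that one large subtree cannot dominate the selected class list."""
    selected = []
    for branch in tree.get(root, []):
        selected.extend(leaf_classes(tree, branch)[:per_branch])
    return selected


print(leaf_classes(TOY_TREE, "object"))  # ['bicycle', 'car', 'dog', 'duck', 'eagle', 'horse']
print(balanced_population(TOY_TREE, "object", 2))  # ['dog', 'duck', 'bicycle', 'car']
```

With a cap of two classes per branch, the animal and vehicle subtrees contribute equally even though the animal subtree has twice as many leaves.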

One-Shot Evaluation Protocol

A key innovation of GOT-10k is its one-shot evaluation protocol: the object classes used for training and testing do not overlap, with the single exception of the person class. For persons, the training and test sets instead use distinct motion classes. Because test objects come from classes (or, for persons, motion classes) absent from the training set, results are more indicative of how a tracker performs on unseen objects. This sets GOT-10k apart from benchmarks whose training and test sets share classes heavily, which can reward overfitting to familiar objects and bias the results.
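A split of this kind can be sketched in a few lines. This is a minimal illustration under assumed data structures (the `Video` record and the `one_shot_split` and `check_one_shot` helpers are hypothetical, not part of the GOT-10k toolkit): non-person videos are assigned by object class, person videos by motion class, and the check verifies that neither set overlaps between training and test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Video:
    """Hypothetical per-video record: name, object class, and motion class."""
    name: str
    object_class: str
    motion_class: str


def one_shot_split(videos, test_object_classes, test_person_motions):
    """Split videos so that training and test share no object class,
    except persons, which are split by motion class instead."""
    train, test = [], []
    for v in videos:
        if v.object_class == "person":
            is_test = v.motion_class in test_person_motions
        else:
            is_test = v.object_class in test_object_classes
        (test if is_test else train).append(v)
    return train, test


def check_one_shot(train, test):
    """True if non-person object classes and person motion classes are disjoint."""
    def objects(vs):
        return {v.object_class for v in vs if v.object_class != "person"}

    def person_motions(vs):
        return {v.motion_class for v in vs if v.object_class == "person"}

    return (objects(train).isdisjoint(objects(test))
            and person_motions(train).isdisjoint(person_motions(test)))


videos = [
    Video("v1", "dog", "running"),
    Video("v2", "dog", "walking"),
    Video("v3", "zebra", "running"),
    Video("v4", "person", "dancing"),
    Video("v5", "person", "skiing"),
]
train, test = one_shot_split(videos, {"zebra"}, {"skiing"})
print([v.name for v in train], [v.name for v in test])  # ['v1', 'v2', 'v4'] ['v3', 'v5']
print(check_one_shot(train, test))  # True
```

Note that the motion class "running" appears on both sides here; under this sketch that is allowed for non-person objects, since only their object classes must be disjoint.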

Numerical Performance

The dataset supports the evaluation of a wide range of algorithms: the authors ran experiments with 39 representative tracking algorithms. The top performers were MemTracker, with a mean Average Overlap (mAO) of 0.460, and DeepSTRCF, with an mAO of 0.449. No tracker exceeded an mAO of 0.5, underscoring how challenging GOT-10k is.
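For readers unfamiliar with overlap-based metrics, the sketch below shows how such scores are typically computed. It assumes boxes in `(x, y, w, h)` format, takes a sequence's AO to be the mean IoU over its frames and SR to be the fraction of frames whose IoU exceeds a threshold, and computes mAO as a class-balanced average (mean AO within each object class, then mean over classes). Details of the official toolkit, such as how the initialization frame or frames with an absent target are handled, are omitted, so treat this as an approximation rather than the benchmark's reference implementation.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def average_overlap(pred, gt):
    """AO of one sequence: mean IoU over all frames."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    return sum(overlaps) / len(overlaps)


def success_rate(pred, gt, threshold=0.5):
    """SR of one sequence: fraction of frames with IoU above `threshold`."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    return sum(o > threshold for o in overlaps) / len(overlaps)


def mean_ao(results):
    """Class-balanced mAO from (object_class, sequence_AO) pairs:
    average within each class first, then across classes."""
    per_class = defaultdict(list)
    for cls, ao in results:
        per_class[cls].append(ao)
    class_means = [sum(v) / len(v) for v in per_class.values()]
    return sum(class_means) / len(class_means)
```

Averaging within each class first means that a class with many test videos carries the same weight as a class with few, so the score is not dominated by the most frequent classes.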

Attribute-Specific Challenges

The dataset also supports attribute-specific analysis, covering challenges such as occlusion, scale variation, aspect ratio variation, fast motion, illumination variation, and low-resolution targets. Tracker performance under each condition gives insight into robustness and adaptability. For instance, trackers showed notable performance declines under heavy occlusion and fast motion, pointing to areas that need further research.
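An attribute breakdown amounts to grouping results by condition before averaging. The sketch below assumes each test sequence has already been scored with an AO value and tagged with a set of hypothetical attribute labels (for example `"occlusion"` or `"fast_motion"`); the helper `ao_by_attribute` is invented for illustration and does not reflect how GOT-10k derives its attribute annotations.

```python
from collections import defaultdict


def ao_by_attribute(seq_results):
    """Average per-sequence AO within each attribute tag.

    seq_results: iterable of (ao, attributes) pairs, where `attributes` is a
    set of hypothetical tags such as {"occlusion", "fast_motion"}. A sequence
    with several tags counts toward each of them; untagged sequences are skipped.
    """
    buckets = defaultdict(list)
    for ao, attributes in seq_results:
        for attr in attributes:
            buckets[attr].append(ao)
    # Sorted keys give a stable, readable report order.
    return {attr: sum(v) / len(v) for attr, v in sorted(buckets.items())}


results = [
    (0.6, {"occlusion"}),
    (0.4, {"occlusion", "fast_motion"}),
    (0.8, set()),
]
print(ao_by_attribute(results))
```

Comparing each attribute's mean against the overall mean then shows which conditions hurt a given tracker most.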

Implications and Future Developments

The implications of GOT-10k extend beyond academia to practical applications in surveillance, robotics, and autonomous systems where tracking unseen objects is a frequent requirement. The dataset's expansive nature enables the development of more generalized and robust tracking algorithms.

Looking forward, the challenge remains to enhance trackers to perform competently across all 563 object classes and 87 motion types included in GOT-10k. The dataset paves the way for future research to explore deep learning's potential in creating algorithms that are not only accurate but also highly generalizable and scalable.

Conclusion

In summary, the introduction of GOT-10k marks a significant step forward in the field of object tracking. By addressing the limitations of previous benchmarks, notably through its one-shot evaluation protocol and extensive diversity, GOT-10k offers a robust platform for developing and benchmarking future tracking algorithms. The dataset's availability and comprehensive nature promise substantial contributions to the field, encouraging the development of more adaptive and generalizable tracking solutions.
