- The paper presents a large, high-diversity dataset of over 10,000 video segments with 1.5 million annotated bounding boxes, setting a new standard for class diversity in generic object tracking benchmarks.
- It introduces a one-shot evaluation protocol in which training and test object classes do not overlap, so that reported results better reflect how trackers generalize to unseen objects.
- Experiments with 39 state-of-the-art trackers show that even the best performers reach a mean average overlap (mAO) below 0.5, leaving substantial room for future research.
GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild
The paper "GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild" introduces GOT-10k (Generic Object Tracking), a benchmark designed to address limitations of earlier tracking datasets in three areas: scale, object-class diversity, and evaluation protocol.
Dataset Construction
GOT-10k stands out for its scale and diversity: it contains over 10,000 video segments with 1.5 million carefully annotated bounding boxes. The dataset spans 563 object classes and 87 motion classes, exceeding comparable benchmarks such as LaSOT and TrackingNet in class diversity. Object classes were selected using the hierarchical structure of WordNet, which helps keep the class set broad and balanced. A quality control pipeline applied during both data collection and annotation is intended to keep the labels reliable.
One-Shot Evaluation Protocol
A key innovation of GOT-10k is its one-shot evaluation protocol: the object classes used for training and testing do not overlap, with the exception of the person class, for which the training and test sets instead use distinct motion classes. Because test objects belong to classes never seen during training, results better indicate how a tracker performs on genuinely new targets. This distinguishes GOT-10k from benchmarks whose training and test sets share many classes, which can reward overfitting to familiar categories and bias comparisons.
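To make the split rule concrete, here is a minimal Python sketch of a checker that flags test sequences violating it. It assumes a hypothetical `Sequence` record with `object_class` and `motion_class` fields and a `"person"` label; these names are illustrative and are not part of the GOT-10k toolkit. Non-person test sequences must use object classes absent from training, while person test sequences must use motion classes absent from the training person sequences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequence:
    # Hypothetical per-video metadata record; field names are illustrative.
    name: str
    object_class: str
    motion_class: str


PERSON = "person"  # assumed label for the person class


def check_one_shot_split(train, test):
    """Return human-readable violations of the class-disjoint split rule.

    - Non-person test sequences: object class must not appear in training.
    - Person test sequences: motion class must not appear among training persons.
    Motion overlap for non-person classes is not checked in this sketch.
    """
    train_objects = {s.object_class for s in train if s.object_class != PERSON}
    train_person_motions = {s.motion_class for s in train if s.object_class == PERSON}

    violations = []
    for s in test:
        if s.object_class == PERSON:
            if s.motion_class in train_person_motions:
                violations.append(
                    f"{s.name}: person motion '{s.motion_class}' seen in training"
                )
        elif s.object_class in train_objects:
            violations.append(
                f"{s.name}: object class '{s.object_class}' seen in training"
            )
    return violations


# Toy example with made-up sequences.
train = [
    Sequence("train_001", "zebra", "running"),
    Sequence("train_002", PERSON, "walking"),
]
test = [
    Sequence("test_001", "giraffe", "walking"),  # unseen object class: allowed
    Sequence("test_002", PERSON, "walking"),     # person with a seen motion: violation
]
for problem in check_one_shot_split(train, test):
    print(problem)
```

Keying the person exception on motion class mirrors the rule described above; a real split would also need the actual class and motion labels shipped with the dataset.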
Numerical Performance
GOT-10k supports the evaluation of a wide range of algorithms, and the authors benchmarked 39 state-of-the-art trackers on it. The top performers were MemTracker, with a mean average overlap (mAO) of 0.460, and DeepSTRCF, with an mAO of 0.449. No tracker achieved an mAO higher than 0.5, which underscores how challenging the benchmark is.
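The mAO metric can be sketched in Python as follows, under stated assumptions. Average overlap (AO) for one sequence is taken here as the mean intersection-over-union (IoU) between predicted and ground-truth boxes over its frames, and mAO is assumed to be the class-balanced version: AO is averaged within each object class first, then across classes, so that classes with many sequences do not dominate. Boxes are assumed to be `(x, y, w, h)` tuples, the function names are hypothetical, and details such as skipping the initialization frame or handling frames where the target is absent are omitted.

```python
from collections import defaultdict


def iou_xywh(a, b):
    """IoU of two boxes given as (x, y, w, h), with (x, y) the top-left corner."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def average_overlap(pred_boxes, gt_boxes):
    """AO of one sequence: mean per-frame IoU (simplified; all frames counted)."""
    if len(pred_boxes) != len(gt_boxes) or not gt_boxes:
        raise ValueError("need equal-length, non-empty box lists")
    ious = [iou_xywh(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    return sum(ious) / len(ious)


def mean_average_overlap(results):
    """Class-balanced mAO over (object_class, pred_boxes, gt_boxes) triples."""
    per_class = defaultdict(list)
    for object_class, pred, gt in results:
        per_class[object_class].append(average_overlap(pred, gt))
    class_means = [sum(aos) / len(aos) for aos in per_class.values()]
    return sum(class_means) / len(class_means)


# Toy example: two "cat" sequences and one failed "dog" sequence.
gt = (0, 0, 10, 10)
results = [
    ("cat", [(0, 0, 10, 10), (5, 0, 10, 10)], [gt, gt]),  # AO = (1 + 1/3) / 2
    ("cat", [(0, 0, 10, 10)], [gt]),                      # AO = 1
    ("dog", [(20, 20, 10, 10)], [gt]),                    # AO = 0
]
print(mean_average_overlap(results))
```

In this toy example the class-balanced mAO is 5/12 (about 0.417), whereas a plain mean over the three sequences would give 5/9 (about 0.556): the single failed dog sequence carries as much weight as both cat sequences combined.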
Attribute-Specific Challenges
The dataset also supports analysis of specific challenge attributes: occlusion, scale variation, aspect ratio variation, fast motion, illumination variation, and low-resolution targets. Comparing tracker performance across these conditions reveals differences in robustness and adaptability. For instance, trackers showed notable performance drops under heavy occlusion and fast motion, pointing to areas that need further research.
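A per-attribute breakdown of this kind can be sketched by comparing tracking accuracy on frames with and without each attribute. This minimal Python version assumes per-frame IoU values and per-frame attribute tags with hypothetical names such as `"occlusion"` and `"fast_motion"`; the paper's own attribute analysis may label and bin the data differently.

```python
def attribute_breakdown(frame_ious, frame_attrs, attributes):
    """Mean IoU on frames with and without each attribute.

    frame_ious:  list of per-frame IoU values.
    frame_attrs: list of sets of (hypothetical) attribute tags, aligned with frame_ious.
    Returns {attribute: (mean_with, mean_without)}, using None when no frame qualifies.
    """
    if len(frame_ious) != len(frame_attrs):
        raise ValueError("frame_ious and frame_attrs must be aligned")

    def mean(values):
        return sum(values) / len(values) if values else None

    report = {}
    for attr in attributes:
        with_attr = [iou for iou, tags in zip(frame_ious, frame_attrs) if attr in tags]
        without_attr = [iou for iou, tags in zip(frame_ious, frame_attrs) if attr not in tags]
        report[attr] = (mean(with_attr), mean(without_attr))
    return report


# Toy example with made-up IoUs and tags.
ious = [0.9, 0.8, 0.4, 0.2]
attrs = [set(), {"fast_motion"}, {"occlusion"}, {"occlusion", "fast_motion"}]
for name, (with_attr, without_attr) in attribute_breakdown(
    ious, attrs, ["occlusion", "fast_motion", "illumination_variation"]
).items():
    print(name, with_attr, without_attr)
```

A large gap between the two means for an attribute (here occlusion: 0.3 versus 0.85) is the kind of signal that points to a weakness worth targeting.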
Implications and Future Developments
The implications of GOT-10k extend beyond academia to practical applications such as surveillance, robotics, and autonomous systems, where trackers frequently encounter objects they were never trained on. Its breadth of object classes and motion patterns supports the development of more general and robust tracking algorithms.
Looking forward, the open challenge is to build trackers that perform well across all 563 object classes and 87 motion classes in GOT-10k. The benchmark gives future research a testbed for deep learning methods that are not only accurate but also generalize to unseen objects and scale to large, diverse data.
Conclusion
In summary, GOT-10k marks a significant step forward for object tracking. By addressing the limitations of previous benchmarks, notably through its one-shot evaluation protocol and broad class diversity, it offers a solid platform for developing and benchmarking future tracking algorithms. As a publicly available, comprehensive resource, it encourages the development of more adaptive and generalizable tracking solutions.