LaSOT: A High-quality Large-scale Single Object Tracking Benchmark (2009.03465v3)

Published 8 Sep 2020 in cs.CV

Abstract: Despite great recent advances in visual tracking, its further development, including both algorithm design and evaluation, is limited due to lack of dedicated large-scale benchmarks. To address this problem, we present LaSOT, a high-quality Large-scale Single Object Tracking benchmark. LaSOT contains a diverse selection of 85 object classes, and offers 1,550 videos totaling more than 3.87 million frames. Each video frame is carefully and manually annotated with a bounding box. This makes LaSOT, to our knowledge, the largest densely annotated tracking benchmark. Our goal in releasing LaSOT is to provide a dedicated high quality platform for both training and evaluation of trackers. The average video length of LaSOT is around 2,500 frames, where each video contains various challenge factors that exist in real world video footage, such as the targets disappearing and re-appearing. These longer video lengths allow for the assessment of long-term trackers. To take advantage of the close connection between visual appearance and natural language, we provide language specification for each video in LaSOT. We believe such additions will allow for future research to use linguistic features to improve tracking. Two protocols, full-overlap and one-shot, are designated for flexible assessment of trackers. We extensively evaluate 48 baseline trackers on LaSOT with in-depth analysis, and results reveal that there still exists significant room for improvement. The complete benchmark, tracking results as well as analysis are available at http://vision.cs.stonybrook.edu/~lasot/.

Citations (181)

Summary

  • The paper presents LaSOT, a high-quality benchmark featuring 1,550 videos and over 3.87 million meticulously annotated frames.
  • It introduces two evaluation protocols—full overlap and one-shot—to rigorously assess tracker performance under diverse real-world challenges.
  • Findings indicate that deep feature representations and online model update strategies are key to robust, long-term tracking performance.

Overview of LaSOT: A High-quality Large-scale Single Object Tracking Benchmark

The paper introduces LaSOT, a comprehensive benchmark designed for large-scale visual tracking evaluation, addressing significant limitations of existing tracking benchmarks. LaSOT is constructed to advance both algorithm design and evaluation in single object tracking by providing a substantial dataset of 1,550 videos with over 3.87 million frames spanning 85 diverse object categories. Each frame is carefully and manually annotated with a bounding box, yielding high-quality dense annotation and an extensive platform for both training and evaluating tracking algorithms.
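
For concreteness, the snippet below sketches how a single LaSOT-style sequence could be loaded: dense per-frame bounding boxes together with per-frame occlusion and out-of-view flags and the per-sequence language specification. The file names (groundtruth.txt, full_occlusion.txt, out_of_view.txt, nlp.txt) reflect the commonly distributed LaSOT layout, and the helper name load_lasot_sequence is illustrative rather than part of an official toolkit.

```python
import os
import csv

def load_lasot_sequence(seq_dir):
    """Load one LaSOT-style sequence (assumed layout, not an official API).

    Expected contents of seq_dir:
      img/00000001.jpg, ...     frames
      groundtruth.txt           one "x,y,w,h" box per frame (dense annotation)
      full_occlusion.txt        comma-separated 0/1 flags, one per frame
      out_of_view.txt           comma-separated 0/1 flags, one per frame
      nlp.txt                   natural-language specification of the target
    """
    img_dir = os.path.join(seq_dir, "img")
    frames = sorted(os.path.join(img_dir, f)
                    for f in os.listdir(img_dir) if f.endswith(".jpg"))

    with open(os.path.join(seq_dir, "groundtruth.txt")) as fh:
        boxes = [tuple(int(float(v)) for v in row)
                 for row in csv.reader(fh) if row]

    def read_flags(name):
        # Per-frame binary attribute flags stored as a comma-separated line.
        with open(os.path.join(seq_dir, name)) as fh:
            return [int(v) for v in fh.read().strip().split(",")]

    occluded = read_flags("full_occlusion.txt")
    out_of_view = read_flags("out_of_view.txt")

    with open(os.path.join(seq_dir, "nlp.txt")) as fh:
        language = fh.read().strip()

    assert len(frames) == len(boxes), "LaSOT is densely annotated: one box per frame"
    return frames, boxes, occluded, out_of_view, language
```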

Key Elements and Contributions

LaSOT presents several standout features:

  1. Scale and Quality: With 1,550 videos spanning 85 object classes and meticulous frame-by-frame annotation, LaSOT stands as the largest densely annotated tracking benchmark available. This scale is intended to support both robust training and comprehensive evaluation of tracking algorithms.
  2. Challenge Factors: The benchmark includes a variety of real-world challenges, such as occlusion, scale variation, and out-of-view scenarios, across long video sequences with an average length of approximately 2,500 frames. These longer sequences support assessments that short-term benchmarks cannot provide, emphasizing long-term tracking ability.
  3. Category Balance: In contrast to other benchmarks that suffer from category bias, LaSOT is rigorously balanced to ensure equal representation across all object categories, fostering unbiased algorithmic evaluation.
  4. Natural Language Annotations: In addition to visual data, LaSOT provides linguistic specifications for each video. This enables the exploration of techniques combining visual and linguistic features, potentially enhancing tracking performance through more advanced semantic understanding.
  5. Protocols: LaSOT designates two evaluation protocols, full overlap and one-shot. Under the full-overlap protocol, training and testing sets share the same categories, whereas the one-shot protocol tests on categories never seen during training. These protocols cater to different application scenarios, such as tracking objects from familiar versus rare categories; a schematic split is sketched below.
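
The sketch below illustrates the difference between the two protocols as a category-level split. The per-class test count and the choice of held-out classes are illustrative parameters of this sketch, not the benchmark's official partition.

```python
import random

def split_full_overlap(sequences_by_class, test_per_class=4, seed=0):
    """Full-overlap protocol: every object class contributes to both training and testing."""
    rng = random.Random(seed)
    train, test = [], []
    for seqs in sequences_by_class.values():
        shuffled = seqs[:]
        rng.shuffle(shuffled)
        test.extend(shuffled[:test_per_class])
        train.extend(shuffled[test_per_class:])
    return train, test

def split_one_shot(sequences_by_class, unseen_classes):
    """One-shot protocol: the testing classes never appear in the training set."""
    train = [s for c, seqs in sequences_by_class.items()
             if c not in unseen_classes for s in seqs]
    test = [s for c, seqs in sequences_by_class.items()
            if c in unseen_classes for s in seqs]
    return train, test
```

Either split can drive the same evaluation loop; the only difference is whether the tracker has been exposed to the test categories during training.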

Experimental Evaluation

The paper details extensive experimental evaluation on LaSOT, highlighting the performance of 48 state-of-the-art trackers under the given protocols. The results yield several notable observations:

  • Impact of Deep Features: All top-performing trackers leverage deep feature representations, underscoring the importance of deep learning in achieving robust tracking performance, especially under challenging conditions.
  • Model Update: Trackers employing online model update strategies generally exhibit superior performance, indicating that adaptation to appearance changes is critical for maintaining tracking accuracy over time.
  • Protocol Comparisons: Trackers perform noticeably worse under the one-shot protocol than under the full-overlap protocol, reflecting difficulty in adapting to unseen object categories and suggesting future directions in domain adaptation and feature generalization.
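
To make such comparisons concrete, the following minimal sketch scores a tracker with an IoU-based success measure of the kind commonly used on LaSOT-style benchmarks. The threshold grid and the strict greater-than convention are assumptions of this sketch, not a reimplementation of the official evaluation toolkit.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def success_score(pred_boxes, gt_boxes, thresholds=np.linspace(0.0, 1.0, 21)):
    """Area under the success curve: mean fraction of frames whose overlap exceeds each threshold."""
    overlaps = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    return float(np.mean([(overlaps > t).mean() for t in thresholds]))
```

A tracker's overall score would then be the average of this per-sequence value across all test sequences, reported separately for each protocol.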

Implications and Future Directions

The release of LaSOT is anticipated to advance research in three main directions:

  • Algorithmic Development: The benchmark will stimulate the development of algorithms capable of addressing long-term tracking challenges and facilitate deeper integration of cross-modal data.
  • Evaluative Standards: By providing robust evaluation protocols and a balanced dataset, LaSOT sets a new standard for unbiased performance comparison among tracking algorithms.
  • Generalization and Adaptation: The benchmark encourages exploration into models that can generalize across variable domains or adapt dynamically to unseen categories, thus enhancing real-world application robustness.

LaSOT is a crucial step forward in the visual tracking domain, supplying researchers with the expansive and richly annotated dataset necessary to innovate and refine tracking methodologies. Its high-quality annotations and balanced, diverse data promise to foster significant advances in single object tracking capabilities, both practically and theoretically. Researchers are encouraged to leverage LaSOT for developing adaptive, reliable tracking solutions that stand the test of real-world applications.