- The paper introduces the LasHeR benchmark, offering 730K+ annotated frame pairs to robustly train and evaluate RGBT tracking models.
- It employs both fixed and handheld imaging platforms to capture diverse scenarios, including challenging conditions such as hyaline occlusion and frame loss.
- Evaluation of 12 state-of-the-art trackers demonstrates the benchmark's potential to advance multimodal fusion and real-world tracking performance.
Overview of LasHeR: A Large-scale High-diversity Benchmark for RGBT Tracking
The paper presents LasHeR, a comprehensive benchmark dataset for RGBT tracking that addresses the limitations of existing datasets in scale, diversity, and coverage of challenging real-world scenarios. The work builds on the premise that combining the visible and thermal infrared modalities offers unique advantages for tracking, particularly under poor visibility and during thermal crossover.
LasHeR is constructed from 1224 pairs of visible and thermal infrared video sequences, encompassing over 730,000 frame pairs, each annotated with a bounding box to provide high-density ground truth. The dataset captures a wide array of tracking conditions across seasons, environments, and times of day, making it significantly more diverse than prior benchmarks. It also annotates several challenge attributes not thoroughly addressed before, such as hyaline occlusion (occlusion by a transparent surface like glass) and frame loss, reflecting real-world difficulties such as abrupt environmental changes and equipment-specific defects.
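The frame-pair organization described above can be sketched with a small loader. Note that the directory names (`visible`, `infrared`, `init.txt`) and the one-`x,y,w,h`-line-per-frame ground-truth format are assumptions for illustration, following common single-object-tracking conventions rather than the dataset's official layout:

```python
from pathlib import Path


def parse_groundtruth(gt_text: str) -> list[tuple[int, int, int, int]]:
    """Parse per-frame bounding boxes from a ground-truth file.

    Assumes one 'x,y,w,h' line per frame (top-left corner plus size),
    comma- or tab-separated -- a hypothetical format for illustration.
    """
    boxes = []
    for line in gt_text.strip().splitlines():
        x, y, w, h = (int(float(v)) for v in line.replace("\t", ",").split(","))
        boxes.append((x, y, w, h))
    return boxes


def load_sequence(seq_dir: str):
    """Pair visible and thermal frames with their shared annotations.

    The 'visible'/'infrared' subdirectory names are assumed; adjust to
    the actual dataset layout.
    """
    seq = Path(seq_dir)
    rgb = sorted((seq / "visible").glob("*.jpg"))
    tir = sorted((seq / "infrared").glob("*.jpg"))
    boxes = parse_groundtruth((seq / "init.txt").read_text())
    # Aligned RGBT data keeps both modalities frame-synchronized,
    # so one box annotates the same target in both images.
    assert len(rgb) == len(tir) == len(boxes), "modalities out of sync"
    return list(zip(rgb, tir, boxes))
```

In this aligned setting a single bounding box per frame serves both modalities; the unaligned variant discussed below breaks that assumption.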
Key Contributions:
- Scale and Diversity: LasHeR's large number of sequences and wide range of capture conditions are expected to improve the training of deep learning models tailored for RGBT tracking, which have been constrained by the limited size of existing datasets.
- Multi-platform Imaging Setup: The dataset is collected using both fixed and handheld platforms, facilitating varied imaging scenarios and enhancing data heterogeneity. This is a critical step towards developing models capable of robustly handling diverse tracking environments.
- Challenge Attributes: The dataset includes numerous annotated attributes across each sequence to explicitly account for diverse tracking challenges, aiming to drive advancements in algorithm robustness and generalizability.
- Unaligned LasHeR: The paper introduces an unaligned version of the dataset, encouraging progress in alignment-free RGBT tracking algorithms. This aligns with practical applications where perfect alignment might be infeasible due to hardware or environmental constraints.
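For context on the unaligned variant: aligned RGBT benchmarks typically pre-register the thermal frames to the visible ones, for example with a planar homography, whereas the unaligned version omits this step, so trackers must cope with the spatial offset themselves. Below is a minimal sketch of the coordinate-registration operation that alignment-free methods would no longer rely on; the homography values are purely illustrative:

```python
import numpy as np


def warp_points(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Apply a 3x3 homography H to an Nx2 array of pixel coordinates.

    Points are lifted to homogeneous coordinates, transformed, and
    projected back by dividing out the last component.
    """
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # N x 3
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]


# Illustrative-only homography: a pure pixel translation of (+3, -2),
# standing in for the calibration an aligned benchmark would bake in.
H_example = np.array([[1.0, 0.0, 3.0],
                      [0.0, 1.0, -2.0],
                      [0.0, 0.0, 1.0]])
```

Annotations for the unaligned data must instead be given per modality, since no single transform is assumed to map one image onto the other.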
- Performance Evaluation: Twelve existing RGBT tracking algorithms are evaluated on this benchmark, providing a comprehensive analysis of their performance in light of the new dataset's scope and complexity.
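Tracking benchmarks of this kind are conventionally scored with precision rate (the fraction of frames whose predicted box center lies within a pixel threshold of the ground truth) and success rate (the area under the curve of IoU-above-threshold over thresholds in [0, 1]). The following is a hedged sketch of these standard metrics, not the paper's exact evaluation code:

```python
import numpy as np


def center_error(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Euclidean distance between box centers; boxes are rows of (x, y, w, h)."""
    pc = pred[:, :2] + pred[:, 2:] / 2.0
    gc = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pc - gc, axis=1)


def iou(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Intersection-over-union of axis-aligned (x, y, w, h) boxes."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / np.maximum(union, 1e-9)


def precision_rate(pred: np.ndarray, gt: np.ndarray, threshold: float = 20.0) -> float:
    """Fraction of frames with center error at or below `threshold` pixels."""
    return float((center_error(pred, gt) <= threshold).mean())


def success_auc(pred: np.ndarray, gt: np.ndarray) -> float:
    """Area under the success curve: mean of IoU-above-threshold over [0, 1]."""
    thresholds = np.linspace(0.0, 1.0, 21)
    overlaps = iou(pred, gt)
    return float(np.mean([(overlaps > t).mean() for t in thresholds]))
```

The 20-pixel precision threshold is the usual default in tracking literature; a normalized variant that scales the error by target size is also common for datasets with large scale variation.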
The implications of this research are twofold. Practically, the advancements fostered by LasHeR will likely spur developments in surveillance, autonomous navigation, and other fields where robust tracking across modalities is crucial. Theoretically, the enhanced dataset provides a rich testing ground for novel tracking models and fusion techniques, potentially leading to new insights in computer vision and multimodal data processing.
Future work on LasHeR could explore the integration of additional data modalities or expand the dataset to cover more dynamic and unpredictable environments. The benchmark is also conducive to experiments in novel fusion strategies and architecture designs that can leverage the full extent of the dataset's challenging conditions. Such extensions could pave the way for the development of universally robust and adaptable tracking systems, a critical need as AI systems increasingly operate in complex, real-world settings.