RGB-T Object Tracking: Benchmark and Baseline (1805.08982v1)

Published 23 May 2018 in cs.CV

Abstract: RGB-Thermal (RGB-T) object tracking is receiving more and more attention due to the strongly complementary benefits of thermal information to visible data. However, RGB-T research has been limited by the lack of a comprehensive evaluation platform. In this paper, we propose a large-scale video benchmark dataset for RGB-T tracking. It has three major advantages over existing ones: 1) Its size is sufficiently large for large-scale performance evaluation (total frame number: 234K, maximum frames per sequence: 8K). 2) The alignment between RGB-T sequence pairs is highly accurate and requires no pre- or post-processing. 3) The occlusion levels are annotated for occlusion-sensitive performance analysis of different tracking algorithms. Moreover, we propose a novel graph-based approach to learn a robust object representation for RGB-T tracking. In particular, the tracked object is represented with a graph whose nodes are image patches. This graph, including graph structure, node weights, and edge weights, is dynamically learned in a unified ADMM (alternating direction method of multipliers)-based optimization framework, in which modality weights are also incorporated for adaptive fusion of multiple source data. Extensive experiments on the large-scale dataset demonstrate the effectiveness of the proposed tracker against other state-of-the-art tracking methods. We also provide new insights and potential research directions for the field of RGB-T object tracking.

Authors (5)
  1. Chenglong Li (94 papers)
  2. Xinyan Liang (2 papers)
  3. Yijuan Lu (11 papers)
  4. Nan Zhao (79 papers)
  5. Jin Tang (139 papers)
Citations (372)

Summary

  • The paper presents a comprehensive RGBT234 dataset with 234K frames and detailed occlusion annotations for improved tracking evaluation.
  • The paper introduces a novel graph-based algorithm that uses ADMM optimization to adaptively fuse RGB and thermal modalities.
  • Experimental results show that the proposed method significantly enhances precision and robustness over traditional RGB-T tracking approaches.

Overview of "RGB-T Object Tracking: Benchmark and Baseline"

This paper addresses the RGB-T object tracking problem by introducing a comprehensive benchmark dataset and proposing a novel graph-based approach for robust object representation. RGB-T tracking leverages the complementary information from visible (RGB) and thermal spectrum data, proving beneficial under various environmental conditions that hinder traditional RGB tracking.

The paper presents the RGBT234 dataset, which consists of 234K frames across 234 RGB-T video sequence pairs, significantly larger than previously available datasets. The dataset outperforms existing ones in terms of size, alignment accuracy, and comprehensive occlusion-level annotations, facilitating occlusion-sensitive performance evaluation of tracking algorithms. The annotated attributes span challenges like low illumination, resolution issues, and occlusions, enriching the dataset's applicability to real-world scenarios.
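Since each sequence carries challenge-attribute annotations, per-attribute results can be obtained by grouping sequences by tag and averaging a per-sequence score. The sketch below illustrates this bookkeeping; the tag names (`LI`, `HO`) are illustrative placeholders, not necessarily the dataset's official attribute codes.

```python
from collections import defaultdict

def attribute_breakdown(scores, attributes):
    """Average a per-sequence metric over each annotated challenge attribute.

    scores:     maps sequence name -> metric value (e.g. success rate).
    attributes: maps sequence name -> list of attribute tags
                (e.g. 'LI' for low illumination, 'HO' for heavy occlusion).
    """
    buckets = defaultdict(list)
    for seq, tags in attributes.items():
        for tag in tags:
            buckets[tag].append(scores[seq])
    # mean score per attribute bucket
    return {tag: sum(vals) / len(vals) for tag, vals in buckets.items()}
```

A tracker weak under occlusion, for instance, would show a visibly lower average in the occlusion bucket than in its overall score.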

On the methodological front, the paper introduces a graph-based algorithm that dynamically learns an object representation within a unified ADMM-based optimization framework. The tracked object is represented as a graph whose nodes are image patches, and the graph structure, node weights, and edge weights are jointly optimized to capture inter-patch relationships across modalities. The method also learns modality weights for adaptive fusion of the RGB and thermal sources, aiming for robustness under varying imaging conditions.

Experiments conducted on the RGBT234 dataset show that the proposed method significantly outperforms existing RGB-T trackers, notably those employing simple concatenation strategies or a unimodal focus. The gains in precision and robustness metrics suggest that dynamic patch-based representations and adaptive fusion strategies help handle challenges such as deformation, occlusion, and adverse weather.
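The precision and success metrics referenced here are the standard ones in tracking benchmarks: precision rate counts frames whose predicted box center falls within a pixel threshold (commonly 20 px) of the ground truth, and success rate counts frames whose box overlap (IoU) exceeds a threshold. A minimal NumPy implementation, with boxes in `(x, y, w, h)` format:

```python
import numpy as np

def center_error(pred, gt):
    """Euclidean distance between box centers; boxes are (x, y, w, h) rows."""
    dx = (pred[:, 0] + pred[:, 2] / 2) - (gt[:, 0] + gt[:, 2] / 2)
    dy = (pred[:, 1] + pred[:, 3] / 2) - (gt[:, 1] + gt[:, 3] / 2)
    return np.hypot(dx, dy)

def iou(pred, gt):
    """Frame-wise intersection-over-union of (x, y, w, h) boxes."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / np.maximum(union, 1e-12)

def precision_rate(pred, gt, thresh=20.0):
    """Fraction of frames with center error within `thresh` pixels."""
    return float((center_error(pred, gt) <= thresh).mean())

def success_rate(pred, gt, thresh=0.5):
    """Fraction of frames with IoU above `thresh`."""
    return float((iou(pred, gt) > thresh).mean())
```

Sweeping `thresh` over a range and plotting the resulting rates yields the familiar precision and success curves used to rank trackers on such benchmarks.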

The paper's contributions include:

  1. Introduction of a large-scale, richly annotated RGB-T dataset (RGBT234) to enhance performance evaluation and benchmarking of RGB-T tracking algorithms.
  2. A novel graph-based RGB-T tracking algorithm that learns robust object representations through dynamic graph construction and adaptive modality weighting.

Implications and Future Directions

The paper presents a significant step forward in RGB-T object tracking, providing both data and methodological advances. The RGBT234 dataset promises to be a valuable resource, fostering further research in the domain by enabling more rigorous comparison of tracking algorithms. The dataset's comprehensive nature promotes exploration into attribute-specific tracking scenarios, which can lead to the development of more specialized models capable of handling particular challenges.

The inclusion of adaptive fusion and representation learning within the tracking framework hints at future systems that incorporate additional data modalities. The methodologies laid out here could also evolve to incorporate neural network-based techniques, marrying the robustness of graph-based methods with the feature representation capabilities of deep learning.

Overall, this paper offers substantial contributions not only to RGB-T tracking but also to multimodal data fusion research, underscoring the broader utility of developing comprehensive datasets and sophisticated data modeling techniques in computer vision. Future research can leverage these insights to explore adaptive, scalable solutions across various tracking and vision tasks.