Progressive Domain Adaptation for Thermal Infrared Object Tracking (2407.19430v2)

Published 28 Jul 2024 in cs.CV

Abstract: Due to the lack of large-scale labeled Thermal InfraRed (TIR) training datasets, most existing TIR trackers are trained directly on RGB datasets. However, tracking methods trained on RGB datasets suffer a significant drop-off in TIR data due to the domain shift issue. To this end, in this work, we propose a Progressive Domain Adaptation framework for TIR Tracking (PDAT), which transfers useful knowledge learned from RGB tracking to TIR tracking. The framework makes full use of large-scale labeled RGB datasets without requiring time-consuming and labor-intensive labeling of large-scale TIR data. Specifically, we first propose an adversarial-based global domain adaptation module to reduce domain gap on the feature level coarsely. Second, we design a clustering-based subdomain adaptation method to further align the feature distributions of the RGB and TIR datasets finely. These two domain adaptation modules gradually eliminate the discrepancy between the two domains, and thus learn domain-invariant fine-grained features through progressive training. Additionally, we collect a large-scale TIR dataset with over 1.48 million unlabeled TIR images for training the proposed domain adaptation framework. Experimental results on five TIR tracking benchmarks show that the proposed method gains a nearly 6% success rate, demonstrating its effectiveness.

Authors (6)
  1. Qiao Li (51 papers)
  2. Kanlun Tan (1 paper)
  3. Qiao Liu (42 papers)
  4. Di Yuan (70 papers)
  5. Xin Li (980 papers)
  6. Yunpeng Liu (55 papers)

Summary

Progressive Domain Adaptation for Thermal Infrared Object Tracking

This paper introduces PDAT, a Progressive Domain Adaptation framework for Thermal InfraRed (TIR) object tracking. The motivation is the domain shift between RGB and TIR imagery: because no large-scale labeled TIR dataset exists, most TIR trackers are trained on RGB data, and their performance drops significantly when they are applied directly to TIR data. PDAT aims to close this gap by exploiting large-scale labeled RGB datasets and adapting the learned features to TIR without manually labeled TIR data.

Methodology

The PDAT framework comprises three main components:

  1. Adversarial Global Domain Adaptation (AGDA): This module employs an adversarial learning strategy to perform global feature alignment between RGB and TIR image domains, thereby reducing domain discrepancies on a coarse level. By using a discriminator within a generative adversarial network (GAN) setup, deep features from TIR images are adapted to resemble those learned from RGB data.
  2. Clustering-Based Subdomain Adaptation (CSDA): Recognizing the insufficiency of global alignment for tasks requiring fine-grained features, this module achieves subdomain adaptation based on clustering mechanisms. It aligns RGB and TIR feature distributions at a finer granularity, promoting the recognition of nuanced class-level distinctions necessary for precise tracking capabilities.
  3. Segment Anything Model (SAM) based preprocessing: SAM is used to generate a large volume of pseudo-labeled TIR training data that serves as target-domain samples for domain adaptation, which avoids the costly requirement of large-scale TIR annotation.
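
The coarse-to-fine idea behind the first two components can be illustrated with a toy sketch. It is not the paper's method: it swaps the adversarial discriminator and the clustering-based loss for simple distances between feature means, and the names `global_gap`, `subdomain_gap`, and the hand-picked `centroids` are hypothetical. It only shows why two feature sets can look aligned globally while their subdomains (clusters) are still misaligned.

```python
import math

def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def global_gap(rgb, tir):
    """Coarse discrepancy: distance between the two domain-wide means."""
    return dist(mean(rgb), mean(tir))

def assign(points, centroids):
    """Group points by their nearest centroid (one k-means assignment step)."""
    groups = [[] for _ in centroids]
    for p in points:
        k = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        groups[k].append(p)
    return groups

def subdomain_gap(rgb, tir, centroids):
    """Fine discrepancy: average distance between per-cluster means."""
    pairs = zip(assign(rgb, centroids), assign(tir, centroids))
    gaps = [dist(mean(r), mean(t)) for r, t in pairs if r and t]
    return sum(gaps) / len(gaps) if gaps else 0.0

# Toy 2-D "features" (assumed values, for illustration only).
rgb = [(-2.0, 0.0), (-2.0, 0.0), (2.0, 0.0), (2.0, 0.0)]
tir = [(-3.0, 0.0), (-3.0, 0.0), (3.0, 0.0), (3.0, 0.0)]
centroids = [(-2.0, 0.0), (2.0, 0.0)]  # assumed cluster centers

print(global_gap(rgb, tir))                # global means coincide
print(subdomain_gap(rgb, tir, centroids))  # clusters are still shifted
```

In this toy case the domain-wide means are identical, so a purely global criterion sees no gap, while each cluster is still offset by one unit. That residual per-cluster offset is the kind of discrepancy a subdomain-level alignment step is meant to remove.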

Experimental Evaluation

The authors evaluate PDAT on five TIR tracking benchmarks: LSOTB-TIR100, LSOTB-TIR120, PTB-TIR, VTUAV, and VOT-TIR2017. The reported gain in success rate is nearly 6%, which the authors present as evidence of the method's effectiveness. The results suggest that progressively aligning features, first coarsely and then finely, yields domain-invariant representations that transfer from the RGB domain to TIR tracking.
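
The summary does not define "success rate". A minimal sketch follows, assuming the convention common in single-object tracking evaluation: a frame counts as a success when the intersection-over-union (IoU) between the predicted and ground-truth boxes exceeds a threshold, and the success score averages that rate over a range of thresholds. The box format `(x, y, w, h)`, the function names, and the sample boxes are assumptions for illustration, not taken from the paper or from any benchmark toolkit.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x, y, width, height)."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def success_rate(pred, gt, threshold=0.5):
    """Fraction of frames whose IoU is strictly above the threshold."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    return sum(o > threshold for o in overlaps) / len(overlaps)

def success_auc(pred, gt, steps=21):
    """Mean success rate over evenly spaced thresholds in [0, 1]."""
    thresholds = [i / (steps - 1) for i in range(steps)]
    return sum(success_rate(pred, gt, t) for t in thresholds) / steps

# Four frames of hypothetical ground-truth and predicted boxes.
gt = [(0.0, 0.0, 10.0, 10.0)] * 4
pred = [
    (0.0, 0.0, 10.0, 10.0),    # perfect overlap
    (5.0, 0.0, 10.0, 10.0),    # shifted by half the box width
    (20.0, 20.0, 10.0, 10.0),  # no overlap
    (0.0, 0.0, 10.0, 10.0),    # perfect overlap
]

print(success_rate(pred, gt))  # 0.5
print(success_auc(pred, gt))
```

Under this convention, two of the four sample frames exceed an IoU of 0.5, so the success rate at that threshold is 0.5.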

Implications and Future Contributions

The implications of PDAT are significant both practically and theoretically. By effectively transferring knowledge from labeled RGB datasets to unlabeled TIR contexts, PDAT reduces the dependency on extensive manual labeling, which is a critical bottleneck in TIR applications. This has substantial benefits in fields like autonomous driving and surveillance systems where TIR sensors are prominent.

Theoretically, the paper shows how domain adaptation can be structured progressively, with a coarse global stage followed by a fine subdomain stage, which may make the transfer more robust to domain shift. Future work could extend PDAT to other sensing modalities or applications, and could explore hierarchical clustering or style transfer techniques to refine the cross-domain feature mapping further.

In conclusion, this work proposes a structured, two-stage adaptation strategy that addresses the domain-specific challenges of TIR tracking and extends RGB-trained deep trackers to a modality where labeled data is scarce.
