Tracking-by-Trackers with a Distilled and Reinforced Model (2007.04108v2)

Published 8 Jul 2020 in cs.CV

Abstract: Visual object tracking was generally tackled by reasoning independently on fast processing algorithms, accurate online adaptation methods, and fusion of trackers. In this paper, we unify such goals by proposing a novel tracking methodology that takes advantage of other visual trackers, offline and online. A compact student model is trained via the marriage of knowledge distillation and reinforcement learning. The first allows to transfer and compress tracking knowledge of other trackers. The second enables the learning of evaluation measures which are then exploited online. After learning, the student can be ultimately used to build (i) a very fast single-shot tracker, (ii) a tracker with a simple and effective online adaptation mechanism, (iii) a tracker that performs fusion of other trackers. Extensive validation shows that the proposed algorithms compete with real-time state-of-the-art trackers.


Summary

The paper presents a unified object tracking framework that integrates multiple trackers through knowledge distillation and reinforcement learning.
It employs a compact student model refined via KD and RL, achieving competitive real-time tracking on benchmarks like GOT-10k and UAV123.
The approach balances speed and adaptability by merging offline training with online adaptation, enhancing both accuracy and robustness.

A Novel Unified Approach to Visual Object Tracking: Tracking-by-Trackers with a Distilled and Reinforced Model

The paper introduces a methodology for visual object tracking that unifies fast processing, online adaptation, and tracker fusion in a single framework. The approach, termed "Tracking-by-Trackers," takes advantage of other visual trackers both offline, during training, and online, during tracking. A compact student model is trained with a combination of knowledge distillation (KD) and reinforcement learning (RL), which play complementary roles.

Key Contributions

The primary contribution is a student tracker model that exploits the strengths of other tracking algorithms through KD and RL. The approach involves:

Knowledge Distillation: Transfers and compresses the tracking knowledge of multiple off-the-shelf trackers (the 'teachers') into a compact 'student' model.
Reinforcement Learning: Optimizes the tracking policy together with an evaluation function that is later exploited online for decision-making. The RL objective directly maximizes the overlap between predicted and ground-truth bounding boxes.
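The two training signals described above can be illustrated with a minimal sketch. It is not the paper's implementation: the `(x, y, w, h)` box format, the function names, and the use of a mean absolute difference as the distillation term are all assumptions made for illustration. The overlap measure is the standard intersection-over-union (IoU), which an RL reward could be built on.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, w, h) (assumed format)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def distillation_l1(student_action, teacher_action):
    """Mean absolute difference between student and teacher outputs.

    Hypothetical distillation term: the paper's exact loss may differ.
    """
    diffs = [abs(s - t) for s, t in zip(student_action, teacher_action)]
    return sum(diffs) / len(diffs)
```

Under these assumptions, a reward close to 1 means the predicted box nearly coincides with the ground truth, while the distillation term shrinks as the student imitates a teacher more closely.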

After training, the student model can be used in three configurations:

TRAS (TRAcking Student): A very fast single-shot tracker.
TRAST (TRAcking Student and Teacher): A tracker with a simple and effective online adaptation mechanism.
TRASFUST (TRAcking by Student FUSing Teachers): A tracker that performs fusion of other trackers.
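The online use of a learned evaluation measure in the last two configurations can be sketched as a score-based selection. This is a hedged illustration only: the function names are hypothetical, the scores are plain numbers standing in for the student's evaluation output, and the exact decision rules used in the paper may differ.

```python
def select_best(candidates, scores):
    """Fusion-style choice: return (index, box) of the highest-scored candidate."""
    if not candidates or len(candidates) != len(scores):
        raise ValueError("need one score per candidate")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, candidates[best]


def trast_step(student_box, student_score, teacher_box, teacher_score):
    """Adaptation-style choice: keep the student's box unless the teacher's scores higher."""
    if teacher_score > student_score:
        return "teacher", teacher_box
    return "student", student_box
```

For example, `select_best` applied to the boxes proposed by several teachers picks one output per frame, while `trast_step` falls back on a single teacher only when the student's own prediction is judged worse.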

Methodological Innovations and Evaluation

The student uses convolutional neural networks with shared weights, followed by an LSTM that models temporal dependencies, and produces two outputs: an action prediction and a performance evaluation. Training interleaves KD and RL optimization, so that the student learns tracking policies from multiple teachers.
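The data flow described above (shared-weight encoder, recurrent state, two output heads) can be sketched with a toy NumPy model. Everything here is a stand-in chosen for brevity: a linear map with ReLU replaces the convolutional backbone, a plain tanh recurrence replaces the LSTM, and the choice of two input patches, a 4-value action, the layer sizes, and the class name `TinyStudent` are assumptions rather than details from the paper.

```python
import numpy as np


class TinyStudent:
    """Toy stand-in: shared encoder -> recurrent state -> action and value heads."""

    def __init__(self, patch_dim, feat_dim=8, hidden_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        # One encoder matrix reused for both patches (the shared weights).
        self.w_enc = rng.normal(0.0, 0.1, (patch_dim, feat_dim))
        self.w_in = rng.normal(0.0, 0.1, (2 * feat_dim, hidden_dim))
        self.w_rec = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        self.w_action = rng.normal(0.0, 0.1, (hidden_dim, 4))
        self.w_value = rng.normal(0.0, 0.1, (hidden_dim, 1))
        self.hidden = np.zeros(hidden_dim)

    def encode(self, patch):
        # Linear map + ReLU standing in for a convolutional backbone.
        return np.maximum(0.0, patch.ravel() @ self.w_enc)

    def step(self, prev_patch, curr_patch):
        feats = np.concatenate([self.encode(prev_patch), self.encode(curr_patch)])
        # Plain tanh recurrence standing in for the LSTM.
        self.hidden = np.tanh(feats @ self.w_in + self.hidden @ self.w_rec)
        action = self.hidden @ self.w_action          # assumed 4 box-motion values
        value = (self.hidden @ self.w_value).item()   # scalar performance evaluation
        return action, value
```

Calling `TinyStudent(patch_dim=36).step(prev, curr)` with two 6x6 arrays returns an action vector and a scalar evaluation, and the hidden state carries information from one call to the next.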

The proposed trackers were evaluated on the GOT-10k, UAV123, LaSOT, OTB-100, and VOT2019 benchmarks, where they are competitive with state-of-the-art real-time trackers. TRAS is notably fast, while TRAST and TRASFUST improve robustness and accuracy. The paper reports the strongest benchmark results for TRASFUST, which suggests that fusing the outputs of several trackers is effective.

Implications and Future Directions

This research is relevant to real-time object tracking, which is crucial for applications in video surveillance, autonomous systems, and robotics. Combining KD and RL for tracking offers a direction for building efficient real-time trackers without sacrificing accuracy.

The results suggest further work on adaptive tracking frameworks with more sophisticated interaction between the offline and online learning phases. Future work may extend the robustness of such models to a broader range of real-world challenges, such as deformable objects and multi-target scenarios. Semi-supervised and unsupervised learning could also be investigated to reduce reliance on large labeled datasets.
