- The paper presents a method that employs a regression loss and a ranking loss to extract target-aware features for improved tracking accuracy.
- It integrates the target-aware features with a Siamese matching network, yielding a tracker that runs in real time at around 33.7 FPS on multiple benchmark datasets.
- The approach bridges the gap between pre-trained CNN features and dynamic tracking needs, offering new avenues for robust visual tracking research.
An Analysis of "Target-Aware Deep Tracking"
The paper "Target-Aware Deep Tracking" addresses a fundamental challenge in visual tracking: the inadequate performance of pre-trained convolutional neural networks (CNNs) when applied to tracking tasks. Unlike static object recognition, visual tracking involves dynamic targets of varying forms, which necessitates adaptability in feature representation. While pre-trained deep features have proven effective in generic object recognition, their application to visual tracking doesn't adequately capture the distinctiveness of arbitrary target objects, thereby necessitating the development of a new approach described in this paper.
The authors propose a novel method for generating target-aware features that enhance the distinguishability of targets under significant appearance variations. This is achieved by employing a regression loss and a ranking loss to guide the extraction of target-active and scale-sensitive features from pre-trained CNNs. By evaluating the importance of each convolutional filter through the gradients obtained during back-propagation, the model strategically selects filters that are significant contributors to target representation, thereby improving tracking effectiveness.
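To make the filter-selection idea concrete, the sketch below shows one way to score pre-trained feature channels with a regression loss and keep the most target-active ones. It is a minimal illustration rather than the authors' implementation: the VGG-16 layer cut, the Gaussian label map, the helper name `select_target_aware_channels`, and the top-k cutoff are all assumptions made for demonstration.

```python
import torch
import torch.nn.functional as F
import torchvision

# Illustrative backbone: an ImageNet-pre-trained VGG-16 truncated at an
# intermediate convolutional layer (an assumption; the exact layer choice
# here is for demonstration only).
backbone = torchvision.models.vgg16(weights="IMAGENET1K_V1").features[:23].eval()

def select_target_aware_channels(target_patch, top_k=256, lam=1e-4, steps=50, lr=1e-2):
    """Score feature channels by regression-loss gradients and keep the
    top_k most target-active ones (hypothetical helper, sketch only)."""
    feat = backbone(target_patch).detach()                 # 1 x C x h x w
    _, num_ch, h, w = feat.shape

    # Gaussian label map peaked at the target center, in the style of
    # correlation-filter ridge regression.
    ys = torch.arange(h, dtype=torch.float32) - (h - 1) / 2
    xs = torch.arange(w, dtype=torch.float32) - (w - 1) / 2
    sigma = min(h, w) / 8
    label = torch.exp(-(ys[:, None] ** 2 + xs[None, :] ** 2) / (2 * sigma ** 2))

    # Fit a single 1x1-conv regressor with a regularized L2 (ridge) loss.
    weight = torch.zeros(1, num_ch, 1, 1, requires_grad=True)
    opt = torch.optim.SGD([weight], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        pred = F.conv2d(feat, weight).squeeze()
        loss = F.mse_loss(pred, label) + lam * weight.pow(2).sum()
        loss.backward()
        opt.step()

    # Back-propagate the loss to the input features; the per-channel average
    # gradient magnitude is used here as the filter-importance score.
    feat.requires_grad_(True)
    loss = F.mse_loss(F.conv2d(feat, weight).squeeze(), label) + lam * weight.pow(2).sum()
    loss.backward()
    importance = feat.grad.abs().mean(dim=(0, 2, 3))       # shape: (num_ch,)
    return importance.topk(min(top_k, num_ch)).indices
```

The same gradient-scoring idea extends to the paper's ranking loss for scale-sensitive features; only the regression branch is sketched here for brevity.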
An innovative aspect of this paper is the integration of the target-aware features with a Siamese matching network, which keeps the tracker both adaptive and computationally efficient. The proposed model notably narrows the gap between pre-trained recognition networks and the requirements of visual tracking, where the target may belong to an arbitrary class and take an arbitrary form.
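A minimal sketch of the matching step follows, assuming the `backbone` and channel indices from the previous snippet: template features restricted to the target-aware channels are cross-correlated with search-region features to produce a response map. The function name `siamese_response` is hypothetical; the paper's tracker follows this general SiamFC-style correlation scheme, but the details here are illustrative.

```python
def siamese_response(template_patch, search_patch, channel_idx):
    """Cross-correlate target-aware template features with search-region
    features to produce a localization response map (sketch only)."""
    with torch.no_grad():
        z = backbone(template_patch)[:, channel_idx]   # 1 x k x hz x wz (template)
        x = backbone(search_patch)[:, channel_idx]     # 1 x k x hx x wx (search)
    # Use the template features as a correlation kernel over the search features.
    response = F.conv2d(x, z)                          # 1 x 1 x (hx-hz+1) x (wx-wz+1)
    return response.squeeze(0).squeeze(0)

# Illustrative usage: the peak of the response map gives the estimated
# target position within the search region.
# idx = select_target_aware_channels(template_patch)
# response = siamese_response(template_patch, search_patch, idx)
```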
Numerical Results and Claims
The experimental results across multiple benchmarks, including OTB-2013, OTB-2015, VOT-2015, VOT-2016, and Temple Color-128, show that the proposed method achieves competitive accuracy, often surpassing state-of-the-art algorithms. Notably, the tracker operates at real-time speeds (around 33.7 FPS), a significant advantage given the computational demands of deep feature representations. These results position the proposed tracker as a strong alternative in scenarios where computational efficiency is critical.
Implications and Future Directions
From a theoretical standpoint, the paper advances the understanding of feature selection in CNNs, particularly in specifying which features effectively contribute to the adaptability required in visual tracking. Practically, the insights gained from this method can be applied to improve real-time object tracking in various applications, including surveillance, autonomous navigation, and human-computer interaction systems.
For future research, there are several avenues to explore. Enhancing the scalability of target-aware features to accommodate more complex and dynamic environments could further expand the applicability of the approach. Additionally, integrating target-aware features with unsupervised learning paradigms may facilitate adaptation in scenarios with limited initial target information, thereby enhancing model robustness and generalization across diverse tracking landscapes.
In conclusion, the development of target-aware deep tracking presents an important evolution in visual tracking methodologies. By bridging the gap between pre-trained feature representations and the specific needs of dynamic target tracking, this research opens up new possibilities for real-time, efficient tracking solutions in increasingly complex scenarios.