Distractor-aware Siamese Networks for Visual Object Tracking (1808.06048v1)

Published 18 Aug 2018 in cs.CV

Abstract: Recently, Siamese networks have drawn great attention in visual tracking community because of their balanced accuracy and speed. However, features used in most Siamese tracking approaches can only discriminate foreground from the non-semantic backgrounds. The semantic backgrounds are always considered as distractors, which hinders the robustness of Siamese trackers. In this paper, we focus on learning distractor-aware Siamese networks for accurate and long-term tracking. To this end, features used in traditional Siamese trackers are analyzed at first. We observe that the imbalanced distribution of training data makes the learned features less discriminative. During the off-line training phase, an effective sampling strategy is introduced to control this distribution and make the model focus on the semantic distractors. During inference, a novel distractor-aware module is designed to perform incremental learning, which can effectively transfer the general embedding to the current video domain. In addition, we extend the proposed approach for long-term tracking by introducing a simple yet effective local-to-global search region strategy. Extensive experiments on benchmarks show that our approach significantly outperforms the state-of-the-arts, yielding 9.6% relative gain in VOT2016 dataset and 35.9% relative gain in UAV20L dataset. The proposed tracker can perform at 160 FPS on short-term benchmarks and 110 FPS on long-term benchmarks.

Citations (1,136)

Summary

The paper presents DaSiamRPN, a framework that enhances tracking through distractor-aware training with semantic negative pairs and targeted data augmentation.
It employs an online distractor suppression module to adapt to changing visual contexts, ensuring robust, real-time tracking performance.
Experimental results on the VOT and UAV benchmarks show relative gains of 9.6% on VOT2016 and 35.9% on UAV20L, with the tracker running at 160 FPS in short-term and 110 FPS in long-term scenarios.

An Analysis of Distractor-aware Siamese Networks for Visual Object Tracking

The paper "Distractor-aware Siamese Networks for Visual Object Tracking" presents a novel approach aimed at enhancing the robustness and accuracy of visual tracking systems by integrating distractor awareness into Siamese network frameworks. The work, authored by Zheng Zhu, Qiang Wang, Bo Li, Wei Wu, Junjie Yan, and Weiming Hu, addresses key issues in traditional Siamese trackers, such as sensitivity to semantic distractors and inability to adapt to significant appearance changes.

Overview of the Approach

The research introduces Distractor-aware Siamese Region Proposal Networks (DaSiamRPN), focusing on enhancing the discriminative power of the learned features during offline training and improving tracking robustness during online inference. The paper identifies that the imbalanced distribution of semantic and non-semantic backgrounds in training data contributes to the poor discriminative abilities of traditional models. To counter this, the authors implement strategies to include semantic negative pairs during training and customize data augmentation techniques, thereby enhancing the model's generalization ability.

Key Components

Distractor-aware Training:
- Balanced Data Distribution:
  
  Conventional training approaches for Siamese networks often result in an overrepresentation of non-semantic background samples, leading to less discriminative feature learning. The proposed method actively generates diverse semantic pairs, including both intra-category distractors and inter-category negatives, to address this imbalance.
- Customized Data Augmentation:
  
  The paper introduces targeted data augmentation, notably motion blur, alongside other variations that mimic the real-world conditions a tracker encounters.
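The sampling idea described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `sample_training_pair`, the nested-dictionary dataset layout, and the probabilities `p_negative` and `p_same_category` are all hypothetical choices made for the example.

```python
import random

def sample_training_pair(dataset, rng, p_negative=0.3, p_same_category=0.5):
    """Draw one (template, search, label, kind) tuple.

    dataset: dict mapping category -> video id -> list of frame crops.
    The probabilities are illustrative assumptions, not values from the paper.
    """
    category = rng.choice(sorted(dataset))
    video = rng.choice(sorted(dataset[category]))
    template = rng.choice(dataset[category][video])

    if rng.random() >= p_negative:
        # Positive pair: template and search crop come from the same video.
        search = rng.choice(dataset[category][video])
        return template, search, 1, "positive"

    other_videos = [v for v in sorted(dataset[category]) if v != video]
    other_categories = [c for c in sorted(dataset) if c != category]

    # Intra-category negative: same class, different video (a semantic distractor).
    if other_videos and (not other_categories or rng.random() < p_same_category):
        search = rng.choice(dataset[category][rng.choice(other_videos)])
        return template, search, 0, "intra-category negative"

    # Inter-category negative: an object from a different class.
    if other_categories:
        neg_category = rng.choice(other_categories)
        neg_video = rng.choice(sorted(dataset[neg_category]))
        search = rng.choice(dataset[neg_category][neg_video])
        return template, search, 0, "inter-category negative"

    # No negatives are available in this dataset, so fall back to a positive pair.
    search = rng.choice(dataset[category][video])
    return template, search, 1, "positive"

# Usage with a toy dataset whose "crops" are just identifying strings.
toy = {
    "car": {"v1": ["car/v1/0", "car/v1/1"], "v2": ["car/v2/0", "car/v2/1"]},
    "dog": {"v3": ["dog/v3/0", "dog/v3/1"]},
}
pair = sample_training_pair(toy, random.Random(0))
```

Raising `p_negative` shifts the training distribution toward semantic negatives, which is the lever the balanced-distribution strategy relies on.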
Distractor-aware Incremental Learning:
- Online Adaptation:
  
  During inference, a novel distractor-aware module adapts the general embedding to the specific video domain. This module suppresses distractors by incorporating the surrounding context and temporal information, facilitating the transfer of learned representations to the current video domain incrementally.
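One way to read the distractor suppression step is as a re-ranking of candidate proposals: each proposal's similarity to the target template is reduced by its weighted average similarity to known distractors. The sketch below assumes a plain dot product as the similarity measure and uses hypothetical names (`distractor_aware_select`, `alpha_hat`, `weights`); the exact formulation and the incremental update should be taken from the paper itself.

```python
def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))

def distractor_aware_select(template, proposals, distractors,
                            weights=None, alpha_hat=0.5):
    """Return (index, score) of the best proposal after distractor suppression.

    score_k = sim(template, p_k) - alpha_hat * weighted_mean_i sim(d_i, p_k)
    alpha_hat and the uniform default weights are illustrative assumptions.
    """
    if weights is None:
        weights = [1.0] * len(distractors)
    total = sum(weights)
    best_index, best_score = -1, float("-inf")
    for k, proposal in enumerate(proposals):
        score = dot(template, proposal)
        if distractors and total > 0:
            penalty = sum(w * dot(d, proposal)
                          for w, d in zip(weights, distractors)) / total
            score -= alpha_hat * penalty
        if score > best_score:
            best_index, best_score = k, score
    return best_index, best_score

# Toy 2-D features: proposal 1 matches the template slightly better,
# but it also resembles the distractor much more strongly.
template = [1.0, 0.0]
proposals = [[0.9, 0.1], [1.0, 0.8]]
distractors = [[0.5, 1.0]]

plain_choice, _ = distractor_aware_select(template, proposals, [])
aware_choice, _ = distractor_aware_select(template, proposals, distractors)
# plain_choice is 1, aware_choice is 0
```

In this toy case the plain similarity ranking picks the distractor-like proposal, while the suppressed ranking switches to the cleaner one.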
Long-term Tracking Capabilities:
- Local-to-global Search Strategy:
  
  For long-term tracking scenarios where targets may be fully occluded or leave the field of view, the model switches between local and global search. It searches locally by default; when tracking failure is detected, the search region is expanded iteratively until the target is re-detected.
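The local-to-global switching logic can be sketched as a small state update. Everything numeric here is an illustrative placeholder rather than a value taken from the text above: the region sizes, the growth step, and the two score thresholds (a lower one to enter the failure state, a higher one to leave it) are assumptions, as is the function name `next_search_size`.

```python
def next_search_size(current_size, score, in_failure,
                     local_size=255, global_size=767, step=64,
                     enter_threshold=0.8, leave_threshold=0.95):
    """Return (search_size, in_failure) for the next frame.

    score is the tracker's confidence for the current frame. All default
    values are placeholders chosen for the example.
    """
    if not in_failure and score < enter_threshold:
        in_failure = True       # confidence dropped: start widening the search
    elif in_failure and score > leave_threshold:
        in_failure = False      # target confidently re-detected: go local again

    if in_failure:
        size = min(current_size + step, global_size)  # grow by a constant step
    else:
        size = local_size
    return size, in_failure

# Usage: feed per-frame confidence scores and record the search size used next.
size, failed = 255, False
sizes = []
for score in [0.99, 0.5, 0.6, 0.9, 0.97]:
    size, failed = next_search_size(size, score, failed)
    sizes.append(size)
# sizes == [255, 319, 383, 447, 255]
```

Using two thresholds gives hysteresis, so the tracker does not flip between local and global modes when the score hovers near a single cutoff.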

Experimental Results and Analysis

The efficacy of DaSiamRPN is validated through extensive evaluations on multiple benchmarks: VOT2016, VOT2017, OTB2015, UAV20L, and UAV123. The results reveal significant performance improvements compared to state-of-the-art methods.

VOT2016 and VOT2017:
- DaSiamRPN achieves a 9.6% relative gain in Expected Average Overlap (EAO) on VOT2016 and outperforms other methods on VOT2017 with an EAO of 0.326.
- The real-time performance of DaSiamRPN also stands out, with speeds of 160 FPS on short-term benchmarks and 110 FPS on long-term benchmarks.
UAV20L and UAV123:
- On UAV20L, DaSiamRPN records an AUC score of 0.617, which the abstract reports as a 35.9% relative gain over the previous state of the art. The robustness to full occlusion and background clutter is particularly noteworthy.
- UAV123 results also indicate superior precision and success rates, surpassing existing models in handling fast motion and low resolution challenges.
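For readers unfamiliar with the term, a relative gain is the improvement divided by the baseline score. The short sketch below shows the arithmetic and back-computes the baseline implied by the UAV20L figures; it assumes the 35.9% gain quoted in the abstract is measured on the same AUC metric as the 0.617 score, which the summary does not state explicitly.

```python
def relative_gain(new_score, baseline_score):
    """Relative improvement of new_score over baseline_score."""
    return (new_score - baseline_score) / baseline_score

def implied_baseline(new_score, gain):
    """Baseline score implied by a new score and its relative gain."""
    return new_score / (1.0 + gain)

# Assumption: the 35.9% UAV20L gain refers to AUC, like the 0.617 score.
baseline_auc = implied_baseline(0.617, 0.359)
print(round(baseline_auc, 3))                        # 0.454
print(round(relative_gain(0.617, baseline_auc), 3))  # 0.359
```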

Implications and Future Directions

The proposed distractor-aware module and data strategies in DaSiamRPN mark a substantial advancement in visual tracking. The ability to handle distractors effectively and adapt to new domains incrementally provides a robust solution for both short-term and long-term tracking applications.

Practical Implications:

Surveillance and Security Applications: Enhanced robustness in tracking under varying conditions makes DaSiamRPN suitable for real-time surveillance systems.
Autonomous Vehicles: Improved long-term tracking could help maintain object tracks in dynamic environments with occlusions and fast-moving objects.
Robotics: The ability to discern between targets and distractors can significantly enhance robot vision systems, particularly in dynamic and cluttered environments.

Theoretical Contributions:

Distractor-aware Incremental Learning: The paper introduces an efficient mechanism to incorporate negative contextual information during online tracking, which can be broadly applied in other similarity-based tasks.
Balanced Training via Semantic Pairs: By addressing the imbalance in training data, this work opens avenues for further research on data augmentation strategies tailored for visual tracking.

Conclusion

In conclusion, "Distractor-aware Siamese Networks for Visual Object Tracking" presents a comprehensive solution to prevalent issues in traditional Siamese trackers. The combination of balanced training data, distractor-aware features, and adaptive long-term tracking strategies results in an efficient and accurate tracking system. Future research may optimize the proposed framework further and apply distractor-aware concepts to other areas of AI and computer vision.
