Learning Target Candidate Association to Keep Track of What Not to Track (2103.16556v2)

Published 30 Mar 2021 in cs.CV

Abstract: The presence of objects that are confusingly similar to the tracked target, poses a fundamental challenge in appearance-based visual tracking. Such distractor objects are easily misclassified as the target itself, leading to eventual tracking failure. While most methods strive to suppress distractors through more powerful appearance models, we take an alternative approach. We propose to keep track of distractor objects in order to continue tracking the target. To this end, we introduce a learned association network, allowing us to propagate the identities of all target candidates from frame-to-frame. To tackle the problem of lacking ground-truth correspondences between distractor objects in visual tracking, we propose a training strategy that combines partial annotations with self-supervision. We conduct comprehensive experimental validation and analysis of our approach on several challenging datasets. Our tracker sets a new state-of-the-art on six benchmarks, achieving an AUC score of 67.1% on LaSOT and a +5.8% absolute gain on the OxUvA long-term dataset.

Summary

The paper's main contribution is a novel association network that propagates candidate identities across frames to distinguish targets from similar distractors.
It employs a hybrid training strategy combining partial supervision with self-supervision to overcome the lack of complete annotations for distractors.
The resulting tracker sets a new state of the art on six benchmarks, reaching an AUC of 67.1% on LaSOT and a +5.8% absolute gain on the OxUvA long-term dataset.

Overview of "Learning Target Candidate Association to Keep Track of What Not to Track"

The paper "Learning Target Candidate Association to Keep Track of What Not to Track" addresses a fundamental challenge in appearance-based visual object tracking: the presence of distractor objects that are visually similar to the target. Such distractors are easily misclassified as the target itself, which eventually leads to tracking failure. Most prior methods try to suppress distractors with more powerful appearance models. This paper takes an alternative approach: it keeps track of the distractor objects themselves so that the tracker can continue following the target.

The authors introduce a learned association network to propagate the identities of all target candidate objects from frame to frame, effectively distinguishing between the target and similar distractors. Due to the lack of ground-truth annotations for distractor objects across frames, the paper presents a novel training strategy combining partial annotations with self-supervision to enable effective distractor identification and association.

Methodology

The primary innovation in this work is the target candidate association network, which pairs with a base appearance tracker to extract candidate objects for tracking. Each candidate is characterized by a set of features, including the target classifier score, spatial position, and appearance-based characteristics derived from backbone features. These features are encoded into embeddings processed by a graph-based candidate embedding network, which computes association scores crucial for tracking the target and distractor objects over time.
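The association step described above can be illustrated with a small sketch. It assumes a hand-written similarity (cosine similarity of appearance vectors, damped by spatial distance) and greedy one-to-one matching in place of the learned graph-based embedding network, so it shows only the idea of propagating identities, not the paper's model. The names `Candidate`, `association_score`, `associate`, `propagate_ids`, `pos_scale`, and `min_score` are hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    score: float       # target classifier score of this candidate
    position: tuple    # (x, y) image coordinates
    appearance: list   # appearance descriptor (stand-in for backbone features)


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def association_score(prev, curr, pos_scale=50.0):
    """Assumed hand-crafted score: appearance similarity damped by distance."""
    dist = math.dist(prev.position, curr.position)
    return cosine(prev.appearance, curr.appearance) * math.exp(-dist / pos_scale)


def associate(prev_cands, curr_cands, min_score=0.5):
    """Greedy one-to-one matching. Returns {curr_index: prev_index or None}."""
    pairs = sorted(
        ((association_score(p, c), i, j)
         for i, p in enumerate(prev_cands)
         for j, c in enumerate(curr_cands)),
        reverse=True,
    )
    matches = {j: None for j in range(len(curr_cands))}
    used_prev = set()
    for s, i, j in pairs:
        if s < min_score:
            break  # remaining pairs are too weak to count as matches
        if i in used_prev or matches[j] is not None:
            continue
        matches[j] = i
        used_prev.add(i)
    return matches


def propagate_ids(prev_ids, matches, next_id):
    """Carry identities forward; unmatched candidates become new distractors."""
    ids = []
    for j in range(len(matches)):
        i = matches[j]
        if i is None:
            ids.append(f"distractor-{next_id}")
            next_id += 1
        else:
            ids.append(prev_ids[i])
    return ids


# Previous frame: the target and one look-alike distractor.
prev = [Candidate(0.9, (100, 100), [1.0, 0.0]),
        Candidate(0.7, (200, 100), [0.9, 0.1])]
prev_ids = ["target", "distractor-1"]

# Current frame: the distractor now has the highest classifier score.
curr = [Candidate(0.95, (205, 100), [0.9, 0.1]),
        Candidate(0.60, (102, 101), [1.0, 0.0]),
        Candidate(0.50, (300, 300), [0.0, 1.0])]

curr_ids = propagate_ids(prev_ids, associate(prev, curr), next_id=2)
print(curr_ids)
```

In this toy example the candidate with the highest classifier score in the current frame inherits a distractor identity, so a tracker that simply picked the highest score would switch to the wrong object, while identity propagation keeps the target label on the correct candidate.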

To manage the challenges in learning associations due to incomplete annotations, the approach involves partial supervision with existing target annotations and a self-supervised learning strategy to synthesize ground-truth matches for distractors. Furthermore, the network is trained to handle rare and challenging cases detected during a base tracker's operation by actively mining these examples from the training data.
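One plausible way to synthesize ground-truth matches without distractor annotations is to derive an artificial second frame from the candidates of a single real frame, so that every correspondence is known by construction. The sketch below assumes random position jitter, shuffling, and candidate dropping as the perturbations. These choices, along with the names `synthesize_pair`, `jitter`, and `drop_prob`, are assumptions for illustration and are not taken from the source.

```python
import random


def synthesize_pair(candidates, jitter=2.0, drop_prob=0.2, seed=0):
    """Build a synthetic 'next frame' from one frame's candidates.

    candidates: list of (x, y, score) tuples from a single real frame.
    Returns (synthetic, gt), where gt maps each synthetic index to the
    index of the original candidate it was derived from.
    """
    rng = random.Random(seed)
    order = list(range(len(candidates)))
    rng.shuffle(order)  # hide the correspondence behind a random ordering

    synthetic, gt = [], {}
    for src in order:
        if rng.random() < drop_prob:
            continue  # simulate a candidate that disappears
        x, y, score = candidates[src]
        synthetic.append((x + rng.uniform(-jitter, jitter),
                          y + rng.uniform(-jitter, jitter),
                          score))
        gt[len(synthetic) - 1] = src
    return synthetic, gt


frame = [(100.0, 100.0, 0.9), (200.0, 100.0, 0.7), (150.0, 220.0, 0.4)]
synthetic, gt = synthesize_pair(frame, seed=1)
for k, src in gt.items():
    print(f"synthetic candidate {k} corresponds to original candidate {src}")
```

Pairs produced this way could supervise the association of distractors, while the annotated target provides the partial supervision mentioned above.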

Results

The paper reports that the proposed tracker, termed KeepTrack, sets a new state of the art on six benchmarks, including LaSOT and OxUvA. In particular, KeepTrack achieves an AUC of 67.1% on the LaSOT dataset and a +5.8% absolute gain on the challenging OxUvA long-term dataset.

Implications and Future Directions

From a practical standpoint, this research indicates that actively handling distractor objects can enhance long-term visual tracking reliability, particularly in environments with a high density of similar distractors. The proposed methodological shift lessens the reliance on improving the discriminative power of base appearance models, suggesting a paradigm in which tracking methods are more resilient to adverse conditions and distractions.

Theoretically, this work suggests that dynamic context awareness and association strategies might significantly influence future advancements in tracking systems. These strategies may be further refined through deeper integration with scene understanding models and potentially incorporating additional cues like motion or temporal patterns for more comprehensive target-distractor differentiation.

Future developments in this domain could explore extending association networks to multi-object tracking scenarios, where a more varied set of distractors is present. Additionally, improved self-supervised training techniques could provide more robust frameworks for learning without extensive labeled data, broadening applicability to dynamic and varied real-world scenarios.
