- The paper proposes UpdateNet, a CNN-based model that learns dynamic update strategies for Siamese trackers, overcoming the limitations of linear updates.
- It integrates a skip connection from the initial template to prevent drift and adapts to rapid appearance changes with a context-aware approach.
- Empirical results show a 3.9% gain in success score on TrackingNet, demonstrating significant improvements in visual tracking performance.
An Analysis of Model Update Mechanisms for Siamese Trackers
The paper "Learning the Model Update for Siamese Trackers" examines model update strategies for Siamese-based visual object tracking. The authors identify substantial limitations in the linear update mechanism used by many state-of-the-art trackers and propose UpdateNet, a convolutional neural network designed to learn and predict the optimal model update dynamically.
Current Challenges
Siamese trackers typically solve the problem of visual tracking by matching an object appearance template across consecutive frames. The conventional linear update mechanism updates the appearance model using a simple running average, which has several inherent limitations. Key amongst these is the assumption of a constant rate of appearance change, which fails to account for the evolving and often context-dependent nature of real-world scenes. This simplification can lead to inadequate performance in scenarios characterized by rapid appearance changes, occlusion, or background clutter. Moreover, the uniform update rate across spatial and channel dimensions precludes localized response to changes, which can be particularly detrimental in handling partial occlusions.
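The linear update criticized above is simply an exponential running average with a fixed learning rate. A minimal sketch (template shapes and the rate value are illustrative, not taken from the paper) makes the rigidity concrete:

```python
import numpy as np

def linear_update(t_acc, t_cur, gamma=0.01):
    """Running-average template update with a fixed rate gamma.

    The same gamma is applied to every channel and spatial position,
    which is exactly the uniformity the paper argues against: the
    update cannot respond locally to partial occlusion or clutter.
    """
    return (1.0 - gamma) * t_acc + gamma * t_cur

rng = np.random.default_rng(0)
t_acc = rng.standard_normal((4, 6, 6))  # accumulated template (toy size)
t_cur = rng.standard_normal((4, 6, 6))  # template from the current frame
t_new = linear_update(t_acc, t_cur)
# t_new moves only 1% of the way toward t_cur, regardless of how much
# the target's appearance actually changed in this frame.
```

Because gamma is constant across frames and across the template, the model assumes every frame contributes equally, slow drift toward recent appearance is unavoidable, and no region of the template can be updated faster or slower than any other.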
Proposed Solution: UpdateNet
The authors address these challenges by proposing an adaptive approach grounded in machine learning. The UpdateNet model replaces the traditional linear update process with a more nuanced update strategy that is learned from the data itself. The network uses three key inputs: the initial ground-truth template, an accumulated template from previous frames, and the current frame template. This enables UpdateNet to produce an accumulated template that is context-aware and capable of responding to individual frame characteristics and variations.
Moreover, UpdateNet incorporates a residual learning strategy by adding a skip connection from the initial template, ensuring robustness against drift—a common issue when over-relying on recent frames. This architecture is computationally light, allowing for integration into existing Siamese tracking frameworks without a significant efficiency trade-off.
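The two paragraphs above can be sketched as a small function: the three templates are concatenated along the channel dimension, passed through a learned transform, and the initial template is added back via the skip connection. This is a hedged toy version, not the paper's exact architecture: the real UpdateNet uses trained convolutional layers, whereas here the "convolutions" are 1x1 channel-mixing matrices with random toy weights, and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 4, 6, 6  # template feature dimensions (illustrative only)

def conv1x1(x, weight):
    """A 1x1 convolution: mix channels independently at each position.

    x: (C_in, H, W), weight: (C_out, C_in) -> output: (C_out, H, W)
    """
    return np.einsum('oc,chw->ohw', weight, x)

def update_net(t0, t_acc, t_cur, w1, w2):
    """Sketch of UpdateNet's forward pass.

    Inputs: the initial ground-truth template t0, the accumulated
    template t_acc from previous frames, and the current-frame
    template t_cur. The skip connection adds t0 to the learned
    residual, anchoring the output to the initial appearance and
    guarding against drift.
    """
    x = np.concatenate([t0, t_acc, t_cur], axis=0)  # (3C, H, W)
    h = np.maximum(conv1x1(x, w1), 0.0)             # ReLU nonlinearity
    residual = conv1x1(h, w2)                       # (C, H, W)
    return t0 + residual                            # skip connection from t0

# Toy weights; in the paper these are trained on tracking sequences.
w1 = rng.standard_normal((8, 3 * C)) * 0.1
w2 = rng.standard_normal((C, 8)) * 0.1

t0 = rng.standard_normal((C, H, W))
t_acc = t0.copy()                       # first frame: accumulated == initial
t_cur = rng.standard_normal((C, H, W))
t_next = update_net(t0, t_acc, t_cur, w1, w2)  # shape (4, 6, 6)
```

Note the design consequence of the skip connection: if the learned residual collapses to zero, the tracker falls back to the initial ground-truth template rather than to whatever it drifted toward, which is the robustness property the paper highlights.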
Empirical Validation
Through extensive experimentation, UpdateNet shows considerable improvements over the traditional linear update. Evaluations on the VOT2016, VOT2018, LaSOT, and TrackingNet benchmarks show that UpdateNet significantly enhances tracking performance across a range of scenarios. Notably, it achieves an absolute gain of 3.9% in success score over DaSiamRPN on TrackingNet, demonstrating its effectiveness in handling appearance changes.
Implications and Future Directions
The work underscores the importance of adaptability in tracking algorithms. By learning the update step from data rather than fixing it by hand, the approach adapts more precisely to varying scenarios, contributing both theoretical and practical advances to visual object tracking.
Potential future avenues of research suggested by this work include exploring more sophisticated neural architectures that can further capture the nuanced interactions between templates from disparate frames. Additionally, the application of UpdateNet principles to other forms of visual trackers—beyond Siamese frameworks—may also yield beneficial adaptations for more robust tracking across different domains.
In conclusion, the paper makes a significant contribution to visual tracking by replacing the fixed linear update with a learned, adaptive one that outperforms conventional approaches and yields practical benefits across a variety of tracking challenges. Such approaches meaningfully reshape how model adaptation is handled in computer vision applications.