- The paper presents a novel method employing separate discriminative filters for translation and scale estimation, significantly enhancing tracking robustness.
- It reduces computational cost and improves speed through a learned scale filter, sub-grid interpolation of correlation scores, and PCA-based feature compression, while refining tracking precision.
- The DSST method outperforms many state-of-the-art trackers on OTB and VOT2014, achieving higher overlap precision and increased frame rates.
Discriminative Scale Space Tracking
The paper "Discriminative Scale Space Tracking" by Martin Danelljan et al. investigates the challenging issue of accurate and reliable scale estimation in the domain of visual object tracking. The authors propose a novel approach by devising a scale adaptive tracking method that concurrently employs separate discriminative correlation filters for translation and scale estimation. This dual-filter approach not only addresses the inherent limitations of traditional exhaustive search methods in terms of computational expense and efficiency but also substantially improves the tracker’s responsiveness to significant scale variations.
Key Contributions
The authors present several pivotal contributions, primarily centering on the following aspects:
- Discriminative Correlation Filter for Scale Estimation: The method introduces a dedicated scale filter, trained online from the variation of the target's appearance across scales. This directly contrasts with standard methods that rely on exhaustive scale search.
- Efficiency and Computational Performance: The scale filter explicitly learns the appearance change induced by scale variation, reducing the scale search to a one-dimensional correlation and yielding significant computational savings.
- Extensive Evaluation: Comprehensive experiments conducted on the OTB and VOT2014 datasets demonstrate the method’s superior performance. The proposed tracker achieves a 2.5% gain in average overlap precision on the OTB dataset and demonstrates a 50% higher frame rate compared to exhaustive search methods.
- Ranking and Robustness: The DSST method tops the rankings by outperforming 19 state-of-the-art trackers on the OTB dataset and 37 trackers on the VOT2014 dataset.
Detailed Overview
Baseline DCF Tracker
The standard discriminative correlation filter (DCF) based tracking method estimates translation only. While computationally efficient thanks to its fast Fourier transform (FFT) implementation, it struggles when the target undergoes significant scale variations.
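To make this concrete, below is a minimal single-channel DCF/MOSSE-style sketch in Python, assuming a grayscale feature patch and a Gaussian regression target. Real DCF trackers additionally use multi-channel features (e.g., HOG), a cosine window, and running model updates, all omitted here for brevity; the function names and regularization value are illustrative rather than the paper's implementation.

```python
import numpy as np

def gaussian_label(shape, sigma=2.0):
    """Desired correlation output: a Gaussian peaked at the target centre,
    shifted so that the peak sits at the zero-shift position."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dist2 = (ys - h // 2) ** 2 + (xs - w // 2) ** 2
    return np.fft.ifftshift(np.exp(-0.5 * dist2 / sigma ** 2))

def train_filter(patch, label, lam=1e-2):
    """Closed-form ridge-regression (MOSSE-style) filter in the Fourier domain."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(label)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)   # conjugate filter H*

def detect(H_conj, patch):
    """Correlate a new patch with the filter; the response peak gives the shift."""
    response = np.real(np.fft.ifft2(H_conj * np.fft.fft2(patch)))
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    return response, (dy, dx)
```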
Scale Estimation Strategies
Several scale estimation methods were explored, including:
- Multi-resolution Translation Filter: Evaluates the translation filter at several resampled resolutions of the search region (a sketch follows this list). Although effective, it is computationally intensive, since every candidate scale requires its own full 2-D correlation.
- Joint Scale Space Filter: This strategy constructs a 3D correlation filter for simultaneous translation and scale estimation but suffers from computational inefficiency.
- Iterative Joint Scale Space Filter: An extension of the joint filter with iterative updates, which further increases the computational cost.
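For a sense of where the cost of the first strategy comes from, the sketch below runs the translation filter from the previous snippet once per candidate scale; `extract_patch` is a hypothetical helper that crops a square window around the current position, and the scale factors are illustrative. Every candidate scale incurs a full 2-D FFT, which is precisely the cost a dedicated scale filter avoids.

```python
import numpy as np
import cv2

def multi_resolution_search(frame, pos, base_size, H_conj,
                            scale_factors=(0.95, 1.0, 1.05)):
    """Exhaustive baseline: evaluate the 2-D translation filter once per scale."""
    best_score, best_scale, best_shift = -np.inf, 1.0, (0, 0)
    for s in scale_factors:
        # extract_patch is a hypothetical helper cropping a square window around pos.
        patch = extract_patch(frame, pos, int(round(base_size * s)))
        patch = cv2.resize(patch, (base_size, base_size))   # back to the filter size
        response, shift = detect(H_conj, patch)             # one full 2-D FFT per scale
        if response.max() > best_score:
            best_score, best_scale, best_shift = response.max(), s, shift
    return best_scale, best_shift
```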
Proposed DSST Method
The Discriminative Scale Space Tracker (DSST) departs from the previous methods by learning two separate correlation filters, one for translation and one for scale. In each frame, the tracker first estimates the target's translation with the 2-D translation filter and then applies a 1-D scale filter at the new position to refine the target's size. This separation reduces computational complexity and raises the frame rate.
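A minimal sketch of such a separate 1-D scale filter follows, assuming the (d × S) scale sample has already been built by cropping the target at each candidate scale, resizing to a fixed template, and stacking the resulting descriptors. The number of scales S = 33 and the step a = 1.02 follow the paper, while the label width, regularization, and the lack of an online update are simplifications.

```python
import numpy as np

S = 33                                    # number of scale levels (value used in the paper)
a = 1.02                                  # scale increment factor (value used in the paper)
scale_factors = a ** (np.arange(S) - S // 2)

# 1-D Gaussian regression target over the scale dimension, peaked at the current scale.
sigma = 1.0                               # illustrative width
g = np.exp(-0.5 * ((np.arange(S) - S // 2) / sigma) ** 2)
G = np.fft.fft(np.fft.ifftshift(g))

def train_scale_filter(sample):
    """sample: (d, S) features, one d-dimensional descriptor per candidate scale.
    Returns the filter numerator A (d, S) and the shared denominator B (S,)."""
    F = np.fft.fft(sample, axis=1)                 # 1-D FFTs along the scale axis only
    A = np.conj(G)[None, :] * F
    B = np.sum(F * np.conj(F), axis=0).real
    return A, B

def detect_scale(A, B, sample, lam=1e-2):
    """Correlate a new (d, S) sample with the filter; the peak gives the scale change."""
    Z = np.fft.fft(sample, axis=1)
    resp = np.real(np.fft.ifft(np.sum(np.conj(A) * Z, axis=0) / (B + lam)))
    return scale_factors[np.argmax(np.fft.fftshift(resp))]
```

Because the filter operates along the scale dimension alone, detection costs only a handful of short 1-D FFTs rather than one 2-D FFT per candidate scale.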
The Fast Discriminative Scale Space Tracker (fDSST) extends this by employing:
- Sub-grid Interpolation of Correlation Scores: Allows training and detection on coarser feature grids, reducing FFT sizes and increasing speed (illustrated after this list).
- Dimensionality Reduction via PCA: Reduces the feature dimensionality without sacrificing tracking precision, further cutting the per-frame cost.
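The two speed-ups can be illustrated as follows. Note that the paper interpolates the correlation output with trigonometric polynomials and learns the projection from the tracker's own appearance model; the sketch below substitutes a simpler parabolic refinement of the discrete peak and a standard PCA, purely to convey the idea.

```python
import numpy as np

def subgrid_peak_1d(resp):
    """Refine the argmax of a sampled (cyclic) response to sub-grid accuracy by
    fitting a parabola through the three samples around the discrete peak."""
    i = int(np.argmax(resp))
    left, mid, right = resp[i - 1], resp[i], resp[(i + 1) % len(resp)]
    denom = left - 2.0 * mid + right
    offset = 0.0 if denom == 0 else 0.5 * (left - right) / denom
    return i + offset                              # fractional peak position

def pca_projection(samples, num_components):
    """samples: (d, n) matrix of feature vectors. Returns a (num_components, d)
    projection matrix so that P @ sample compresses features before the FFTs."""
    centered = samples - samples.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / samples.shape[1]   # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:num_components]
    return eigvecs[:, order].T
```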
Experimental Results
OTB Dataset:
- The DSST method yields an average overlap precision (OP) of 67.7% and a distance precision (DP) of 75.7%.
- The fDSST further improves results with 74.3% OP and 80.2% DP, while doubling the tracking speed to ≈54 FPS.
VOT2014 Dataset:
- The DSST method performs well in both accuracy and robustness, with a low average failure rate of 1.16.
Implications and Future Directions
The proposed DSST method clearly illustrates that accurate scale estimation significantly enhances tracking robustness, particularly in scenarios involving scale variations. The explicit learning of scale-induced appearance changes ensures computational efficiency and real-time operation, which is imperative for practical applications in robotics, surveillance, and automation.
Future work could involve integrating more advanced feature representations like color-based information (e.g., color names) or deep learning-based features, which may further enhance tracker performance in complex scenarios like deforming objects or varying illumination. Additionally, extending this framework to accommodate multi-object tracking in dense environments could be another promising direction, potentially leveraging joint optimization techniques for correlated object motions.
In conclusion, the paper presents a compelling approach to addressing scale estimation in visual tracking, setting a benchmark for future research in this domain with its detailed methodology and extensive empirical validation.